GitLab CI Update, main branch (2024.06.18.) #616

Merged
merged 2 commits into from
Jun 19, 2024

krasznaa
Member

Updated the GitLab CI builds to the latest Docker images, used by the GitHub CI as well. (ghcr.io/acts-project/ubuntu2004_cuda:47 and ghcr.io/acts-project/ubuntu2004_cuda_oneapi:47)

At the same time introduced SYCL builds/tests with an NVIDIA backend as well.

I'm quite curious how successful this will be... 🤔

@krasznaa krasznaa added the cicd Changes related to the CI system label Jun 18, 2024
@paulgessinger
Member

Changes look good to me, let's see what the CI says.

Btw, it doesn't hurt to throw an nvidia-smi into the test jobs; it helps with debugging in case something goes wrong with the container runtime.
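
For illustration, a minimal sketch of such a diagnostic step, written as a small Python helper that a test job could call before the tests start. The helper name `gpu_diagnostics` is hypothetical and is not part of this PR; the PR itself would more likely just call nvidia-smi directly from the job script.

```python
import shutil
import subprocess


def gpu_diagnostics(tool="nvidia-smi"):
    """Return the output of the diagnostic tool, or a short note if it cannot run.

    Hypothetical helper: it never raises, so a missing or failing tool
    does not abort the CI job, it only shows up in the job log.
    """
    path = shutil.which(tool)
    if path is None:
        return f"{tool} not found on PATH"
    result = subprocess.run([path], capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return f"{tool} failed (exit code {result.returncode}): {result.stderr.strip()}"
    return result.stdout


if __name__ == "__main__":
    # Print the GPU/driver state so that it ends up in the job log.
    print(gpu_diagnostics())
```

Returning a message instead of raising is a deliberate choice here: the point is to leave a trace in the log when the container runtime does not expose the GPU, not to add another failure mode.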

@krasznaa
Member Author

Indeed. I forgot about nvidia-smi by mistake. 🤔

Plus now I also have some SYCL runtime errors to hunt down. 😛

@krasznaa krasznaa force-pushed the GitLabCIUpdate-main-20240618 branch from c9a4dbe to ac1d412 Compare June 18, 2024 14:51
@krasznaa
Member Author

As it turns out, the SYCL tests were failing because, after building the binaries only for an NVIDIA backend, the test job was trying to execute them on the Xeon CPU of the test machine. 😛

But emboldened by the possibility of actively testing the "Intel backend" as well, I added a corresponding configuration to the GitLab CI setup. I'm interested in your opinion on this though, since the CPU backend test is not really different from what we do on GitHub. 🤔 So I can easily be convinced to turn it off. (Once we make some Intel GPUs available for testing, this would change of course.)

The SYCL clusterization failure is "real", but @stephenswat promised to fix that in a separate PR, as it should be the same sort of issue as the one he fixed in #614.
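
The device mismatch described above (binaries built for one backend, tests executed on another device) is the kind of thing that can be pinned down through the test environment. A minimal sketch, assuming the oneAPI DPC++ runtime's `ONEAPI_DEVICE_SELECTOR` environment variable is what selects the device; whether this PR uses that variable is an assumption, and the `SELECTORS` mapping and the `sycl_test_env` helper are hypothetical names:

```python
import os

# Assumed mapping from the backend a build targets to a device selector.
# "cuda:*" selects any device of the CUDA backend, "opencl:cpu" selects
# the CPU device of the OpenCL backend.
SELECTORS = {
    "nvidia": "cuda:*",
    "intel_cpu": "opencl:cpu",
}


def sycl_test_env(backend, base_env=None):
    """Return a copy of the environment with the SYCL device pinned.

    Hypothetical helper: 'backend' must be a key of SELECTORS, and
    'base_env' defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ONEAPI_DEVICE_SELECTOR"] = SELECTORS[backend]
    return env


if __name__ == "__main__":
    print(sycl_test_env("nvidia", {})["ONEAPI_DEVICE_SELECTOR"])
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when launching the tests, so that a build made only for the NVIDIA backend is not executed on the CPU.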

@krasznaa
Member Author

One could also revive the discussion about strictly only running the CUDA and SYCL tests in the GitLab CI. 🤔 (Currently all CPU tests are also being run.)

Unfortunately that would need some slightly different kind of code in the YAML file. (To ask for different test executables from the CUDA and SYCL tests...)

@paulgessinger
Member

@krasznaa How long do you expect the extra tests to last? If there's no reason to run them on GitLab (i.e. cvmfs), I'd lean towards sticking to GitHub resources for these for now.

@krasznaa krasznaa force-pushed the GitLabCIUpdate-main-20240618 branch from ac1d412 to 95cbaaf Compare June 19, 2024 07:44
@krasznaa
Member Author

As it turned out, we don't actually run any SYCL tests on CPUs in GitHub at the moment. (Otherwise we would've noticed #618 even sooner.) Also, since our own machine is well-understood hardware, I think it might be a bit safer to run those CPU tests there. 🤔

The PR now tries to make sure that only the actual CUDA and SYCL tests are run using GitLab, skipping the (non-SYCL) CPU tests, since those are run on GitHub already.

Finally, I found a good way of downloading the CodePlay oneAPI plugins in a script. For actually running the tests with the CUDA backend, we need the appropriate plugin to be installed, which I left out of acts-project/machines#102, as I just didn't know how to do this correctly.

Once a new tag of the Docker images includes the necessary plugins, the script introduced by this PR can be removed. 🤔

@krasznaa krasznaa force-pushed the GitLabCIUpdate-main-20240618 branch from 95cbaaf to 9fc3630 Compare June 19, 2024 09:18
@stephenswat
Member

Since time is a valuable resource on this machine, let's disable the build of the examples library for these tests. Also, we should consider making a dedicated docker image for this with the dependencies as well as the data directory pre-installed.

At the same time, introduced SYCL builds/tests with both an Intel and an NVIDIA backend. (For the moment, the Intel backend runs on the CPU of the test machine.) This required installing the NVIDIA oneAPI plugin during the test, as that was left out of the existing Acts Docker image. :-(
I.e. that they would include "CUDA" and "SYCL" in their test names respectively, to make it easier to filter for them using CTest.
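
As a rough sketch of what name-based filtering with CTest can look like, the helper below assembles a `ctest` command line using its `-R` (include by regular expression) and `-E` (exclude by regular expression) options. The helper name `ctest_command` and the `build` directory are hypothetical, and the patterns assume that the test names contain "CUDA" or "SYCL" as described above; the `--test-dir` option needs CMake 3.20 or newer.

```python
def ctest_command(build_dir, include=None, exclude=None):
    """Assemble a ctest invocation that filters tests by name.

    Hypothetical helper: 'include' and 'exclude' are lists of regular
    expression fragments that are joined with '|' into one pattern each.
    """
    cmd = ["ctest", "--test-dir", build_dir, "--output-on-failure"]
    if include:
        cmd += ["-R", "|".join(include)]
    if exclude:
        cmd += ["-E", "|".join(exclude)]
    return cmd


if __name__ == "__main__":
    # Select only the tests with "CUDA" or "SYCL" in their names.
    print(" ".join(ctest_command("build", include=["CUDA", "SYCL"])))
```

A CUDA-only job would pass `include=["CUDA"]`, and a SYCL job `include=["SYCL"]`, which leaves the plain CPU tests to the GitHub CI.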
@krasznaa krasznaa force-pushed the GitLabCIUpdate-main-20240618 branch from 9fc3630 to beec48e Compare June 19, 2024 11:10
@stephenswat stephenswat merged commit d1725ac into acts-project:main Jun 19, 2024
25 checks passed
@krasznaa krasznaa deleted the GitLabCIUpdate-main-20240618 branch June 19, 2024 11:56