Adding CI scripts #45

Merged: 20 commits into icl-utk-edu:master on Aug 29, 2024
Conversation

G-Ragghianti
Collaborator

No description provided.

@mkstoyanov
Collaborator

Thanks for setting this up, let me know if you run into any issues.

@G-Ragghianti
Collaborator Author

One question so far: the intel-mkl package in spack has been deprecated in favor of the oneAPI version (intel-oneapi-mkl). We were previously using intel-mkl for the MKL heFFTe backend. Is it sufficient to use just the oneAPI backend and deprecate the MKL backend, or do they serve different purposes?

@mkstoyanov
Collaborator

> One question so far: the intel-mkl package in spack has been deprecated in favor of the oneAPI version (intel-oneapi-mkl). We were previously using intel-mkl for the MKL heFFTe backend. Is it sufficient to use just the oneAPI backend and deprecate the MKL backend, or do they serve different purposes?

The MKL backend is CPU-only and doesn't require the Intel SYCL compiler; it is needed on many platforms regardless of the spack deprecation.

However, we don't need a separate test for all possible backend combinations (simply not practical). It is OK to just test MKL + oneAPI in spack, especially since the stand-alone MKL package has been deprecated. We can even deprecate the standalone +mkl variant in spack (though not in C++).

@G-Ragghianti
Collaborator Author

Maybe the +mkl variant in the spack package should be changed to use intel-oneapi-mkl instead of intel-mkl?

@mkstoyanov
Collaborator

> Maybe the +mkl variant in the spack package should be changed to use intel-oneapi-mkl instead of intel-mkl?

That's another possibility; I haven't tested it, but it should work.

@G-Ragghianti
Collaborator Author

I have most of the checks passing now. Do you think there is a good chance that we can get the two failing checks to pass, or would you be OK with implementing the CI the way it is now?

@mkstoyanov
Collaborator

Generally speaking, so long as most of the tests are working, the CI is good.

The Intel test looks like a GPU problem; it is complaining about missing fp64 capabilities. Is it detecting the correct device?

The ROCm failure is also puzzling: it fails to launch a basic scaling kernel, which does just that, scales a vector by a fixed number.
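
For context, the kernel in question is essentially of the following shape. This is a minimal HIP sketch for illustration only, not heFFTe's actual source; the names are made up:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Scale each entry of a device vector by a fixed factor.
__global__ void scale_kernel(double *x, int n, double factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

int main() {
    const int n = 1024;
    std::vector<double> x(n, 1.0);

    double *dx = nullptr;
    hipMalloc(&dx, n * sizeof(double));
    hipMemcpy(dx, x.data(), n * sizeof(double), hipMemcpyHostToDevice);

    // If the environment exposes the wrong device, or the runner lacks
    // permissions (e.g., no access to /dev/kfd), even this trivial launch fails.
    scale_kernel<<<(n + 255) / 256, 256>>>(dx, n, 2.0);
    hipDeviceSynchronize();

    hipMemcpy(x.data(), dx, n * sizeof(double), hipMemcpyDeviceToHost);
    hipFree(dx);
    std::printf("x[0] = %g (expected 2)\n", x[0]);
    return 0;
}
```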

Both of those could be an issue with the environment selecting the correct device and/or with permissions. But this is just a guess; I'm not familiar with the environment.

@G-Ragghianti
Collaborator Author

For the Intel GPU, I think it is actually missing the fp64 capability. The GPU is an Arc A770 (consumer card). We had to disable the double-precision tests in SLATE for this. Is that a possibility here?
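
(A quick way to check what the runtime reports is a device probe along these lines; an illustrative SYCL 2020 sketch, not part of the heFFTe tests:)

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // List every device the SYCL runtime exposes and whether it reports fp64.
    for (const auto &dev : sycl::device::get_devices()) {
        std::cout << dev.get_info<sycl::info::device::name>()
                  << "  fp64: " << (dev.has(sycl::aspect::fp64) ? "yes" : "no")
                  << "\n";
    }
    return 0;
}
```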

@mkstoyanov
Collaborator

> For the Intel GPU, I think it is actually missing the fp64 capability. The GPU is an Arc A770 (consumer card). We had to disable the double-precision tests in SLATE for this. Is that a possibility here?

Unfortunately, heFFTe doesn't have that option at the moment. I could add it in time, but it would be a significant intrusion into the code. It's probably best to either disable the test for now, or not use the GPU and just run SYCL on the CPU (fallback mode). That still tests the GPU kernels, and it works in all precisions.
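
As an illustrative sketch of the fallback mode, assuming a standard SYCL 2020 runtime, the queue can simply be bound to the CPU device, which supports double precision:

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // Force the queue onto the CPU device; the SYCL kernels still compile
    // and run there, and the CPU reports full fp64 support.
    sycl::queue q{sycl::cpu_selector_v};
    std::cout << "running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";
    return 0;
}
```

With the oneAPI runtime the same redirection can usually be done without code changes through the device-selector environment variable (e.g., ONEAPI_DEVICE_SELECTOR=opencl:cpu in recent releases).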

@G-Ragghianti
Collaborator Author

Just FYI: I'm keeping this PR as a draft until we finalize the Slurm queue policies at our site. The new CI system runs the jobs through our queues, and I need to make sure the CI jobs don't interfere with normal users.

@mkstoyanov
Collaborator

> Just FYI: I'm keeping this PR as a draft until we finalize the Slurm queue policies at our site. The new CI system runs the jobs through our queues, and I need to make sure the CI jobs don't interfere with normal users.

No worries, I won't merge anything until you say it's ready (at which point you can merge it yourself).

@ax3l
Contributor

ax3l commented Aug 23, 2024

Hi, this is a great PR to add CI - would you like to update it and maybe merge it partially already? :)

(Our interest: We want to start depending on heFFTe in WarpX & ImpactX mainline, and having CI in a dependency is one of our quality criteria :))

@mkstoyanov
Collaborator

The PR is @G-Ragghianti's work, as it is connected to the machines at IC. Given the lack of post-ECP funding, I don't know what the status is here or whether this is even feasible at the moment.

What I am doing now is running a series of automated scripts through Docker, the same as a CI would, except they are not automatically triggered from GitHub. That has always been a challenge due to security and authentication.

The free GitHub runners are a bit limited: no GPU support of any kind, and heFFTe needs at least 12 MPI ranks to exercise a non-symmetric data layout of boxes. I can try to set up one or two free GitHub tests; would that be sufficient, or do I need to find alternatives?

@G-Ragghianti
Collaborator Author

We intend to run these on our site-hosted runners. Those are currently set up as dedicated runners, but this PR was created specifically to use an on-demand runner system that we have yet to fully deploy. I can convert this PR to use the dedicated runners we currently have. We intend to support this despite the end of ECP funding.

@mkstoyanov
Collaborator

@G-Ragghianti it would be of great help to support this, even if the tests don't span the entire range of supported GPUs and build configurations. I don't know if @ax3l has a time-frame in mind.

@G-Ragghianti
Collaborator Author

I've updated the PR to use our currently available self-hosted runners. There is still a small problem with the FFTW module not being compatible with our OpenMPI module. I will fix this, and then the PR can be merged.

@ax3l
Contributor

ax3l commented Aug 29, 2024

Yay! All green :)

@G-Ragghianti merged commit be7be23 into icl-utk-edu:master on Aug 29, 2024
10 checks passed
@G-Ragghianti removed the draft label on Sep 3, 2024