Reinstate aarch64+CUDA; add ppc64le+CUDA #899

Merged: 11 commits, Apr 22, 2023


h-vetinari
Member

@h-vetinari h-vetinari commented Dec 4, 2022

A left-over from #875, this is blocked on the slow-moving[1] conda-forge/conda-forge-ci-setup-feedstock#210 (which has more discussion on why cross-compilation is a de facto necessity here).

Closes #859
Closes #659

Footnotes

  1. This might be further impacted by nvidia deps moving to conda-forge

@conda-forge-linter
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Member Author

After conda-forge/conda-forge-ci-setup-feedstock#210 got solved, this is now blocked on conda-forge/nvcc-feedstock#95

@isuruf
Member

isuruf commented Apr 14, 2023

@conda-forge-admin, rerender

@h-vetinari
Member Author

I'm rebasing this, but need to wait until conda-forge/conda-forge-pinning-feedstock#3624 makes it through the CDN.

@conda-forge conda-forge deleted a comment from conda-forge-webservices bot Apr 14, 2023
@h-vetinari
Member Author

Ah, I hadn't seen conda-forge/nvcc-feedstock#96, that's amazing, thanks so much! 🤩

Seems this got through the resolver and now started building! 🥳

The failure is unrelated and should be resolvable:

$PREFIX/bin/grpc_cpp_plugin: program not found or is not executable

@h-vetinari h-vetinari force-pushed the cross_cuda branch 2 times, most recently from a99c38f to 731c7b6 April 14, 2023 08:23
@h-vetinari
Member Author

h-vetinari commented Apr 14, 2023

It seems this is looking for the wrong gnu stubs?

# include <gnu/stubs-32.h>
          ^~~~~~~~~~~~~~~~
1 error generated.

AFAICT they are named differently in our linux-sysroot? Probably needs a patch?

@h-vetinari h-vetinari mentioned this pull request Apr 14, 2023
@h-vetinari
Member Author

h-vetinari commented Apr 14, 2023

Thanks a lot for the help here, Isuru! I'm checking out where the test failures are coming from in #1015. Side note: I saw that arrow wants to find libevent, and I have an open PR to make that happen. (I had already included it in host here before, but rebased that out after it wasn't found even then; it needs the CMake config files to be found without further intervention.)

@h-vetinari h-vetinari changed the title WIP: Reinstate aarch64+CUDA; add ppc64le+CUDA Reinstate aarch64+CUDA; add ppc64le+CUDA Apr 15, 2023
@h-vetinari
Member Author

An interesting error while testing pyarrow on the new CUDA builds...

aarch:

export SRC_DIR=/home/conda/feedstock_root/build_artifacts/apache-arrow_1681520079529/test_tmp
$SRC_DIR/conda_test_runner.sh: $PREFIX/bin/python: /lib/ld-linux-aarch64.so.1: bad ELF interpreter: No such file or directory
Tests failed for pyarrow-11.0.0-py38h220f70f_14_cuda.conda - moving package to /home/conda/feedstock_root/build_artifacts/broken

ppc:

export SRC_DIR=/home/conda/feedstock_root/build_artifacts/apache-arrow_1681520086495/test_tmp
$SRC_DIR/conda_test_runner.sh: $PREFIX/bin/python: /lib64/ld64.so.2: bad ELF interpreter: No such file or directory
Tests failed for pyarrow-11.0.0-py311h6081924_14_cuda.conda - moving package to /home/conda/feedstock_root/build_artifacts/broken

@h-vetinari
Member Author

> An interesting error while testing pyarrow on the new CUDA builds...

The reason this is weird to me is that the error appears on bin/python invocation, and doesn't seem to have anything to do with arrow. It sounds sysroot-related, but I don't know these bits well enough to understand what's going on yet.

@isuruf
Member

isuruf commented Apr 18, 2023

Should be fixed by conda-forge/docker-images#233

@h-vetinari
Member Author

> Should be fixed by conda-forge/docker-images#233

Ah, right, no QEMU in the new cross-images. Thanks a lot! :)
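
For context on the errors above: the test step tried to execute a `bin/python` built for aarch64 or ppc64le on a build machine of a different architecture, which only works when an emulator such as QEMU is available. The following is a minimal sketch, not code from this PR, that reads the `e_machine` field of an ELF header to show which architecture a binary targets. The helper names (`elf_machine`, `elf_machine_of_file`) and the small lookup table are assumptions made for illustration; the field offsets and machine codes come from the ELF specification.

```python
import struct

# e_machine codes from the ELF specification (only the ones relevant here).
ELF_MACHINES = {
    0x03: "x86",
    0x15: "ppc64",
    0x3E: "x86_64",
    0xB7: "aarch64",
}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # header[5] is EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")


def elf_machine_of_file(path: str) -> str:
    """Hypothetical convenience wrapper: inspect a binary on disk."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Comparing the result of `elf_machine_of_file` for the environment's `bin/python` against the build machine's own architecture is one way to tell whether a test step would need emulation at all.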

@h-vetinari
Member Author

Something with $CUDA_HOME (or so?) seems to not yet be set up 100% correctly:

import: 'pyarrow.cuda'
Traceback (most recent call last):
  File "/home/conda/feedstock_root/build_artifacts/apache-arrow_1681808893494/test_tmp/run_test.py", line 29, in <module>
    import pyarrow.cuda
  File "$PREFIX/lib/python3.9/site-packages/pyarrow/cuda.py", line 21, in <module>
    from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
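
The traceback shows that the dynamic loader could not find `libcuda.so.1`, a library that normally ships with the NVIDIA driver rather than with the CUDA toolkit. As a hedged sketch (not the fix that was applied here, which lives in the CI setup), a test script could probe for the library before importing `pyarrow.cuda`. The function names below are hypothetical; only `ctypes.CDLL` and `importlib.import_module` are real standard-library calls.

```python
import ctypes
import importlib


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can find and open `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing or cannot be opened.
        return False
    return True


def import_pyarrow_cuda_if_possible():
    """Import pyarrow.cuda only when the CUDA driver library is loadable.

    Returns the module, or None when libcuda.so.1 is unavailable.
    """
    if not can_load_shared_library("libcuda.so.1"):
        return None
    return importlib.import_module("pyarrow.cuda")
```

A guard like this only distinguishes "driver library missing" from other import failures; it does not make the CUDA functionality testable on a machine without the driver.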

@h-vetinari
Member Author

h-vetinari commented Apr 18, 2023

Should be fixed by conda-forge/conda-forge-ci-setup-feedstock#234

@h-vetinari
Member Author

Cross-compiling CUDA ✅🥳

This PR is now ready for primetime, and I couldn't be more thrilled that we finally got this working. 🤩

Context

Before arrow 10.x, each python version had a separate CI job, which was already regularly running into the 6h time limit when running in emulation (I was constantly restarting these jobs after merging to the 9.0.x and 8.0.x branches, taking up to 13(!) runs to pass). This was also why we never added support for CUDA-on-ppc, because it would have made that restarting pain many times worse still.

With arrow 10, libarrow finally became independent of the python version, allowing us to build it only once and to collapse the huge build matrix[1] so that all python versions are handled in one job per arch. That was a big benefit, but it made it impossible to compile CUDA support in emulation, so we ended up dropping support for CUDA-on-aarch at the time, pending cross-compilation capabilities.

IOW, we can now add back CUDA-on-aarch, and add support for CUDA-on-ppc!

I've also prepared backports of this PR for all our maintenance branches (10.0.x, 9.0.x, 8.0.x). Since we had dropped CUDA-on-aarch for arrow 10, backporting there should be obvious, and since cross-compilation hugely cuts down our CI times (and gets rid of the necessity to restart!) on the 9.0.x and 8.0.x branches, it should be a no-brainer there as well[2]. In order to get green CI, those PRs also fix #1016 on all branches.

PTAL @conda-forge/arrow-cpp
CC @conda-forge/core (just for the good news 😊)

PS: It took quite a bit of effort to get this working (wouldn't have happened without @isuruf; a huge thanks! 🙏), here's some indication of what was necessary (no claim of exhaustiveness; just what I'm aware of)...

Footnotes

  1. 72 jobs at a time when we still had to build for both OpenSSL 1.1.1 & 3.0; would have been 80 had we included CUDA-on-ppc.

  2. An argument could be made for leaving the "feature" of CUDA-on-ppc out of the maintenance branches, but since it costs ~nothing and gets us consistency across all branches, I think we should do it.

@h-vetinari
Member Author

Seems people are less excited about this than I am? 🤔 😅
(but then, I guess it's me benefitting the most from not having to restart CI all the time)

Unless there are comments to the contrary, I'll merge this (and the maintenance PRs) in 24h.

@h-vetinari h-vetinari merged commit f967112 into conda-forge:main Apr 22, 2023
@h-vetinari h-vetinari deleted the cross_cuda branch April 22, 2023 08:45