bump NCCL floor to 2.18.1.1, relax PyTorch pin (#218)
Contributes to rapidsai/build-planning#102

Fixes #217

## Notes for Reviewers

### How I tested this

Temporarily added a CUDA 11.4.3 test job to CI here (the same specs as the failing nightly), by pointing at the branch from rapidsai/shared-workflows#246.

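Observed the same CUDA 11.4 failures reported in rapidsai/build-planning#102: the solver picked `nccl` 2.10.3.1, and `libwholegraph.so` then failed at load time on the missing `ncclCommSplit` symbol.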

```text
...
  + nccl                     2.10.3.1  hcad2f07_0                  rapidsai-nightly     125MB
...
./WHOLEGRAPH_CSR_WEIGHTED_SAMPLE_WITHOUT_REPLACEMENT_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libwholegraph/../../../lib/libwholegraph.so: undefined symbol: ncclCommSplit
sh -c exec "$0" ./WHOLEMEMORY_HANDLE_TEST 
./WHOLEMEMORY_HANDLE_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libwholegraph/../../../lib/libwholegraph.so: undefined symbol: ncclCommSplit
sh -c exec "$0" ./GRAPH_APPEND_UNIQUE_TEST 
```

([build link](https://github.com/rapidsai/wholegraph/actions/runs/10966022370/job/30453393224?pr=218))
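The `symbol lookup error` above means the dynamic loader found an NCCL library that does not export `ncclCommSplit`. As a rough sketch (not part of this PR), the same condition can be checked from Python with `ctypes`. The soname `libnccl.so.2` is an assumption about the environment, and `has_symbol` is a hypothetical helper name.

```python
import ctypes


def has_symbol(library, symbol):
    """Return True if `library` can be loaded and exports `symbol`.

    `library` is a path or soname handed to the dynamic loader;
    None means "symbols already loaded into the current process".
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library itself could not be found or loaded.
        return False
    # ctypes looks the symbol up lazily and raises AttributeError
    # when it is missing, which hasattr() turns into False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # "libnccl.so.2" is an assumed soname; adjust it for the environment.
    print(has_symbol("libnccl.so.2", "ncclCommSplit"))
```

With the `nccl` 2.10.3.1 package from the failing job this would be expected to report `False`, consistent with the loader error in the log.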

Pushed a commit adding a floor of `nccl>=2.18.1.1`. Saw all tests pass with CUDA 11.4 😁 

```text
...
  + nccl                     2.22.3.1  hee583db_1                  conda-forge          131MB
...
(various log messages showing all tests passed)
```

([build link](https://github.com/rapidsai/wholegraph/actions/runs/10966210441/job/30454147250?pr=218))
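To make the effect of the floor concrete, here is a minimal sketch of the comparison a solver effectively performs for `nccl>=2.18.1.1`. It assumes purely numeric dotted versions and is not conda's actual matching logic; `parse_version` and `meets_floor` are hypothetical helper names.

```python
def parse_version(text):
    """Split a dotted numeric version such as '2.18.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(installed, floor):
    """Compare versions component by component, not as strings."""
    return parse_version(installed) >= parse_version(floor)


NCCL_FLOOR = "2.18.1.1"

# The two versions seen in the CI logs for this PR.
for version in ("2.10.3.1", "2.22.3.1"):
    print(version, meets_floor(version, NCCL_FLOOR))
```

Comparing integer tuples rather than raw strings matters here: as plain strings, `"2.9.9"` sorts after `"2.18.1.1"`, which would wrongly treat the old `>=2.9.9` floor as newer.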

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - https://github.com/linhu-nv
  - https://github.com/jakirkham

URL: #218
jameslamb authored Sep 25, 2024
1 parent 09e90be commit 73266e2
Showing 4 changed files with 10 additions and 10 deletions.
4 changes: 2 additions & 2 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -28,7 +28,7 @@ dependencies:
  - librmm==24.10.*,>=0.0.0a0
  - nanobind>=0.2.0
  - nbsphinx
- - nccl
+ - nccl>=2.18.1.1
  - ninja
  - numpy>=1.23,<3.0a0
  - numpydoc
@@ -40,7 +40,7 @@ dependencies:
  - pytest-xdist
  - python>=3.10,<3.13
  - pytorch-cuda=11.8
- - pytorch=2.0.0
+ - pytorch>=2.0,<2.4.0a0
  - rapids-build-backend>=0.3.0,<0.4.0.dev0
  - recommonmark
  - scikit-build-core>=0.10.0
2 changes: 1 addition & 1 deletion conda/environments/all_cuda-125_arch-x86_64.yaml
@@ -30,7 +30,7 @@ dependencies:
  - librmm==24.10.*,>=0.0.0a0
  - nanobind>=0.2.0
  - nbsphinx
- - nccl
+ - nccl>=2.18.1.1
  - ninja
  - numpy>=1.23,<3.0a0
  - numpydoc
2 changes: 1 addition & 1 deletion conda/recipes/libwholegraph/conda_build_config.yaml
@@ -17,7 +17,7 @@ doxygen_version:
  - ">=1.8.11"

  nccl_version:
- - ">=2.9.9"
+ - ">=2.18.1.1"

  c_stdlib:
  - sysroot
12 changes: 6 additions & 6 deletions dependencies.yaml
@@ -87,7 +87,7 @@ dependencies:
  - libraft-headers==24.10.*,>=0.0.0a0
  - librmm==24.10.*,>=0.0.0a0
  - nanobind>=0.2.0
- - nccl
+ - &nccl nccl>=2.18.1.1
  specific:
  - output_types: conda
  matrices:
@@ -216,14 +216,14 @@ dependencies:
  common:
  - output_types: [conda]
  packages:
- - nccl
+ - *nccl
  test_python:
  common:
  - output_types: [conda]
  packages:
  - c-compiler
  - cxx-compiler
- - nccl
+ - *nccl
  - output_types: [conda, requirements]
  packages:
  - ninja
@@ -285,13 +285,13 @@ dependencies:
  # If conda-forge supports the new cuda-* packages for CUDA 11.8
  # at some point, then we can fully support/properly specify
  # this environment.
- - pytorch=2.0.0
+ - &pytorch pytorch>=2.0,<2.4.0a0
  - pytorch-cuda=11.8
  - matrix:
  arch: aarch64
  cuda: "11.8"
  packages:
- - pytorch=2.0.0
+ - *pytorch
  - pytorch-cuda=11.8
  - matrix:
  packages:
@@ -318,7 +318,7 @@ dependencies:
  common:
  - output_types: [conda]
  packages:
- - pytorch=2.0.0
+ - *pytorch
  - cpuonly
  clang_tools:
  common:
