[BUG] Build Failure when building from source #2546
Comments
Which branch are you working on? Spark has removed the 3.1.2-SNAPSHOT libraries from the snapshot Maven repo (https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/) and is going to release 3.1.2 shortly. To build successfully, please upmerge/rebase against upstream branch-21.06. Thanks! |
Hi, thank you for the quick answer. I tried using branch-21.06 before, but it led to the following error:
|
Which GPU device and CUDA toolkit do you have locally? branch-21.06 requires CUDA 11.0+ and driver 450.80.02+. |
Hey, cuda version:
GPU device:
|
Hi @tregodev, from the information provided above, you need to update your CUDA driver from 450.51.05 to 450.80.02+ in order to resolve the issue. For more information, please refer to the dependency list for libcudf here |
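To make the driver requirement concrete, here is a minimal Python sketch that compares a dotted driver version against the 450.80.02 minimum quoted above. The helper names `parse_version` and `driver_meets_minimum` are made up for this illustration; they are not part of the project or of any NVIDIA tooling, and the installed version string is assumed to have been read by hand (for example from the `nvidia-smi` header).

```python
def parse_version(text):
    """Split a dotted version such as '450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum="450.80.02"):
    """Return True when the installed driver is at least the minimum.

    Hypothetical helper: tuples compare field by field, so
    (450, 51, 5) sorts below (450, 80, 2).
    """
    return parse_version(installed) >= parse_version(minimum)


print(driver_meets_minimum("450.51.05"))  # False: the driver reported in this thread
print(driver_meets_minimum("450.80.02"))  # True: exactly the stated minimum
```

Comparing integer tuples rather than raw strings avoids surprises when a field has a different number of digits.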
Hey all, thank you for the help. I updated the drivers. However, the error still persists:
|
I tried running this locally but I can't get it to reproduce. Also our CI jobs seem to be OK. The P100 is Pascal and is supported. Your driver seems fine as well according to: https://docs.nvidia.com/deploy/cuda-compatibility/index.html. I assume you have CUDA runtime 11.0 or 11.2 installed, as 11.3 is not something we test with. Can you verify what runtime you are using? The runtime is normally installed in a path like /usr/local/cuda-11.0. I am sorry to ask for more input, but here are some suggestions:
Thanks for your patience. |
I just tested on an Azure VM using Standard_NC6s_v2, which has 1 P100. It builds fine if we skip the tests:
However if we do not skip the tests, I can reproduce the same:
together with another test failure:
Changing |
I also tested the same thing on an Azure VM using Standard_NC6s_v3, which has 1 V100. The V100 runs fine.
|
Concerning the |
Hey, I can confirm that using @firestarman I did not find any cache files under @abellina I uploaded the surefire-reports grep contents to pastebin Exception and ERROR. Changing |
Hmm. I can confirm that all the CUDF window function tests seem to be running against a P100, with CUDA 11.2.2.
It is entirely possible that the specific |
The behaviour does seem quite strange. If all other tests except |
This is beginning to look like memory corruption. On adding the
Note that the first three columns are turning up as nulls. These should have simply been projections from the input. Something's fishy. |
Note: At least part of the problem is that the tests do not order the output rows. It would be good to add ordering, perhaps based on But one or two tests (namely |
Please pardon the delay, I've been barking up the wrong tree. The window function implementation isn't the problem. The tests produced different results on Pascal, thanks to nondeterministic ordering. #2768 should sort this out. Tested on Turing and Pascal. |
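For readers unfamiliar with this failure mode, the following is a small Python sketch of why unordered output can make a correct result look wrong, and how sorting both sides before comparing removes the dependence on row order. It is only an illustration under assumptions: the rows, the sort key, and the helper `rows_equal_after_sort` are invented here and do not reflect what #2768 actually changes in the project's tests.

```python
def rows_equal_after_sort(expected, actual, key=None):
    """Compare two lists of row tuples after sorting both with the same key.

    Hypothetical helper: it assumes the key yields a total order over the
    rows (no None values in the key columns, no ties that matter).
    """
    return sorted(expected, key=key) == sorted(actual, key=key)


# The same three rows, returned in two different orders (made-up data).
cpu_rows = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
gpu_rows = [(3, "c", 30), (1, "a", 10), (2, "b", 20)]

print(cpu_rows == gpu_rows)                       # False: a naive comparison fails
print(rows_equal_after_sort(cpu_rows, gpu_rows))  # True: the contents match
```

Sorting on the full row is the simplest choice; a test could instead sort on a subset of columns, as long as that subset identifies each row uniquely.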
Hey,
I am trying to build the main branch without any modifications to the code using
mvn verify
but unfortunately this fails due to this error
[ERROR] Failed to execute goal on project rapids-4-spark-shims-spark312_2.12: Could not resolve dependencies for project com.nvidia:rapids-4-spark-shims-spark312_2.12:jar:0.5.0: org.apache.spark:spark-sql_2.12:jar:3.1.2-SNAPSHOT was not found in https://oss.sonatype.org/content/repositories/snapshots during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of snapshots-repo has elapsed or updates are forced -> [
Is there any simple way to fix this?
Thanks!