Describe the bug
A java.lang.UnsatisfiedLinkError: /tmp/cudf6291133341730615510.so: libnvrtc.so.10.2: cannot open shared object file: No such file or directory error occurs when running the Mortgage ETL notebook with the sample mortgage data on a GCP Dataproc cluster where the master node is not a GPU node. The full error message is in this gist: https://gist.github.com/sameerz/b1df1130b0955b562a5801d7eb664811
Steps/Code to reproduce bug
@tgravescs pointed out that this is avoidable by setting conf.set('spark.rapids.sql.exec.BroadcastExchangeExec', 'false'). However, this notebook previously worked on Dataproc without that setting.
Expected behavior
GPU operations do not run on the master node if the master is CPU only.
Environment details (please complete the following information)
Environment location: GCP Dataproc preview-ubuntu18, running either CUDA 10.2 or CUDA 11.0
One note: there is an unrelated error in this notebook. That same cell needs to start with from pyspark import SparkConf, but that will be addressed in a separate PR.
Additional context
This error occurs when running the 0.2 plugin with cudf 0.15. Something may have changed in the Dataproc environment.
Looks like it's trying to build an empty GPU table on the driver so it can turn around and serialize out an empty table. Not sure how we didn't hit this before, as this code has been there a while. This might be triggered in a case where the broadcast table ends up being empty, which I wouldn't expect in the mortgage query.
There might be an easy fix similar to what was done for rapidsai/cudf#5441: don't load native libs when HostColumnVector is loaded, but instead defer native-library loading to the classes HostColumnVector uses that actually need those libs. HostColumnVector doesn't have any native methods itself.
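The deferral pattern described above can be illustrated with a small Python analogue (the actual fix would live in the cudf Java/JNI layer; this just demonstrates the idea of loading a shared library on first native call rather than at class-load time, using libc as a stand-in):

```python
import ctypes
import ctypes.util

_libc = None  # native library handle, loaded lazily on first use


def _load_libc():
    # Defer loading the shared library until a caller actually invokes a
    # native function. This mirrors the HostColumnVector idea: code paths
    # with no native calls never trigger the load (and so never fail on a
    # machine that lacks the library's dependencies).
    global _libc
    if _libc is None:
        _libc = ctypes.CDLL(ctypes.util.find_library('c'))
    return _libc


def native_abs(x: int) -> int:
    # The first call here is what triggers the deferred library load.
    return _load_libc().abs(x)
```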