[BUG] cudf_udf tests failed w/ 21.08 #2754

Closed
pxLi opened this issue Jun 21, 2021 · 2 comments
Labels
bug (Something isn't working), cudf_dependency (An issue or PR with this label depends on a new feature in cudf)

Comments


pxLi commented Jun 21, 2021

libthrift.so version mismatch; tracking at rapidsai/cudf#8570.

[2021-06-21T07:10:01.433Z] 21/06/21 07:10:18 WARN PythonWorkerFactory: Failed to open socket to Python daemon:
[2021-06-21T07:10:01.433Z] java.net.SocketException: Connection reset
[2021-06-21T07:10:01.433Z] 	at java.net.SocketInputStream.read(SocketInputStream.java:210)
[2021-06-21T07:10:01.433Z] 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
[2021-06-21T07:10:01.433Z] 	at java.net.SocketInputStream.read(SocketInputStream.java:224)
[2021-06-21T07:10:01.433Z] 	at java.io.DataInputStream.readInt(DataInputStream.java:387)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:210)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:226)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:225)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:119)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:192)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:183)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.sql.execution.python.PandasGroupUtils$.executePython(PandasGroupUtils.scala:44)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.sql.execution.python.rapids.GpuPandasUtils$.executePython(GpuPandasUtils.scala:35)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.sql.rapids.execution.python.GpuFlatMapCoGroupsInPandasExec.$anonfun$doExecute$1(GpuFlatMapCoGroupsInPandasExec.scala:138)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:101)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.scheduler.Task.run(Task.scala:117)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
[2021-06-21T07:10:01.434Z] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-06-21T07:10:01.434Z] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-06-21T07:10:01.434Z] 	at java.lang.Thread.run(Thread.java:748)
[2021-06-21T07:10:01.434Z] 21/06/21 07:10:18 WARN PythonWorkerFactory: Assuming that daemon unexpectedly quit, attempting to restart
[2021-06-21T07:10:01.688Z] INFO: Process 18925 found CUDA visible device(s): 0
[2021-06-21T07:10:03.043Z] Traceback (most recent call last):
[2021-06-21T07:10:03.043Z]   File "/home/ubuntu/spark-rapids/dist/target/rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar/rapids/daemon_databricks.py", line 132, in manager
[2021-06-21T07:10:03.043Z]   File "/home/ubuntu/spark-rapids/dist/target/rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar/rapids/worker.py", line 37, in initialize_gpu_mem
[2021-06-21T07:10:03.043Z]     from cudf import rmm
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/cudf/__init__.py", line 76, in <module>
[2021-06-21T07:10:03.043Z]     from cudf.io import (
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/cudf/io/__init__.py", line 9, in <module>
[2021-06-21T07:10:03.043Z]     from cudf.io.parquet import (
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/cudf/io/parquet.py", line 8, in <module>
[2021-06-21T07:10:03.043Z]     from pyarrow import dataset as ds, parquet as pq
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/pyarrow/dataset.py", line 24, in <module>
[2021-06-21T07:10:03.043Z]     from pyarrow._dataset import (  # noqa
[2021-06-21T07:10:03.043Z] ImportError: libthrift.so.0.14.1: cannot open shared object file: No such file or directory
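
A quick diagnostic sketch, assuming shell access to the failing Databricks environment and that ldd is available; the paths are taken from the traceback above:

# Show which libthrift version pyarrow's dataset extension was linked against
# (ldd resolves transitive dependencies and reports missing ones as "not found"),
# then show which libthrift the conda environment actually ships.
ldd /databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/pyarrow/_dataset*.so | grep -i thrift
ls /databricks/conda/envs/databricks-ml-gpu/lib/libthrift*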
pxLi added the bug and cudf_dependency labels on Jun 21, 2021
pxLi changed the title from "[BUG] cudf_udf tests failed w/ cudf 21.06.1" to "[BUG] cudf_udf tests failed w/ cudf 21.06.01" on Jun 21, 2021
pxLi changed the title from "[BUG] cudf_udf tests failed w/ cudf 21.06.01" to "[BUG] cudf_udf tests failed w/ cudf 21.06.01/21.08" on Jun 22, 2021
pxLi changed the title from "[BUG] cudf_udf tests failed w/ cudf 21.06.01/21.08" to "[BUG] cudf_udf tests failed w/ 21.08" on Jul 1, 2021

pxLi commented Jul 1, 2021

We saw a new error after rapidsai/cudf#7495.

Running the following just now:

conda create -n test1 -c rapidsai -c rapidsai-nightly -c nvidia -c conda-forge cudf=21.08 python=3.8

[screenshot of the resolved conda package list]

Installing the cudf 21.08 nightly via conda pulls in the non-CUDA build of the pyarrow dependency:
pyarrow 4.0.1 py38he0739d4_3
whereas we expect the CUDA build:
pyarrow 4.0.1 py38hb53058b_2_cuda
This results in errors like:

ModuleNotFoundError: No module named 'pyarrow._cuda'
    from cudf import rmm
  File "/opt/conda/lib/python3.8/site-packages/cudf/__init__.py", line 11, in <module>
    from cudf import core, datasets, testing
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/__init__.py", line 3, in <module>
    from cudf.core import _internals, buffer, column, column_accessor, common
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/_internals/__init__.py", line 3, in <module>
    from cudf.core._internals.where import where
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/_internals/where.py", line 11, in <module>
    from cudf.core.column import ColumnBase
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/column/__init__.py", line 3, in <module>
    from cudf.core.column.categorical import CategoricalColumn
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/column/categorical.py", line 25, in <module>
    from cudf import _lib as libcudf
  File "/opt/conda/lib/python3.8/site-packages/cudf/_lib/__init__.py", line 4, in <module>
    from . import (
ImportError: libarrow_cuda.so.400: cannot open shared object file: No such file or directory
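
A minimal sketch for verifying which pyarrow build conda resolved, assuming a standard conda environment; the CUDA-enabled package carries a cuda suffix in its build string and ships the pyarrow._cuda extension:

# List the installed pyarrow build string (ends in _cuda for the CUDA build),
# then probe for the pyarrow._cuda extension without importing cudf.
conda list pyarrow
python -c "import importlib.util; print(importlib.util.find_spec('pyarrow._cuda') is not None)"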

pxLi commented Jul 7, 2021

The conda arrow dependency was fixed by rapidsai/cudf#8637 and rapidsai/cudf#8651.
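
A minimal workaround sketch for environments created before those fixes landed, assuming conda's build-string matching with a *cuda glob selects the CUDA package shown above:

# Pin the CUDA build string explicitly so the solver cannot pick the non-CUDA pyarrow.
conda create -n test1 -c rapidsai -c rapidsai-nightly -c nvidia -c conda-forge \
    cudf=21.08 python=3.8 "pyarrow=4.0.1=*cuda"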

pxLi closed this as completed on Jul 7, 2021