[BUG] cudf_udf tests failed w/ 21.08 #2754

Closed
pxLi opened this issue Jun 21, 2021 · 2 comments
Labels
bug (Something isn't working), cudf_dependency (An issue or PR with this label depends on a new feature in cudf)

Comments


pxLi commented Jun 21, 2021

libthrift.so version mismatch; tracking at rapidsai/cudf#8570.

[2021-06-21T07:10:01.433Z] 21/06/21 07:10:18 WARN PythonWorkerFactory: Failed to open socket to Python daemon:
[2021-06-21T07:10:01.433Z] java.net.SocketException: Connection reset
[2021-06-21T07:10:01.433Z] 	at java.net.SocketInputStream.read(SocketInputStream.java:210)
[2021-06-21T07:10:01.433Z] 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
[2021-06-21T07:10:01.433Z] 	at java.net.SocketInputStream.read(SocketInputStream.java:224)
[2021-06-21T07:10:01.433Z] 	at java.io.DataInputStream.readInt(DataInputStream.java:387)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:210)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:226)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:225)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:119)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:192)
[2021-06-21T07:10:01.433Z] 	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:183)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.sql.execution.python.PandasGroupUtils$.executePython(PandasGroupUtils.scala:44)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.sql.execution.python.rapids.GpuPandasUtils$.executePython(GpuPandasUtils.scala:35)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.sql.rapids.execution.python.GpuFlatMapCoGroupsInPandasExec.$anonfun$doExecute$1(GpuFlatMapCoGroupsInPandasExec.scala:138)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:101)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.scheduler.Task.run(Task.scala:117)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
[2021-06-21T07:10:01.434Z] 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
[2021-06-21T07:10:01.434Z] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[2021-06-21T07:10:01.434Z] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[2021-06-21T07:10:01.434Z] 	at java.lang.Thread.run(Thread.java:748)
[2021-06-21T07:10:01.434Z] 21/06/21 07:10:18 WARN PythonWorkerFactory: Assuming that daemon unexpectedly quit, attempting to restart
[2021-06-21T07:10:01.688Z] INFO: Process 18925 found CUDA visible device(s): 0
[2021-06-21T07:10:03.043Z] Traceback (most recent call last):
[2021-06-21T07:10:03.043Z]   File "/home/ubuntu/spark-rapids/dist/target/rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar/rapids/daemon_databricks.py", line 132, in manager
[2021-06-21T07:10:03.043Z]   File "/home/ubuntu/spark-rapids/dist/target/rapids-4-spark_2.12-21.06.0-SNAPSHOT.jar/rapids/worker.py", line 37, in initialize_gpu_mem
[2021-06-21T07:10:03.043Z]     from cudf import rmm
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/cudf/__init__.py", line 76, in <module>
[2021-06-21T07:10:03.043Z]     from cudf.io import (
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/cudf/io/__init__.py", line 9, in <module>
[2021-06-21T07:10:03.043Z]     from cudf.io.parquet import (
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/cudf/io/parquet.py", line 8, in <module>
[2021-06-21T07:10:03.043Z]     from pyarrow import dataset as ds, parquet as pq
[2021-06-21T07:10:03.043Z]   File "/databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/pyarrow/dataset.py", line 24, in <module>
[2021-06-21T07:10:03.043Z]     from pyarrow._dataset import (  # noqa
[2021-06-21T07:10:03.043Z] ImportError: libthrift.so.0.14.1: cannot open shared object file: No such file or directory
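
A quick diagnostic sketch, assuming shell access to the failing Databricks environment and that ldd is available; the paths are taken from the traceback above:

# Show which libthrift version pyarrow's dataset extension was linked against
# (ldd resolves transitive dependencies and reports missing ones as "not found"),
# then show which libthrift the conda environment actually ships.
ldd /databricks/conda/envs/databricks-ml-gpu/lib/python3.7/site-packages/pyarrow/_dataset*.so | grep -i thrift
ls /databricks/conda/envs/databricks-ml-gpu/lib/libthrift*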
pxLi added the bug and cudf_dependency labels on Jun 21, 2021
pxLi changed the title from "[BUG] cudf_udf tests failed w/ cudf 21.06.1" to "[BUG] cudf_udf tests failed w/ cudf 21.06.01" on Jun 21, 2021
pxLi changed the title from "[BUG] cudf_udf tests failed w/ cudf 21.06.01" to "[BUG] cudf_udf tests failed w/ cudf 21.06.01/21.08" on Jun 22, 2021
pxLi changed the title from "[BUG] cudf_udf tests failed w/ cudf 21.06.01/21.08" to "[BUG] cudf_udf tests failed w/ 21.08" on Jul 1, 2021

pxLi commented Jul 1, 2021

We saw a new error after rapidsai/cudf#7495.

Running the following just now:

conda create -n test1 -c rapidsai -c rapidsai-nightly -c nvidia -c conda-forge cudf=21.08 python=3.8

[screenshot of the resolved conda package list]

Installing the cudf 21.08 nightly via conda pulls in the non-CUDA build of the pyarrow dependency:
pyarrow 4.0.1 py38he0739d4_3
whereas we expect the CUDA build:
pyarrow 4.0.1 py38hb53058b_2_cuda
This results in errors like:

ModuleNotFoundError: No module named 'pyarrow._cuda'
    from cudf import rmm
  File "/opt/conda/lib/python3.8/site-packages/cudf/__init__.py", line 11, in <module>
    from cudf import core, datasets, testing
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/__init__.py", line 3, in <module>
    from cudf.core import _internals, buffer, column, column_accessor, common
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/_internals/__init__.py", line 3, in <module>
    from cudf.core._internals.where import where
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/_internals/where.py", line 11, in <module>
    from cudf.core.column import ColumnBase
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/column/__init__.py", line 3, in <module>
    from cudf.core.column.categorical import CategoricalColumn
  File "/opt/conda/lib/python3.8/site-packages/cudf/core/column/categorical.py", line 25, in <module>
    from cudf import _lib as libcudf
  File "/opt/conda/lib/python3.8/site-packages/cudf/_lib/__init__.py", line 4, in <module>
    from . import (
ImportError: libarrow_cuda.so.400: cannot open shared object file: No such file or directory
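
A minimal sketch for verifying which pyarrow build conda resolved, assuming a standard conda environment; the CUDA-enabled package carries a cuda suffix in its build string and ships the pyarrow._cuda extension:

# List the installed pyarrow build string (ends in _cuda for the CUDA build),
# then probe for the pyarrow._cuda extension without importing cudf.
conda list pyarrow
python -c "import importlib.util; print(importlib.util.find_spec('pyarrow._cuda') is not None)"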

pxLi commented Jul 7, 2021

The conda arrow dependency was fixed by rapidsai/cudf#8637 and rapidsai/cudf#8651.
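
A minimal workaround sketch for environments created before those fixes landed, assuming conda's build-string matching with a *cuda glob selects the CUDA package shown above:

# Pin the CUDA build string explicitly so the solver cannot pick the non-CUDA pyarrow.
conda create -n test1 -c rapidsai -c rapidsai-nightly -c nvidia -c conda-forge \
    cudf=21.08 python=3.8 "pyarrow=4.0.1=*cuda"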

pxLi closed this as completed on Jul 7, 2021