Describe the bug
This same join used to work on older 0.3 versions, and it works fine on the CPU side, so it seems like a recent change broke it. This is running on Databricks. I have not yet tried to come up with a small reproducer.
```
20/11/13 18:53:17 ERROR Executor: Exception in task 0.2 in stage 21.0 (TID 574)
ai.rapids.cudf.CudfException: cuDF failure at: /ansible-managed/jenkins-slave/slave2/workspace/spark/cudf16_nightly/cpp/src/column/column_view.cpp:41: Column size cannot be negative.
at ai.rapids.cudf.Table.innerJoin(Native Method)
at ai.rapids.cudf.Table.access$3500(Table.java:36)
at ai.rapids.cudf.Table$TableOperation.innerJoin(Table.java:2105)
at com.nvidia.spark.rapids.shims.spark300db.GpuHashJoin.doJoinLeftRight(GpuHashJoin.scala:310)
at com.nvidia.spark.rapids.shims.spark300db.GpuHashJoin.com$nvidia$spark$rapids$shims$spark300db$GpuHashJoin$$doJoin(GpuHashJoin.scala:277)
at com.nvidia.spark.rapids.shims.spark300db.GpuHashJoin$$anon$1.hasNext(GpuHashJoin.scala:226)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at com.nvidia.spark.rapids.GpuHashAggregateExec.$anonfun$doExecuteColumnar$1(aggregate.scala:420)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:844)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:844)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:356)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:320)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
at org.apache.spark.scheduler.Task.run(Task.scala:117)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:639)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:642)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```
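For reference, here is a minimal sketch of the shape of the failing query (table, column, and key names are hypothetical; I have not confirmed this reproduces the failure):

```scala
// Hypothetical shape of the failing query: an inner join feeding an
// aggregation, run with the RAPIDS plugin enabled on Databricks.
// "fact_table", "dim_table", "join_key", and "group_col" are made up.
val left  = spark.table("fact_table")
val right = spark.table("dim_table")

val joined = left.join(right, Seq("join_key"), "inner")
joined.groupBy("group_col").count().show()
```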
I went back to the 0.2 release, where this query worked the last time I ran it, and it is now failing as well, but with an out-of-memory error. It's possible the data changed. The error from 0.2 is:
```
java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: RMM failure at: /usr/local/rapids/include/rmm/mr/device/pool_memory_resource.hpp:167: Maximum pool size exceeded
at ai.rapids.cudf.Table.innerJoin(Native Method)
at ai.rapids.cudf.Table.access$3200(Table.java:35)
at ai.rapids.cudf.Table$TableOperation.innerJoin(Table.java:1986)
at com.nvidia.spark.rapids.shims.spark300db.GpuHashJoin.doJoinLeftRight(GpuHashJoin.scala:290)
```
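Given that 0.2 hits an RMM pool OOM on the same join, one plausible (unverified) explanation is that the join output is exploding on duplicate keys: cudf's size_type is a 32-bit integer, so if the result row count exceeds 2^31 - 1, the computed column size can overflow negative, which would match the 0.3 error. A rough sketch for estimating the inner join's output cardinality on the CPU, using the same hypothetical names as above:

```scala
// Hedged sketch: estimate inner-join output rows from per-key counts.
// If the total approaches 2^31 - 1, size_type overflow is plausible.
import org.apache.spark.sql.functions._

val leftKeys  = left.groupBy("join_key").agg(count(lit(1)).as("l_cnt"))
val rightKeys = right.groupBy("join_key").agg(count(lit(1)).as("r_cnt"))

// Each matching key contributes l_cnt * r_cnt rows to the inner join.
val estimatedRows = leftKeys.join(rightKeys, "join_key")
  .agg(sum(col("l_cnt") * col("r_cnt")).as("total"))
  .first()
  .getLong(0)

println(s"Estimated inner join output rows: $estimatedRows")
```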