[BUG] java.lang.ClassCastException: GpuCompressedColumnVector cannot be cast to GpuColumnVector #2378

Closed · zhnin opened this issue May 10, 2021 · 6 comments · Fixed by #2396
Labels: bug (Something isn't working), P0 (Must have for release)

zhnin commented May 10, 2021

Describe the bug
Hi, when I run TPCx-BB Query7 and Query15 (SF1, using decimal) with

spark.rapids.sql.decimalType.enabled      true
spark.rapids.shuffle.compression.codec    lz4

I got this exception:

2021-05-10 11:26:20,143 ERROR executor.Executor: Exception in task 96.0 in stage 21.0 (TID 103)
java.lang.ClassCastException: com.nvidia.spark.rapids.GpuCompressedColumnVector cannot be cast to com.nvidia.spark.rapids.GpuColumnVector
        at com.nvidia.spark.rapids.GpuColumnVector.extractColumns(GpuColumnVector.java:839)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$loadNextBatch$2(GpuColumnarToRowExec.scala:203)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$loadNextBatch$2$adapted(GpuColumnarToRowExec.scala:201)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:177)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$loadNextBatch$1(GpuColumnarToRowExec.scala:201)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$loadNextBatch$1$adapted(GpuColumnarToRowExec.scala:200)
        at scala.Option.foreach(Option.scala:407)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:200)
        at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:238)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
        at com.nvidia.spark.rapids.RowToColumnarIterator.hasNext(GpuRowToColumnarExec.scala:564)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$iterHasNext$1(GpuCoalesceBatches.scala:175)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$iterHasNext$1$adapted(GpuCoalesceBatches.scala:174)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:133)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.iterHasNext(GpuCoalesceBatches.scala:174)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$hasNext$1(GpuCoalesceBatches.scala:183)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$hasNext$1$adapted(GpuCoalesceBatches.scala:182)
        at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
        at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:133)
        at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.hasNext(GpuCoalesceBatches.scala:182)
        at com.nvidia.spark.rapids.ConcatAndConsumeAll$.getSingleBatchWithVerification(GpuCoalesceBatches.scala:79)
        at com.nvidia.spark.rapids.GpuShuffledHashJoinBase.$anonfun$doExecuteColumnar$1(GpuShuffledHashJoinBase.scala:79)
        at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2021-05-10 11:26:20,175 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 146

Steps/Code to reproduce bug

  • TPCx-BB Query7 and Query15 (Data: SF1, use decimal)
  • spark-shell --master spark://master:7077

Environment details (please complete the following information)

  • Spark: 3.1.1
  • Driver & CUDA: 450.80.02 & V11.0
  • Spark-rapids: rapids-4-spark_2.12-0.6.0-SNAPSHOT.jar 7c7832a
  • Cudf: cudf-0.20-SNAPSHOT-cuda11.jar
  • Spark configuration
spark.master                                        spark://master:7077
spark.plugins                                       com.nvidia.spark.SQLPlugin
spark.executor.resource.gpu.amount                  1
spark.driver.memory                                 4g
spark.executor.memory                               16g
spark.executor.cores                                5
spark.task.cpus                                     1
spark.task.resource.gpu.amount                      0.2
spark.shuffle.manager                               com.nvidia.spark.rapids.spark311.RapidsShuffleManager
spark.shuffle.service.enabled                       false
spark.executorEnv.UCX_TLS                           cuda_copy,cuda_ipc,tcp
spark.executorEnv.UCX_ERROR_SIGNALS
spark.executorEnv.UCX_MAX_RNDV_RAILS                1
spark.executorEnv.UCX_MEMTYPE_CACHE                 n
spark.executorEnv.UCX_RNDV_SCHEME                   put_zcopy
spark.executorEnv.UCX_CUDA_IPC_CACHE                y
spark.driver.extraClassPath                         /mnt/vdb/0.6/jars/*:/mnt/vdb/0.6/lib/ucx/lib
spark.executor.extraClassPath                       /mnt/vdb/0.6/jars/*:/mnt/vdb/0.6/lib/ucx/lib

spark.rapids.sql.decimalType.enabled                true
spark.rapids.shuffle.compression.codec              lz4
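
For anyone reproducing: a quick sanity check from within spark-shell that the two implicated settings took effect. A sketch only; these configs must be set at application launch, this just reads them back:

// Sketch: confirm the two settings implicated in this bug are active.
// spark.conf.get throws if a key was never set, so set both at launch.
println(spark.conf.get("spark.rapids.sql.decimalType.enabled"))
println(spark.conf.get("spark.rapids.shuffle.compression.codec"))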

Additional context

  • If the decimal columns are cast to double (see the sketch below), the queries run.
  • If spark.rapids.sql.decimalType.enabled=false is set, the queries run.
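
A sketch of the first workaround, using a hypothetical DataFrame df and column name list_price (any decimal-typed input column would need the same treatment):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Hypothetical workaround: cast a decimal column to double before running
// the query, so no decimal batches flow through the compressed shuffle.
val noDecimals = df.withColumn("list_price", col("list_price").cast(DoubleType))
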
zhnin added the "? - Needs Triage" and "bug" labels on May 10, 2021
abellina self-assigned this on May 11, 2021
abellina (Collaborator) commented

@zhnin thanks for the details in this issue. I'll try to reproduce this locally and get back to you.

sameerz removed the "? - Needs Triage" label on May 11, 2021
abellina (Collaborator) commented

@zhnin I was able to reproduce it, thanks to the well-documented issue. Looking into a fix.

abellina (Collaborator) commented

@andygrove suggested looking at this one rule, and removing it makes @zhnin's case pass.

@revans2 @jlowe @andygrove, we could either remove this, or perhaps test for compressed vectors in C2R (see the sketch after the diff below). Any strong feelings?

diff --git a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTransitionOverrides.scala b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTransitionOverrides.scala
index 6e3ef94b..c99ee4cf 100644
--- a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTransitionOverrides.scala
+++ b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuTransitionOverrides.scala
@@ -170,10 +170,10 @@ class GpuTransitionOverrides extends Rule[SparkPlan] {
    *       not unusual.
    */
   def optimizeCoalesce(plan: SparkPlan): SparkPlan = plan match {
-    case c2r: GpuColumnarToRowExecParent if c2r.child.isInstanceOf[GpuCoalesceBatches] =>
-      // Don't build a batch if we are just going to go back to ROWS
-      val co = c2r.child.asInstanceOf[GpuCoalesceBatches]
-      c2r.withNewChildren(co.children.map(optimizeCoalesce))
+    //case c2r: GpuColumnarToRowExecParent if c2r.child.isInstanceOf[GpuCoalesceBatches] =>
+    //  // Don't build a batch if we are just going to go back to ROWS
+    //  val co = c2r.child.asInstanceOf[GpuCoalesceBatches]
+    //  c2r.withNewChildren(co.children.map(optimizeCoalesce))
     case GpuCoalesceBatches(r2c: GpuRowToColumnarExec, goal: TargetSize) =>
       // TODO in the future we should support this for all goals, but
       // GpuRowToColumnarExec preallocates all of the memory, and the builder does not
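
For reference, a rough sketch of the "test for compressed vectors in C2R" alternative; decompressBatch here is a hypothetical helper, not an existing plugin API:

import com.nvidia.spark.rapids.GpuCompressedColumnVector
import org.apache.spark.sql.vectorized.ColumnarBatch

// Sketch: before extracting columns in the columnar-to-row path, detect a
// compressed batch and decompress it first. Not the fix that was merged.
def ensureDecompressed(batch: ColumnarBatch): ColumnarBatch =
  if (batch.numCols() > 0 &&
      batch.column(0).isInstanceOf[GpuCompressedColumnVector]) {
    decompressBatch(batch) // hypothetical helper wrapping the shuffle codec
  } else {
    batch
  }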

jlowe (Member) commented May 11, 2021

I think there are basically three ways to tackle this:

  1. Avoid "optimizing" a GPU coalesce if the preceding node is a shuffle
  2. Have GpuColumnarToRow expect and handle compressed batches
  3. Use a separate exec node to handle decompression separately from coalesce, i.e., similar to the GpuShuffleCoalesceExec that is used for legacy shuffle

I think the first option is best, at least in the short term. Getting good decompression performance does require some batching, which is what GpuCoalesceExec is already doing, and I'd rather not spread the knowledge and handling of compressed batches to something like GpuColumnarToRow. Having a separate exec for decompression could be a bit cleaner, but it's a more invasive change and would have a lot of overlap with the coalesce exec, since we want to build bigger batches for better decompression parallelism.

So my vote is to make the rule a bit smarter and have it skip optimizing a GpuCoalesceExec whose preceding node is a shuffle, as sketched below.
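
For illustration, a rough sketch of that smarter rule. The actual fix landed in #2396 and may differ; ShuffleExchangeLike stands in for whatever shuffle node type the plugin really matches on:

import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.ShuffleExchangeLike

// Sketch: only elide the coalesce under a columnar-to-row node when its
// input does not come straight from a shuffle, since shuffle output may
// still be compressed and C2R cannot consume GpuCompressedColumnVector.
def optimizeCoalesce(plan: SparkPlan): SparkPlan = plan match {
  case c2r: GpuColumnarToRowExecParent
      if c2r.child.isInstanceOf[GpuCoalesceBatches] &&
        !c2r.child.asInstanceOf[GpuCoalesceBatches].child.isInstanceOf[ShuffleExchangeLike] =>
    // Safe to skip building the batch: the input is not compressed shuffle data
    val co = c2r.child.asInstanceOf[GpuCoalesceBatches]
    c2r.withNewChildren(co.children.map(optimizeCoalesce))
  case p =>
    p.withNewChildren(p.children.map(optimizeCoalesce))
}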

abellina (Collaborator) commented

@zhnin could you try again with the latest changes in branch-0.6 to make sure it works for you?

zhnin (Author) commented May 13, 2021

> @zhnin could you try again with the latest changes in branch-0.6 to make sure it works for you?

Yes, I have tested it, and it works for me.
