
TPCxBB query16 failed at UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary #1298

Closed
NvTimLiu opened this issue Dec 7, 2020 · 4 comments
Labels: bug (Something isn't working), P0 (Must have for release)

NvTimLiu (Collaborator) commented on Dec 7, 2020:


Pipeline links:
https://ci.ngcc.nvidia.com/job/spark/job/spark3.0_integration_k8s/579/console
https://ci.ngcc.nvidia.com/job/spark/job/spark3.0_integration_k8s/580/console

```shell
spark-submit --class com.nvidia.spark.examples.tpcxbb.Main \
  --conf spark.kubernetes.driver.pod.name=tpcxbb-q5-580-gpu --num-executors 8 \
  --conf spark.executor.memory=50g --conf spark.driver.memory=10g \
  --conf spark.executor.cores=8 --conf spark.task.cpus=8 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.concurrentGpuTasks=2 \
  --conf spark.rapids.sql.incompatibleOps.enabled=true \
  --conf spark.executor.extraJavaOptions='-Dai.rapids.cudf.prefer-pinned=true -Dcom.nvidia.spark.rapids.semaphore.enabled=true' \
  --conf spark.executor.extraClassPath=local:///jars/* \
  --conf spark.driver.extraClassPath=local:///jars/* \
  --conf spark.task.resource.gpu.amount=1 \
  --conf spark.rapids.memory.gpu.pooling.enabled=true \
  --conf spark.rapids.memory.pinnedPool.size=8g \
  --master k8s://https://spkperf.sjc4.nonprod-nvkong.com:443 --deploy-mode cluster --name k8s-tests \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=quay.io/nvidia/spark:k8s-ubuntu18-spark3-cuda10.1 \
  --conf spark.kubernetes.container.image.pullSecrets=quayio-userpass \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.driverEnv.RAPIDS_XGB_EXAMPLE_OS_TYPE=K8s \
  --conf spark.kubernetes.driverEnv.RAPIDS_XGB_EXAMPLE_CUDA_VERSION=cuda10.1 \
  --conf spark.kubernetes.driverEnv.RAPIDS_XGB_EXAMPLE_SPARK_VERSION=Spark-3.0.0 \
  --conf spark.kubernetes.driverEnv.LIBCUDF_INCLUDE_DIR=/tmp/cudf-cache \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  --conf spark.kubernetes.driver.podTemplateFile=jenkins/pod-spark.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=jenkins/pod-spark.yaml \
  --conf spark.executor.resource.gpu.vendor=nvidia.com \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh \
  --conf spark.executorEnv.LIBCUDF_INCLUDE_DIR=/tmp/cudf-cache \
  --conf spark.hadoop.fs.s3a.access.key=spark --conf spark.hadoop.fs.s3a.secret.key=spark \
  --conf spark.hadoop.fs.s3a.endpoint=swiftstack-maglev.ngc.nvidia.com \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --jars s3a://spark-data/jars/rapids-4-spark-integration-tests_2.12-0.3.0-SNAPSHOT.jar \
  s3a://spark-data/jars/tpcxbb_apps-0.2.2-SNAPSHOT.jar \
  --format=parquet --input=s3a://spark-data/data/tcpxbb-1TB-packed \
  --query=Q16 --output=./target/tmp --xpu=GPU
```

```
2020-12-06 14:58:13,949 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 21.0 (TID 31, 10.233.64.255, executor 4, partition 5, PROCESS_LOCAL, 7822 bytes)
2020-12-06 14:58:13,960 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 21.0 (TID 26, 10.233.64.255, executor 4): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
	at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:36)
	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
	at org.apache.spark.sql.execution.vectorized.WritableColumnVector.getDecimal(WritableColumnVector.java:355)
	at com.nvidia.spark.rapids.HostColumnarToGpu$.$anonfun$columnarCopy$16(HostColumnarToGpu.scala:130)
	at com.nvidia.spark.rapids.HostColumnarToGpu$.$anonfun$columnarCopy$16$adapted(HostColumnarToGpu.scala:126)
	at scala.collection.immutable.Range.foreach(Range.scala:158)
	at com.nvidia.spark.rapids.HostColumnarToGpu$.columnarCopy(HostColumnarToGpu.scala:126)
	at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.$anonfun$addBatchToConcat$1(HostColumnarToGpu.scala:202)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
	at com.nvidia.spark.rapids.HostToGpuCoalesceIterator.addBatchToConcat(HostColumnarToGpu.scala:200)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.addBatch(GpuCoalesceBatches.scala:337)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$next$1(GpuCoalesceBatches.scala:271)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:134)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.next(GpuCoalesceBatches.scala:257)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.next(GpuCoalesceBatches.scala:134)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$iterNext$1(GpuCoalesceBatches.scala:180)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:134)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.iterNext(GpuCoalesceBatches.scala:179)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$hasNext$1(GpuCoalesceBatches.scala:185)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.$anonfun$hasNext$1$adapted(GpuCoalesceBatches.scala:183)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.withResource(GpuCoalesceBatches.scala:134)
	at com.nvidia.spark.rapids.AbstractGpuCoalesceIterator.hasNext(GpuCoalesceBatches.scala:183)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:189)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:223)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
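The stack suggests the reader is treating an INT32-backed Parquet dictionary column as a long/decimal column: `WritableColumnVector.getDecimal` calls `getLong`, which forwards to `decodeToLong` on a `PlainIntegerDictionary` that only supports `decodeToInt`. The sketch below is a hypothetical mimic, not the real Parquet classes: in `org.apache.parquet.column.Dictionary` the base-class decode methods throw `UnsupportedOperationException` with the concrete class name, and each dictionary subclass overrides only the decode method matching its physical type.

```java
// Hypothetical mimic of the Parquet dictionary failure mode.
// Class names echo the real ones for illustration only.
abstract class Dictionary {
    // Base class: every decode path is unsupported until a subclass overrides it.
    public int decodeToInt(int id) {
        throw new UnsupportedOperationException(getClass().getName());
    }
    public long decodeToLong(int id) {
        throw new UnsupportedOperationException(getClass().getName());
    }
}

class PlainIntegerDictionary extends Dictionary {
    private final int[] values;
    PlainIntegerDictionary(int[] values) { this.values = values; }

    // Only the INT32 decode path is implemented; decodeToLong is
    // deliberately NOT overridden, mirroring the real class.
    @Override public int decodeToInt(int id) { return values[id]; }
}

public class Main {
    public static void main(String[] args) {
        Dictionary dict = new PlainIntegerDictionary(new int[]{10, 20, 30});
        System.out.println(dict.decodeToInt(1)); // prints 20

        try {
            // Reading the INT32-backed column through the long path fails,
            // just like the getDecimal -> getLong call in the stack trace.
            dict.decodeToLong(1);
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException: " + e.getMessage());
        }
    }
}
```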

NvTimLiu added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) on Dec 7, 2020.
sameerz added the label P0 (Must have for release) and removed ? - Needs Triage on Dec 8, 2020.
sameerz (Collaborator) commented on Dec 8, 2020:

This should be fixed with #1278. Are recent tests failing?

sameerz (Collaborator) commented on Dec 17, 2020:

@NvTimLiu Looks like this is failing because of resources on the CI pipeline. Can you check?

cc: @GaryShen2008

sameerz added this to the Jan 4 - Jan 15 milestone on Dec 17, 2020.
jlowe (Member) commented on Dec 17, 2020:

In addition to decimals being disabled by default, #1361 should fix this even when decimals are enabled.
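For anyone hitting this on an affected plugin version, a hedged workaround sketch (config names per the spark-rapids docs of that era; verify against your plugin version) is to keep decimal columns on the CPU by leaving the plugin's decimal support off, which was the default at the time:

```shell
# Sketch only: explicitly leave GPU decimal support disabled so decimal
# columns fall back to the CPU Parquet reader. The `...` stands for the
# rest of the spark-submit invocation shown above.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.decimalType.enabled=false \
  ...
```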

NvTimLiu (Collaborator, Author) commented:

The UnsupportedOperationException failure is fixed; closing this issue.

There is a new failure in the K8s XGBoost training job; I'll file a new bug for it.

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
[auto-merge] bot-auto-merge-branch-23.08 to branch-23.10 [skip ci] [bot]