[BUG] create_map failed with java.lang.IllegalStateException: This is not supported yet #5180

Status: Closed · viadea opened this issue Apr 8, 2022 · 7 comments · Fixed by #5184
Labels: bug (Something isn't working), P0 (Must have for release)

viadea (Collaborator) commented Apr 8, 2022

Below is the repro:

from itertools import chain
from pyspark.sql import SparkSession
from pyspark.sql.functions import create_map, lit

# The repro was run in a notebook/shell where `sc` already exists;
# in a standalone script, create the session and context explicitly:
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

simple_dict = {'india': 'ind', 'usa': 'us', 'japan': 'jpn', 'uruguay': 'urg'}

# Build a single MAP expression from the dict's key/value pairs.
mapping_expr = create_map([lit(x) for x in chain(*simple_dict.items())])

df = sc.parallelize([('india', 'japan'), ('usa', 'uruguay')]).toDF(['col1', 'col2'])

# Look up each row's column value in the map (the lookup key is a column, not a literal).
df = df.withColumn('col1_map', mapping_expr[df['col1']])\
       .withColumn('col2_map', mapping_expr[df['col2']])

df.show(truncate=False)

Error:

Py4JJavaError: An error occurred while calling o519.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 18.0 failed 4 times, most recent failure: Lost task 2.3 in stage 18.0 (TID 47) (10.13.1.8 executor 0): java.lang.IllegalStateException: This is not supported yet
	at org.apache.spark.sql.rapids.GpuGetMapValue.doColumnar(complexTypeExtractors.scala:257)
	at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$3(GpuExpressions.scala:258)
	at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73)
	at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71)
	at org.apache.spark.sql.rapids.GpuGetMapValue.withResourceIfAllowed(complexTypeExtractors.scala:216)
	at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$2(GpuExpressions.scala:253)
	at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73)
	at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71)
	at org.apache.spark.sql.rapids.GpuGetMapValue.withResourceIfAllowed(complexTypeExtractors.scala:216)
	at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval(GpuExpressions.scala:252)
	at com.nvidia.spark.rapids.GpuBinaryExpression.columnarEval$(GpuExpressions.scala:251)
	at org.apache.spark.sql.rapids.GpuGetMapValue.columnarEval(complexTypeExtractors.scala:216)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
	at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:109)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
	at com.nvidia.spark.rapids.GpuExpressionsUtils$.columnarEvalToColumn(GpuExpressions.scala:93)
	at com.nvidia.spark.rapids.GpuProjectExec$.projectSingle(basicPhysicalOperators.scala:102)
	at com.nvidia.spark.rapids.GpuProjectExec$.$anonfun$project$1(basicPhysicalOperators.scala:109)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:162)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:159)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:159)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:194)
	at com.nvidia.spark.rapids.GpuProjectExec$.project(basicPhysicalOperators.scala:109)
	at com.nvidia.spark.rapids.GpuProjectExec$.projectAndClose(basicPhysicalOperators.scala:73)
	at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$doExecuteColumnar$1(basicPhysicalOperators.scala:149)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$2(GpuColumnarToRowExec.scala:242)
	at com.nvidia.spark.rapids.Arm.withResource(Arm.scala:28)
	at com.nvidia.spark.rapids.Arm.withResource$(Arm.scala:26)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.withResource(GpuColumnarToRowExec.scala:188)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:239)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:216)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:256)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
	at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.scheduler.Task.run(Task.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1670)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

CPU Spark (expected results):

+-----+-------+--------+--------+
|col1 |col2   |col1_map|col2_map|
+-----+-------+--------+--------+
|india|japan  |ind     |jpn     |
|usa  |uruguay|us      |urg     |
+-----+-------+--------+--------+

Env: 22.04 snapshot jars

Is this a current limitation of create_map, or a bug?
If it is a limitation, could we gracefully fall back to the CPU?
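
For reference, one interim way to avoid the failure would be to disable the offending expression in the plugin config so that the projection runs on the CPU. A minimal sketch, assuming the plugin's per-expression switch (spark.rapids.sql.expression.GetMapValue) and the spark.rapids.sql.explain setting apply to this release:

# Assumption: the RAPIDS plugin exposes spark.rapids.sql.expression.GetMapValue
# as a runtime switch; turning it off should make map lookups fall back to the
# CPU instead of hitting the IllegalStateException above.
spark.conf.set("spark.rapids.sql.expression.GetMapValue", "false")

# Assumption: spark.rapids.sql.explain=NOT_ON_GPU logs which operators or
# expressions stayed on the CPU and why.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")

df.withColumn('col1_map', mapping_expr[df['col1']]).show(truncate=False)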

viadea added labels: bug (Something isn't working), ? - Needs Triage (Need team to review and classify) on Apr 8, 2022
mythrocks (Collaborator) commented Apr 8, 2022

As has been the case since GetMapValue was first added, the plugin requires that the lookup keys be scalars, not a vector as in this repro; that restriction is what this error is reporting. #4944 only widened the set of supported data types; it did not change the scalar-vs-vector requirement.
(Note to self: that error message is not very useful.)

I'm checking whether the GpuOverrides changes in #4944 broke the CPU fallback.
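
To illustrate the scalar-vs-vector distinction using the repro's own mapping_expr and df (a sketch only, not a full statement of what the plugin accepts):

# Scalar key: the lookup key is a single literal value, which is the form the
# plugin has supported on the GPU. ('india_code' is just an illustrative alias.)
df.withColumn('india_code', mapping_expr[lit('india')]).show(truncate=False)

# Vector key: the lookup key is a column, i.e. a different value per row. This
# is the form in the repro, and it should fall back to the CPU rather than fail.
df.withColumn('col1_map', mapping_expr[df['col1']]).show(truncate=False)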

mythrocks (Collaborator) commented:

Thank you for the clear repro case, by the way.

mythrocks (Collaborator) commented:

Yep, I have confirmed that this is a regression. Apologies.

I'm looking at what it will take to fix the problem. Stand by.

mythrocks (Collaborator) commented:

> I'm looking at what it will take to fix the problem. Stand by.

I think I've found the problem. I'm testing a possible fix.

mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Apr 8, 2022
Fixes NVIDIA#5180.

Map lookup is currently supported only in cases where the keys are
scalar values. In case the keys are specified as a vector (e.g.
expressions), the plugin should fall back to CPU.
NVIDIA#4944 introduced a bug in how literal signatures are specified for
multiple data types. This breaks CPU fallback.

This commit fixes the specification of literals-only `TypeSig`.
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Apr 8, 2022
Fixes NVIDIA#5180.

Map lookup is currently supported only in cases where the keys are
scalar values. In case the keys are specified as a vector (e.g.
expressions), the plugin should fall back to CPU.
NVIDIA#4944 introduced a bug in how literal signatures are specified for
multiple data types. This breaks CPU fallback.

This commit fixes the specification of literals-only `TypeSig`.

Signed-off-by: MithunR <mythrocks@gmail.com>
sameerz added the P0 (Must have for release) label and removed the ? - Needs Triage (Need team to review and classify) label on Apr 8, 2022
sameerz added this to the Apr 4 - Apr 15 milestone on Apr 8, 2022
mythrocks (Collaborator) commented:

@viadea: We have sorted out the CPU fallback for MAP lookup on non-scalar keys. This should now fall back gracefully.
I have also filed #5204 to support key vectors, i.e. expressions as keys in map lookup. I'm not sure what the priority of that feature should be; it depends on whether users are currently trying to access map values with expression keys. It will require some cuDF support.
Would you prefer that we keep this bug open until #5204 is resolved?
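
With that fix in place, the fallback can be checked from the physical plan. A minimal sketch, assuming the spark.rapids.sql.explain setting and that the fallen-back projection appears as a CPU ProjectExec rather than GpuProjectExec:

# Assumption: with the fix, the map lookup on a column key is planned on the
# CPU instead of raising IllegalStateException on the GPU.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")  # log what stays on CPU and why

out = df.withColumn('col1_map', mapping_expr[df['col1']]) \
        .withColumn('col2_map', mapping_expr[df['col2']])

out.explain()             # look for Gpu* vs plain (CPU) operators around the map lookup
out.show(truncate=False)  # should now match the CPU Spark results shown above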

pxLi closed this as completed in cb8f79d on Apr 12, 2022
pxLi reopened this on Apr 12, 2022
pxLi (Collaborator) commented Apr 12, 2022

Reopening the auto-closed issue. We need to confirm whether we can close it or retarget it to 22.06.

GaryShen2008 (Collaborator) commented:

Confirmed with @viadea; let's close this one.
