Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Implement hasSideEffects for all expressions that have side-effects #5041

Closed
5 of 6 tasks
andygrove opened this issue Mar 24, 2022 · 1 comment
Closed
5 of 6 tasks
Assignees
Labels
bug Something isn't working P0 Must have for release

Comments

@andygrove
Copy link
Contributor

andygrove commented Mar 24, 2022

Describe the bug
In #3849 we implemented special handling in CASE WHEN for expressions with side-effects but this relies on all expressions with side-effects overriding the hasSideEffects method and we have not done this for all expressions (see #5023 for one example).

We should audit all expressions and implement hasSideEffects where appropriate.

Candidates:

  • GpuGetArrayItem -
  • GpuElementAt
  • GpuGetMapValue
  • GpuUnaryMinus
  • GpuAbs
  • GpuSum

Steps/Code to reproduce bug
N/A

Expected behavior
Conditional expressions such as IF and CASE should not fail due to expressions with side-effects being evaluated and throwing exceptions.

Environment details (please complete the following information)
N/A

Additional context
N/A

@andygrove andygrove added bug Something isn't working ? - Needs Triage Need team to review and classify labels Mar 24, 2022
@andygrove andygrove added this to the Mar 21 - Apr 1 milestone Mar 24, 2022
@andygrove andygrove self-assigned this Mar 24, 2022
@sameerz sameerz added P0 Must have for release and removed ? - Needs Triage Need team to review and classify labels Mar 29, 2022
@revans2
Copy link
Collaborator

revans2 commented Apr 15, 2022

I just did a test with pure spark 3.2.0 and I can confirm that case/when does not stop a SUM in ansi from throwing an exception if the case/when is outside of the SUM.

ANSI DISABLED:

scala>  Seq((0L, Long.MaxValue), (0L, 100L), (1L, 100L)).toDF.groupBy("_1").agg(when(col("_1") === 1, sum("_2")).otherwise(0).alias("result")).show()
+---+------+
| _1|result|
+---+------+
|  0|     0|
|  1|   100|
+---+------+

ANSI ENABLED:

scala>  Seq((0L, Long.MaxValue), (0L, 100L), (1L, 100L)).toDF.groupBy("_1").agg(when(col("_1") === 1, sum("_2")).otherwise(0).alias("result")).show()
22/04/15 19:08:34 ERROR Executor: Exception in task 0.0 in stage 10.0 (TID 29)
java.lang.ArithmeticException: long overflow
	at java.lang.Math.addExact(Math.java:809)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doConsume_0$(Unknown Source)
...

If I move the case/when inside the SUM, then it does the right thing in both places...

scala>  Seq((0L, Long.MaxValue), (0L, 100L), (1L, 100L)).toDF.groupBy("_1").agg(sum(when(col("_1") === 1, col("_2")).otherwise(0)).alias("result")).show()
+---+------+
| _1|result|
+---+------+
|  0|     0|
|  1|   100|
+---+------+

@revans2 revans2 closed this as completed Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P0 Must have for release
Projects
None yet
Development

No branches or pull requests

3 participants