Fix Spark UT issues in RapidsDataFrameAggregateSuite #10943

thirtiseven · 2024-05-29T08:35:52Z

closes #10772

issues in RapidsDataFrameAggregateSuite are not real bugs (Thanks @abellina )

This pr:

ported 4 cases related to collect and add sort_array around them, because order of elements in the array is non-deterministic in collect_list and collect_set.
marked "SPARK-19471: AggregationIterator does not initialize the generated result projection before using it as won't fix issue because it is a Codegen related UT to find WholeStageCodegenExec in gpu plan.
removed SPARK-24788: RelationalGroupedDataset.toString with unresolved exprs should not fail because it can pass now.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

jlowe · 2024-05-31T14:06:42Z

tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/BackendTestSettings.scala

@@ -83,6 +83,7 @@ abstract class BackendTestSettings {
  // or a description like "This simply can't work on GPU".
  // It should never be "unknown" or "need investigation"
  case class KNOWN_ISSUE(reason: String) extends ExcludeReason
+  case class ADJUST_UT(reason: String) extends ExcludeReason


What's the purpose of this vs. KNOWN_ISSUE? Are we intending to fix these? If so, we should file a tracking issue and use KNOWN_ISSUE with the issue URL as the description. If we're not intending to fix these, why not use KNOWN_ISSUE with the same description?

It is to mark the test cases that Spark UT cases have issues themselves, something is wrong or not working for plugin. But we still want to test what it meant to test by adjusting the case. Maybe a better name would be INVALID_CASE or SPARK_UT_ISSUE?

cc @binmahone

I agree KNOWN_ISSUE is good enough. Why not file an issue for tolerating non-determinism by sort and reference it?

ADJUST_UT in our context means "the spark test case will work for rapids, there is no bug in rapids, but the test case itself needs some modification". For example, if a spark test case looks like:

test("testcase1"){ val x = spark.sql("select sum(x), y from testdata group by y") assert(x.at(0) == ('x0', 100)) assert(x.at(1) == ('x1', 200)) }

Notice the operations and assertions are hard coded in the test case.
We know that Rapids may return results in a different order so the test case will fail for us.
From framework perspective, we have no chance to ask it to do any sort before asserting results.
However, the test case is still meaningful and we should enable it to increase test coverage.

This is where ADJUST_UT can help. we can adjust the above test case to a new one (and at the same time exclude the old test case with reason ADJUST_UT):

test("NEW testcase1"){ val x = spark.sql("select sum(x), y from testdata group by y") // sort x to make the result deterministic ... // make assertions based on the deterministic result ... }

By doing this, the test case testcase1 is considered to be "solved", there will be NO follow up issue, so it's not a known issue.

Based on our experience on Gluten, this type of case is very common, So I think it's necessary to add the ADJUST_UT enum.

What you think? @jlowe @gerashegalov

I am ok with this, I like the ADJUST_UT label. We can always go back and look at all the tests we adjusted (audit them).

I am ok with this, I like the ADJUST_UT label. We can always go back and look at all the tests we adjusted (audit them).

thx Alessandro

I see this makes sense to me. We can go with this for this PR

I wonder if we define test("testcase1") in our code, does it override the test in the base class? If it is possible then we could just do the override and not have a special ADJUST_UT exclude tag.

Currently the framework does not have this feature (everything has to be explicit now). But I agree with you that this is a good idea. @HaoYang670 please raise a framework feature request if you also see potential value of this.

Another idea for collect_list kind of issues with SQL, we could probably register our own UDAF as collect_list which will either be a simple delegate to the real collect_list , collect_list followed by sort otherwise.

Yeah, this would be the test case where ADJUST_UT is NOT finally needed. However extra sort will be a performance hurt. In my point of view, we can tolerate minor different result with Vainilla Spark, as long as both are correct answer under ANSI SQL standard. Do you think our team can reach concenses on this? @gerashegalov

cc @thirtiseven

However extra sort will be a performance hurt. In my point of view, we can tolerate minor different result with Vainilla Spark, as long as both are correct answer under ANSI SQL standard.

What I mean is overriding collect_list only in the test code, injecting the sort only in specially tagged tests where we know that that the order variance is permissible. We don't need to pay unnecessary performance penalty in the production code. I am bringing this up as an idea to discuss for follow-up work. This PR is fine

hi @gerashegalov I can roughly imagine what you're suggesting. Still it will be good if you can bring it up as a formal proposal (for further discussion). For now, I'm not quite sure to what extent can your idea solve the inconsistent problem.

GaryShen2008 · 2024-06-07T03:30:50Z

build

firestarman · 2024-06-07T06:58:39Z

build

GaryShen2008 · 2024-06-07T09:50:34Z

build

* Fix Spark UT issues in RapidsDataFrameAggregateSuite Signed-off-by: Haoyang Li <haoyangl@nvidia.com> * Added SPARK-24788 back Signed-off-by: Haoyang Li <haoyangl@nvidia.com> --------- Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Fix Spark UT issues in RapidsDataFrameAggregateSuite

74d79cf

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

thirtiseven requested review from abellina and binmahone May 29, 2024 08:35

thirtiseven self-assigned this May 29, 2024

sameerz added the test Only impacts tests label May 29, 2024

GaryShen2008 reviewed May 29, 2024

View reviewed changes

tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala Show resolved Hide resolved

thirtiseven changed the base branch from branch-24.06 to branch-24.08 May 29, 2024 12:32

Added SPARK-24788 back

2c05be1

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

jlowe reviewed May 31, 2024

View reviewed changes

abellina approved these changes Jun 5, 2024

View reviewed changes

gerashegalov approved these changes Jun 6, 2024

View reviewed changes

thirtiseven merged commit 18c2579 into NVIDIA:branch-24.08 Jun 8, 2024
44 checks passed

NvTimLiu mentioned this pull request Aug 6, 2024

Update changelog for v24.08.0 release [skip ci] #11304

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Spark UT issues in RapidsDataFrameAggregateSuite #10943

Fix Spark UT issues in RapidsDataFrameAggregateSuite #10943

thirtiseven commented May 29, 2024 •

edited

Loading

jlowe May 31, 2024

thirtiseven Jun 5, 2024 •

edited

Loading

gerashegalov Jun 6, 2024

binmahone Jun 6, 2024 •

edited

Loading

abellina Jun 6, 2024

binmahone Jun 7, 2024

binmahone Jun 7, 2024

HaoYang670 Jun 7, 2024

gerashegalov Jun 7, 2024 •

edited

Loading

binmahone Jun 11, 2024

GaryShen2008 commented Jun 7, 2024

firestarman commented Jun 7, 2024

GaryShen2008 commented Jun 7, 2024

Fix Spark UT issues in RapidsDataFrameAggregateSuite #10943

Fix Spark UT issues in RapidsDataFrameAggregateSuite #10943

Conversation

thirtiseven commented May 29, 2024 • edited Loading

jlowe May 31, 2024

Choose a reason for hiding this comment

thirtiseven Jun 5, 2024 • edited Loading

Choose a reason for hiding this comment

gerashegalov Jun 6, 2024

Choose a reason for hiding this comment

binmahone Jun 6, 2024 • edited Loading

Choose a reason for hiding this comment

abellina Jun 6, 2024

Choose a reason for hiding this comment

binmahone Jun 7, 2024

Choose a reason for hiding this comment

binmahone Jun 7, 2024

Choose a reason for hiding this comment

HaoYang670 Jun 7, 2024

Choose a reason for hiding this comment

gerashegalov Jun 7, 2024 • edited Loading

Choose a reason for hiding this comment

binmahone Jun 11, 2024

Choose a reason for hiding this comment

GaryShen2008 commented Jun 7, 2024

firestarman commented Jun 7, 2024

GaryShen2008 commented Jun 7, 2024

thirtiseven commented May 29, 2024 •

edited

Loading

thirtiseven Jun 5, 2024 •

edited

Loading

binmahone Jun 6, 2024 •

edited

Loading

gerashegalov Jun 7, 2024 •

edited

Loading