
[BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failures in CI (databricks) #10182

Closed
andygrove opened this issue Jan 10, 2024 · 2 comments · Fixed by #10204
Labels: bug (Something isn't working)

@andygrove (Contributor)

Describe the bug

The CI build for #10179 failed with unrelated DPP (dynamic partition pruning) errors. This seems similar to #10147.

2024-01-10T23:39:01.0960149Z [2024-01-10T23:34:21.452Z] [2024-01-10T23:33:07.595Z] FAILED ../../src/main/python/dpp_test.py::test_dpp_bypass[true-5-parquet][DATAGEN_SEED=1704919540, INJECT_OOM, IGNORE_ORDER] - py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.s...
2024-01-10T23:39:01.0961556Z [2024-01-10T23:34:21.452Z] [2024-01-10T23:33:07.595Z] FAILED ../../src/main/python/dpp_test.py::test_dpp_bypass[true-5-orc][DATAGEN_SEED=1704919540, IGNORE_ORDER] - py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.s...
2024-01-10T23:39:01.0963105Z [2024-01-10T23:34:21.452Z] [2024-01-10T23:33:07.595Z] FAILED ../../src/main/python/dpp_test.py::test_dpp_via_aggregate_subquery[true-5-parquet][DATAGEN_SEED=1704919540, INJECT_OOM, IGNORE_ORDER] - py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.s...
2024-01-10T23:39:01.0964581Z [2024-01-10T23:34:21.452Z] [2024-01-10T23:33:07.595Z] FAILED ../../src/main/python/dpp_test.py::test_dpp_via_aggregate_subquery[true-5-orc][DATAGEN_SEED=1704919540, IGNORE_ORDER] - py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.s...
java.lang.AssertionError: assertion failed: Could not find DynamicPruningExpression in the **** plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
  ResultQueryStage 4, Statistics(sizeInBytes=0.0 B, ColumnStat: N/A)
  +- LocalTableScan <empty>, [key#109529, max(value)#109545L]
+- == Initial Plan ==
  Sort [key#109529 ASC NULLS FIRST, max(value)#109545L ASC NULLS FIRST], true, 0
  +- Exchange rangepartitioning(key#109529 ASC NULLS FIRST, max(value)#109545L ASC NULLS FIRST, 4), ENSURE_REQUIREMENTS, [plan_id=190097]
     +- HashAggregate(keys=[key#109529], functions=[finalmerge_max(merge max#109595L) AS max(value#109530L)#109544L], output=[key#109529, max(value)#109545L])
        +- Exchange hashpartitioning(key#109529, 4), ENSURE_REQUIREMENTS, [plan_id=190095]
           +- HashAggregate(keys=[key#109529], functions=[partial_max(value#109530L) AS max#109595L], output=[key#109529, max#109595L])
              +- Union
                 :- Project [key#109438 AS key#109529, value#109534L AS value#109530L]
                 :  +- BroadcastHashJoin [key#109438], [key#109440], Inner, BuildRight, false
                 :     :- HashAggregate(keys=[key#109438], functions=[finalmerge_sum(merge sum#109597L) AS sum(value#109437)#109538L], output=[key#109438, value#109534L])
                 :     :  +- Exchange hashpartitioning(key#109438, 4), ENSURE_REQUIREMENTS, [plan_id=190081]
                 :     :     +- HashAggregate(keys=[key#109438], functions=[partial_sum(value#109437) AS sum#109597L], output=[key#109438, sum#109597L])
                 :     :        +- Project [value#109437, key#109438]
                 :     :           +- Filter (isnotnull(value#109437) AND (value#109437 > 0))
                 :     :              +- FileScan orc spark_catalog.default.tmp_table_gw1_17637801_0[value#109437,key#109438,skey#109439] Batched: true, DataFilters: [isnotnull(value#109437), (value#109437 > 0)], Format: ORC, Location: InMemoryFileIndex(50 paths)[file:/home/ubuntu/spark-rapids/integration_tests/target/run_dir-20240..., PartitionFilters: [isnotnull(key#109438), dynamicpruning#109591 109590], PushedFilters: [IsNotNull(value), GreaterThan(value,0)], ReadSchema: struct<value:int>
                 :     +- Exchange SinglePartition, EXECUTOR_BROADCAST, [plan_id=190060]
                 :        +- Project [key#109440]
                 :           +- Filter ((((isnotnull(ex_key#109442) AND isnotnull(filter#109444)) AND (ex_key#109442 = 3)) AND (filter#109444 = 160)) AND isnotnull(key#109440))
                 :              +- FileScan orc spark_catalog.default.tmp_table_gw1_17637801_1[key#109440,ex_key#109442,filter#109444] Batched: true, DataFilters: [isnotnull(ex_key#109442), isnotnull(filter#109444), (ex_key#109442 = 3), (filter#109444 = 160), ..., Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/home/ubuntu/spark-rapids/integration_tests/target/run_dir-202401..., PartitionFilters: [], PushedFilters: [IsNotNull(ex_key), IsNotNull(filter), EqualTo(ex_key,3), EqualTo(filter,160), IsNotNull(key)], ReadSchema: struct<key:int,ex_key:int,filter:int>
                 +- Project [key#109578, value#109581L]
                    +- BroadcastHashJoin [key#109578], [key#109582], Inner, BuildRight, false
                       :- HashAggregate(keys=[key#109578], functions=[finalmerge_sum(merge sum#109599L) AS sum(value#109577)#109538L], output=[key#109578, value#109581L])
                       :  +- Exchange hashpartitioning(key#109578, 4), ENSURE_REQUIREMENTS, [plan_id=190089]
                       :     +- HashAggregate(keys=[key#109578], functions=[partial_sum(value#109577) AS sum#109599L], output=[key#109578, sum#109599L])
                       :        +- Project [value#109577, key#109578]
                       :           +- Filter (isnotnull(value#109577) AND (value#109577 > 0))
                       :              +- FileScan orc spark_catalog.default.tmp_table_gw1_17637801_0[value#109577,key#109578,skey#109579] Batched: true, DataFilters: [isnotnull(value#109577), (value#109577 > 0)], Format: ORC, Location: InMemoryFileIndex(50 paths)[file:/home/ubuntu/spark-rapids/integration_tests/target/run_dir-20240..., PartitionFilters: [isnotnull(key#109578), dynamicpruning#109593 109592], PushedFilters: [IsNotNull(value), GreaterThan(value,0)], ReadSchema: struct<value:int>
                       +- Exchange SinglePartition, EXECUTOR_BROADCAST, [plan_id=190066]
                          +- Project [key#109582]
                             +- Filter ((((isnotnull(ex_key#109584) AND isnotnull(filter#109586)) AND (ex_key#109584 = 3)) AND (filter#109586 = 160)) AND isnotnull(key#109582))
                                +- FileScan orc spark_catalog.default.tmp_table_gw1_17637801_1[key#109582,ex_key#109584,filter#109586] Batched: true, DataFilters: [isnotnull(ex_key#109584), isnotnull(filter#109586), (ex_key#109584 = 3), (filter#109586 = 160), ..., Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/home/ubuntu/spark-rapids/integration_tests/target/run_dir-202401..., PartitionFilters: [], PushedFilters: [IsNotNull(ex_key), IsNotNull(filter), EqualTo(ex_key,3), EqualTo(filter,160), IsNotNull(key)], ReadSchema: struct<key:int,ex_key:int,filter:int>                   
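
For context, the AssertionError above comes from the test verifying that a DynamicPruningExpression still appears in the executed plan. Below is a minimal Python sketch of that kind of check over the string form of the plan; it is not the actual helper in dpp_test.py (which goes through the spark-rapids assertion utilities), only an illustration of why the check fails on this plan.

    # Hypothetical, simplified stand-in for the plan check the test performs; it only
    # illustrates why the assertion above fails. With AQE, the final plan collapsed to an
    # empty LocalTableScan, so no DynamicPruningExpression (and no dynamicpruning#
    # partition filter) survives into the executed plan, even though the initial plan
    # still carried them on the ORC FileScans.
    def assert_dynamic_pruning_present(plan_str: str) -> None:
        assert "DynamicPruningExpression" in plan_str, \
            "assertion failed: Could not find DynamicPruningExpression in the plan"

    # String form of the final (post-AQE) plan above, abbreviated for illustration.
    final_plan = "ResultQueryStage 4\n+- LocalTableScan <empty>, [key, max(value)]"
    try:
        assert_dynamic_pruning_present(final_plan)
    except AssertionError as err:
        print(err)  # assertion failed: Could not find DynamicPruningExpression in the plan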

Steps/Code to reproduce bug
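Not spelled out in the report, but a likely local reproduction (an assumption, not confirmed in the original issue) is to run the spark-rapids integration tests via integration_tests/run_pyspark_from_build.sh with DATAGEN_SEED=1704919540 (the seed shown in the failing test IDs) and a pytest filter selecting test_dpp_bypass and test_dpp_via_aggregate_subquery, against a Databricks runtime or, per the comment below, Apache Spark 3.2.4.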

Expected behavior

Environment details (please complete the following information)

Additional context

@andygrove added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Jan 10, 2024
@andygrove changed the title from "[BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failues in CI (databricks)" to "[BUG] test_dpp_bypass / test_dpp_via_aggregate_subquery failures in CI (databricks)" on Jan 10, 2024
@mattahrens (Collaborator)

@NVnavkumar is this fixed by your PR? #10168

@mattahrens removed the ? - Needs Triage (Need team to review and classify) label on Jan 16, 2024
@NVnavkumar (Collaborator)

I was able to replicate this on Apache Spark 3.2.4, so it's not specific to Databricks. I will post a PR soon.
