
[BUG] AQE shuffle coalesce optimization is broken with Spark 3.2 #3713

Closed

jlowe opened this issue Sep 30, 2021 · 1 comment · Fixed by #3719
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

jlowe (Member) commented Sep 30, 2021

SPARK-36315 added explicit Partitioning case class checks when updating the output distribution for coalesced shuffle reads. Unfortunately, because the RAPIDS Accelerator provides GPU versions of these Partitioning case classes, Spark's check does not recognize the GPU partitioning and classifies it as UnknownPartitioning. This often causes validation of the shuffle-coalesced optimized plan to fail, since the shuffle then appears not to provide the required partitioning.

This manifests as GPU stages that do not get the reduced partition count from AQE shuffle coalescing that the corresponding CPU run gets.
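For illustration only, here is a self-contained sketch of the failure mode. The type names mirror Spark's Partitioning hierarchy and the plugin's GPU partitioning, but they are simplified stand-ins rather than the actual Spark 3.2 or RAPIDS Accelerator classes:

```scala
// Simplified stand-ins for Spark's Partitioning hierarchy and a GPU variant.
object CoalesceSketch extends App {
  sealed trait Partitioning { def numPartitions: Int }
  case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Stand-in for a RAPIDS GPU partitioning. Because it is a distinct case
  // class, it does not match a pattern written against HashPartitioning.
  case class GpuHashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // Simplified version of the kind of explicit case-class check SPARK-36315
  // introduced for coalesced shuffle reads: only the built-in classes are
  // recognized, everything else degrades to UnknownPartitioning.
  def coalescedOutputPartitioning(child: Partitioning, coalescedParts: Int): Partitioning =
    child match {
      case h: HashPartitioning => h.copy(numPartitions = coalescedParts)
      // A GPU plan's partitioning falls through to here, so the coalesced
      // shuffle read no longer appears to satisfy the required distribution.
      case _ => UnknownPartitioning(coalescedParts)
    }

  // CPU partitioning keeps its identity after coalescing; the GPU one does not.
  println(coalescedOutputPartitioning(HashPartitioning(Seq("id"), 200), 20))
  println(coalescedOutputPartitioning(GpuHashPartitioning(Seq("id"), 200), 20))
}
```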

jlowe added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Sep 30, 2021
jlowe (Member, Author) commented Sep 30, 2021

It seems we have two paths forward here. Either we find a way to use Spark's Partitioning case classes in GPU plans so the CPU-side code can properly recognize the partitioning scheme being used, or we update Spark so it can better handle custom partitioning classes in AQEShuffleReadExec.
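As a purely hypothetical sketch of the first option (the trait and method names below exist in neither Spark nor the RAPIDS Accelerator and are illustration only), the GPU partitioning could carry an equivalent CPU Partitioning for CPU-side planning code, such as the check in AQEShuffleReadExec, to recognize:

```scala
// Hypothetical sketch only; reuses minimal stand-ins, not the real classes.
object Option1Sketch {
  sealed trait Partitioning { def numPartitions: Int }
  case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // A GPU partitioning that can report an equivalent CPU case class, so
  // CPU-side checks like the one added by SPARK-36315 still recognize the
  // partitioning scheme being used.
  case class GpuHashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning {
    def asCpuPartitioning: Partitioning = HashPartitioning(keys, numPartitions)
  }
}
```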
