
[BUG] AQE shuffle coalesce optimization is broken with Spark 3.2 #3713

Closed

jlowe opened this issue Sep 30, 2021 · 1 comment · Fixed by #3719
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

jlowe (Member) commented Sep 30, 2021

SPARK-36315 added explicit Partitioning case class checks when updating the output distribution for coalesced shuffle reads. Unfortunately, because the RAPIDS Accelerator provides GPU versions of these Partitioning case classes, Spark's check does not recognize the GPU partitioning and classifies it as UnknownPartitioning. This often causes validation of the shuffle-coalesced optimized plan to fail, since the shuffle then appears not to provide the required partitioning.

This manifests as GPU stages that do not get the reduced partition count from AQE shuffle coalescing that the corresponding CPU run gets.
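For illustration only, here is a self-contained sketch of the failure mode. The type names mirror Spark's Partitioning hierarchy and the plugin's GPU partitioning, but they are simplified stand-ins rather than the actual Spark 3.2 or RAPIDS Accelerator classes:

```scala
// Simplified stand-ins for Spark's Partitioning hierarchy and a GPU variant.
object CoalesceSketch extends App {
  sealed trait Partitioning { def numPartitions: Int }
  case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Stand-in for a RAPIDS GPU partitioning. Because it is a distinct case
  // class, it does not match a pattern written against HashPartitioning.
  case class GpuHashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // Simplified version of the kind of explicit case-class check SPARK-36315
  // introduced for coalesced shuffle reads: only the built-in classes are
  // recognized, everything else degrades to UnknownPartitioning.
  def coalescedOutputPartitioning(child: Partitioning, coalescedParts: Int): Partitioning =
    child match {
      case h: HashPartitioning => h.copy(numPartitions = coalescedParts)
      // A GPU plan's partitioning falls through to here, so the coalesced
      // shuffle read no longer appears to satisfy the required distribution.
      case _ => UnknownPartitioning(coalescedParts)
    }

  // CPU partitioning keeps its identity after coalescing; the GPU one does not.
  println(coalescedOutputPartitioning(HashPartitioning(Seq("id"), 200), 20))
  println(coalescedOutputPartitioning(GpuHashPartitioning(Seq("id"), 200), 20))
}
```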

jlowe added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Sep 30, 2021
jlowe (Member, Author) commented Sep 30, 2021

It seems we have two paths forward here. Either we find a way to use Spark's Partitioning case classes in GPU plans so the CPU-side code can properly recognize the partitioning scheme being used, or we update Spark so it can better handle custom partitioning classes in AQEShuffleReadExec.
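As a purely hypothetical sketch of the first option (the trait and method names below exist in neither Spark nor the RAPIDS Accelerator and are illustration only), the GPU partitioning could carry an equivalent CPU Partitioning for CPU-side planning code, such as the check in AQEShuffleReadExec, to recognize:

```scala
// Hypothetical sketch only; reuses minimal stand-ins, not the real classes.
object Option1Sketch {
  sealed trait Partitioning { def numPartitions: Int }
  case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // A GPU partitioning that can report an equivalent CPU case class, so
  // CPU-side checks like the one added by SPARK-36315 still recognize the
  // partitioning scheme being used.
  case class GpuHashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning {
    def asCpuPartitioning: Partitioning = HashPartitioning(keys, numPartitions)
  }
}
```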
