SPARK-36315 added explicit Partitioning case class checks for updating the output distribution for coalesced shuffle reads. Unfortunately, since the RAPIDS Accelerator provides its own GPU versions of these Partitioning case classes, the code fails to recognize the partitioning and classifies it as UnknownPartitioning. This often causes validation of the shuffle-coalesced optimized plan to fail, since it appears that the shuffle does not provide the required partitioning.
As a result, stages run on the GPU do not get the reduced partition count from AQE shuffle coalescing that the corresponding CPU run gets.
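The failure mode can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the types below are simplified stand-ins defined here, not Spark's or the RAPIDS Accelerator's real classes, and `coalescedOutput` is a hypothetical name for the kind of explicit case class match described above.

```scala
object CoalesceSketch {
  // Simplified stand-ins for illustration only; NOT the real Spark classes.
  trait Partitioning { def numPartitions: Int }
  case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Stand-in for a GPU counterpart: same meaning, but a different class.
  case class GpuHashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  // Hypothetical helper mimicking an explicit case class check: only the
  // classes named in the match keep their partitioning after coalescing.
  def coalescedOutput(p: Partitioning, newCount: Int): Partitioning = p match {
    case h: HashPartitioning => h.copy(numPartitions = newCount)
    case _                   => UnknownPartitioning(newCount) // GPU class lands here
  }
}
```

In this sketch the GPU class carries the same keys as the CPU class, yet it falls through to the default branch, so a later check that requires hash partitioning on those keys would reject the coalesced plan.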
It seems we have two paths forward here. Either we find a way to use Spark's Partitioning case classes in GPU plans so that the CPU code can properly recognize the partitioning scheme being used, or we update Spark so that it can better handle custom partitioning classes in AQEShuffleReadExec.
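One possible shape for the second path is an opt-in hook that custom partitioning classes implement. The sketch below is purely hypothetical: `CoalescablePartitioning` and `withNumPartitions` are names invented for this illustration, all types are simplified stand-ins, and nothing here reflects an existing Spark or RAPIDS Accelerator API.

```scala
object CoalesceHookSketch {
  // Simplified stand-ins for illustration only; NOT the real Spark classes.
  trait Partitioning { def numPartitions: Int }
  case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // Hypothetical hook: a partitioning that knows how to rebuild itself
  // with a different partition count.
  trait CoalescablePartitioning extends Partitioning {
    def withNumPartitions(newCount: Int): Partitioning
  }

  // Stand-in for a GPU partitioning class that opts in to the hook.
  case class GpuHashPartitioning(keys: Seq[String], numPartitions: Int)
      extends CoalescablePartitioning {
    def withNumPartitions(newCount: Int): Partitioning = copy(numPartitions = newCount)
  }

  // The match is on the hook rather than on concrete case classes, so custom
  // implementations keep their partitioning after coalescing.
  def coalescedOutput(p: Partitioning, newCount: Int): Partitioning = p match {
    case c: CoalescablePartitioning => c.withNumPartitions(newCount)
    case _                          => UnknownPartitioning(newCount)
  }
}
```

The trade-off in a design like this is that the coalescing code no longer needs to know every concrete class, but each custom partitioning has to implement the hook correctly.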