Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix canonicalization of GpuFileSourceScanExec, GpuShuffleCoalesceExec #1310

Merged
merged 1 commit into from
Dec 8, 2020

Conversation

jlowe
Copy link
Member

@jlowe jlowe commented Dec 8, 2020

At least part of the issue in #1308 is caused by parts of the query being executed multiple times redundantly because Spark is unable to determine that subsections of the query are identical. This is caused by broken canonicalization where a GPU Exec node is not comparing correctly with a semantically-equivalent copy of itself.

In this case it's the use of a RapidsConf instance as a parameter of the case class. RapidsConf doesn't have an equals method, so two instances will be considered different. However I don't think adding an equals method is the real fix here, as the configs could be slightly different in ways unrelated to the semantics of the node's execution and therefore should still be considered equal with respect to that node's canonicalization.

In the case of GpuShuffleCoalesceExec it was easy, there was only one config value being used, so I replaced the conf parameter with the value from the conf. For GpuFileSourceScanExec it is trickier since there are many config parameters being used, and it already has quite a few arguments. In this case I opted to place the conf instance in a separate parameter list which is excluded from the automatically generated equals and hashcode comparison which is used by Spark plan canonicalization. In our case, there are no RAPIDS config settings that change the semantics for what data is read and how it is presented, so we can safely ignore any RAPIDS configs for purposes of canonicalization.

@jlowe jlowe added the SQL part of the SQL/Dataframe plugin label Dec 8, 2020
@jlowe jlowe added this to the Dec 7 - Dec 18 milestone Dec 8, 2020
@jlowe jlowe self-assigned this Dec 8, 2020
@jlowe
Copy link
Member Author

jlowe commented Dec 8, 2020

build

@revans2 revans2 merged commit ccbb3e1 into NVIDIA:branch-0.3 Dec 8, 2020
@jlowe
Copy link
Member Author

jlowe commented Dec 8, 2020

This has a bug where an extra argument was added to GpuFileSourceScanExec but otherCopyArgs was not overridden. It can fail queries where the node needs to be copied after it's inserted into the plan. I'll post a followup PR.

@jlowe jlowe deleted the fix-canonicalization branch December 8, 2020 15:56
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
[auto-merge] bot-auto-merge-branch-23.08 to branch-23.10 [skip ci] [bot]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SQL part of the SQL/Dataframe plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants