Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve warnings about AQE nodes not supported on GPU #647

Merged
merged 2 commits into from
Sep 3, 2020

Conversation

andygrove
Copy link
Contributor

Signed-off-by: Andy Grove andygrove@nvidia.com

This PR improves UX slightly by adding specific handling in GpuOverrides for AQE operators so that we can provide better information to the user, explaining why these nodes are not swapped out for GPU versions.

For example, before this change, the user might see:

!NOT_FOUND <ShuffleQueryStageExec> cannot run on GPU because no GPU enabled version of operator class org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec could be found

With this change, the user would see:

!Exec <ShuffleQueryStageExec> cannot run on GPU because this query stage already started executing

This is still slightly misleading, because the query stage may already be running on GPU, but at least looks less like an internal error or unsupported feature now.

This closes #607

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove added this to the Aug 31 - Sep 11 milestone Sep 2, 2020
@andygrove
Copy link
Contributor Author

build

@sameerz sameerz added the feature request New feature or request label Sep 2, 2020
exec[AdaptiveSparkPlanExec]("Adaptive query", (exec, conf, p, r) =>
new SparkPlanMeta[AdaptiveSparkPlanExec](exec, conf, p, r) {
override def tagPlanForGpu(): Unit =
willNotWorkOnGpu("this is an adaptive plan")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to be a bit more descriptive here? I can see users wondering why it says, this is an adaptive plan yet the docs say the plugin support AQE. "If it supports AQE, why isn't it handling adaptive plans? Isn't that the whole point of the plugin supporting AQE?" 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. This whole approach is still confusing. What I really wanted to do was not have any messages related to the AQE operators. Currently, we either replace an operator or show a warning. I think we need a third option of "this does not need replacing". I'm going to look at this again today and see if this can be implemented without major surgery on GpuOverrides.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree. There are other places we want this too. Perhaps it is just a list of class names that we should not warn about at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed an update to introduce a new DoNotReplaceSparkPlanMeta rule and logic to suppress warnings about operators that use this rule. I manually tested with TPC-DS with AQE and it looks a lot cleaner now.

Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove
Copy link
Contributor Author

build

@revans2 revans2 merged commit f659eaf into NVIDIA:branch-0.2 Sep 3, 2020
@andygrove andygrove deleted the better-aqe-errors branch December 17, 2020 15:25
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Improve warnings about AQE nodes not supported on GPU

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Introduce new DoNotReplaceSparkPlanMeta rule

Signed-off-by: Andy Grove <andygrove@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Improve warnings about AQE nodes not supported on GPU

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Introduce new DoNotReplaceSparkPlanMeta rule

Signed-off-by: Andy Grove <andygrove@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…#647)

Signed-off-by: Peixin Li <pxli@nyu.edu>

Signed-off-by: Peixin Li <pxli@nyu.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Fix misleading "cannot run on GPU" warnings when AQE is enabled
4 participants