Add an explain only mode to the plugin #4322

Merged
merged 33 commits into from
Jan 14, 2022


tgravescs
Collaborator

@tgravescs tgravescs commented Dec 7, 2021

This allows users to run on the CPU while the plugin evaluates the plan as if it would have run on the GPU and writes the explain output to the driver log file. Note that this isn't perfect, specifically for AQE, where the plan may change as it is executed.

fixes #4238

In explain only mode we don't acquire a GPU and don't enable the spark rapids shuffle (we fall back to the Spark version if it is configured), but we still process the plan through the GpuOverrides as we would if it were running on the GPU. In the end we log the explain output and return the CPU plan, so the query still runs on the CPU.
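For illustration only, the flow described above can be modeled with a toy sketch. This is not the plugin's code: `PlanNode`, `GPU_CAPABLE`, `explain_lines`, `explain_only`, and the `*`/`!` markers are all hypothetical names chosen for this example. It only shows the shape of the idea: tag every node of the plan, send the report to a logger, and hand back the original plan untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlanNode:
    """A toy stand-in for a Spark physical plan node (hypothetical)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


# Hypothetical set of operators this toy model treats as GPU-capable.
GPU_CAPABLE = {"Project", "Filter", "HashAggregate"}


def explain_lines(node: PlanNode, depth: int = 0) -> List[str]:
    """Tag each node: '*' if it could run on the GPU, '!' if it could not."""
    marker = "*" if node.name in GPU_CAPABLE else "!"
    lines = [f"{'  ' * depth}{marker} {node.name}"]
    for child in node.children:
        lines.extend(explain_lines(child, depth + 1))
    return lines


def explain_only(plan: PlanNode, log: Callable[[str], None]) -> PlanNode:
    """Evaluate the plan for GPU support, log the result, return the CPU plan."""
    log("\n".join(explain_lines(plan)))
    return plan  # unchanged, so execution stays on the CPU
```

Passing `print` (or a list's `append`) as `log` is enough to see the report; the returned object is the same plan that went in.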

This requires that the cudf and rapids jars be present and that the plugin be enabled with the mode set to explainOnly, i.e. spark.rapids.sql.mode=explainOnly and spark.plugins=com.nvidia.spark.SQLPlugin. I called the alternate mode executeOnGPU, thinking that name would work best if we happen to add other modes later. Happy to change it if people have better ideas.
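As a rough sketch of what that setup could look like from a launcher script, the helper below assembles a `spark-submit` command line with the two settings named above. The function name `explain_only_submit_args`, the application name, and the jar paths are placeholders invented for this example; only the two config keys and their values come from this PR.

```python
from typing import List


def explain_only_submit_args(app: str, jars: List[str]) -> List[str]:
    """Build spark-submit arguments that enable explain only mode."""
    confs = {
        # Load the RAPIDS plugin class named in the PR description.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Evaluate the plan for the GPU but keep running on the CPU.
        "spark.rapids.sql.mode": "explainOnly",
    }
    args = ["spark-submit", "--jars", ",".join(jars)]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Placeholder paths: substitute the real cudf and rapids jar locations.
    jars = ["/path/to/cudf.jar", "/path/to/rapids-4-spark.jar"]
    print(" ".join(explain_only_submit_args("my_app.py", jars)))
```

The explain output itself goes to the driver log, so that is where to look after the job runs.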

This PR updates the logging on startup to report the mode. Previously it always printed how to turn the plugin off, even if it was disabled. Now it prints whether the plugin is enabled, disabled, or in explain only mode, and references the configs to change that.

I tested this on a bunch of NDS queries and compared the output against actually running on the GPU. AQE isn't perfect, but I put some docs in there about it. I added one basic integration test; I don't really have a good way to test the output since it just goes to the logger. Manually tested on Databricks.

We should make sure we have a QA test for explain only mode as well.

@tgravescs tgravescs added the feature request New feature or request label Dec 7, 2021
@tgravescs tgravescs added this to the Nov 30 - Dec 10 milestone Dec 7, 2021
@tgravescs tgravescs self-assigned this Dec 7, 2021
@tgravescs
Collaborator Author

build

Member

@jlowe jlowe left a comment


Quick high-level comments, have not reviewed in detail. Also seems to be missing corresponding configs.md doc update from the RapidsConf change.

@tgravescs
Collaborator Author

I need to update the config docs

@tgravescs
Collaborator Author

I updated to have spark.rapids.sql.mode

@tgravescs
Collaborator Author

I need to regenerate the configs docs, will update shortly.

@tgravescs
Collaborator Author

build

@tgravescs
Collaborator Author

It seems our tests rely on the device manager to create the rapids buffer store (and initialize the GPU and memory) even though the sql plugin is disabled. Looking at a better way to handle this.

want to initialize stuff on startup with the plugin disabled and
dynamically enable it afterwards
@tgravescs
Collaborator Author

build


This allows running queries on the CPU and the plugin will evaluate the queries as if it was
going to run on the GPU and tell you what would and wouldn't have been run on the GPU.
There are two ways to run this, one is running with the plugin set to explain only mode and
Member


Nit: Ideally should be using "RAPIDS Accelerator" rather than "plugin" in user docs. Applies to other places in the PR.

@tgravescs
Collaborator Author

build

tgravescs and others added 3 commits January 14, 2022 09:47
…cala

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…cala

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
…cala

Co-authored-by: Jason Lowe <jlowe@nvidia.com>
@tgravescs
Collaborator Author

build

@tgravescs
Collaborator Author

build

@tgravescs tgravescs merged commit 9a5eac3 into NVIDIA:branch-22.02 Jan 14, 2022
@tgravescs tgravescs deleted the explainonlymode branch January 14, 2022 20:07
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] Add a Spark 3.X Explain only mode to the plugin
3 participants