Stop removing GpuCoalesceBatches from non-AQE queries when AQE is enabled #720
Conversation
… AQE is enabled Signed-off-by: Andy Grove <andygrove@nvidia.com>
The Scala tests failed this time against Spark 3.0.1 and I cannot reproduce locally.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
docs/configs.md
@@ -57,7 +57,7 @@ Name | Description | Default Value
<a name="sql.format.parquet.enabled"></a>spark.rapids.sql.format.parquet.enabled|When set to false disables all parquet input and output acceleration|true
<a name="sql.format.parquet.multiThreadedRead.enabled"></a>spark.rapids.sql.format.parquet.multiThreadedRead.enabled|When set to true, reads multiple small files within a partition more efficiently by reading each file in a separate thread in parallel on the CPU side before sending to the GPU. Limited by spark.rapids.sql.format.parquet.multiThreadedRead.numThreads and spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFileProcessed|true
<a name="sql.format.parquet.multiThreadedRead.maxNumFilesParallel"></a>spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel|A limit on the maximum number of files per task processed in parallel on the CPU side before the file is sent to the GPU. This affects the amount of host memory used when reading the files in parallel.|2147483647
-<a name="sql.format.parquet.multiThreadedRead.numThreads"></a>spark.rapids.sql.format.parquet.multiThreadedRead.numThreads|The maximum number of threads, on the executor, to use for reading small parquet files in parallel.|20
+<a name="sql.format.parquet.multiThreadedRead.numThreads"></a>spark.rapids.sql.format.parquet.multiThreadedRead.numThreads|The maximum number of threads, on the executor, to use for reading small parquet files in parallel. This cannot be changed at runtime after the executor has started.|20
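Since `numThreads` cannot be changed after the executor has started, settings like these belong in `spark-defaults.conf` (or on the `spark-submit` command line) rather than being set per-query. A hedged illustration; the numeric values below are arbitrary examples, not tuning recommendations:

```
# spark-defaults.conf (illustrative values only)
spark.rapids.sql.format.parquet.enabled                               true
spark.rapids.sql.format.parquet.multiThreadedRead.enabled             true
spark.rapids.sql.format.parquet.multiThreadedRead.numThreads          32
spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel 64
```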
This indicates the branch needs to be upmerged/rebased on latest branch-0.2.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Status update: I have manually tested this by running TPC-DS benchmarks and confirming no regressions in performance (with our derived TPC-DS benchmarks, so no DPP involved).
…bled (NVIDIA#720) * Fix bug where GpuCoalesceBatches is removed from non-AQE queries when AQE is enabled Signed-off-by: Andy Grove <andygrove@nvidia.com>
[auto-merge] bot-auto-merge-branch-22.12 to branch-23.02 [skip ci] [bot]
Signed-off-by: Andy Grove andygrove@nvidia.com
`GpuTransitionOverrides` has special handling for adaptive queries, where `GpuCoalesceBatches` is removed from a `GpuShuffleExchangeExec` and re-inserted around the `GpuCustomShuffleReader`. See the comments in this PR for a more detailed explanation.

The bug was that we assumed all queries were adaptive whenever AQE is enabled. There are cases where queries are not adaptive even with AQE enabled (such as when dynamic partition pruning is used), and in those cases the plugin removed the `GpuCoalesceBatches` operator without re-inserting it (because there was no `GpuCustomShuffleReader`), resulting in poor performance.

This closes #698
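As a sketch of the fix described above: the plan rewrite should only strip `GpuCoalesceBatches` when the plan in question is actually adaptive, not merely because AQE is enabled in the session. The following is a self-contained toy model of that idea, not the real spark-rapids code; `PlanNode`, `ShuffleExchange`, `CustomShuffleReader`, `CoalesceBatches`, and the helper names are hypothetical stand-ins for the actual Catalyst/plugin classes.

```scala
// Toy plan model (hypothetical stand-ins, not the real spark-rapids classes).
sealed trait PlanNode { def children: Seq[PlanNode] }
case class ShuffleExchange(children: Seq[PlanNode]) extends PlanNode
case class CustomShuffleReader(children: Seq[PlanNode]) extends PlanNode
case class CoalesceBatches(children: Seq[PlanNode]) extends PlanNode
case class Leaf() extends PlanNode { def children: Seq[PlanNode] = Nil }

// A plan is treated as adaptive only if it actually contains an AQE shuffle
// reader; having AQE enabled in the session is not enough (that was the bug).
def isAdaptivePlan(plan: PlanNode): Boolean = plan match {
  case _: CustomShuffleReader => true
  case p                      => p.children.exists(isAdaptivePlan)
}

// Strip coalesce nodes sitting directly above a shuffle exchange. In the real
// plugin the coalesce is later re-inserted around the CustomShuffleReader.
def removeCoalesceAboveShuffle(plan: PlanNode): PlanNode = plan match {
  case CoalesceBatches(Seq(s: ShuffleExchange)) =>
    ShuffleExchange(s.children.map(removeCoalesceAboveShuffle))
  case ShuffleExchange(cs)     => ShuffleExchange(cs.map(removeCoalesceAboveShuffle))
  case CustomShuffleReader(cs) => CustomShuffleReader(cs.map(removeCoalesceAboveShuffle))
  case CoalesceBatches(cs)     => CoalesceBatches(cs.map(removeCoalesceAboveShuffle))
  case other                   => other
}

// The fix: non-adaptive plans (e.g. when DPP disables AQE for a query) are
// left untouched, so their GpuCoalesceBatches survives.
def maybeRemoveCoalesce(plan: PlanNode): PlanNode =
  if (isAdaptivePlan(plan)) removeCoalesceAboveShuffle(plan) else plan
```

In this toy model, a plan without a `CustomShuffleReader` keeps its `CoalesceBatches`, while an adaptive plan still has the coalesce above the exchange removed, mirroring the behavior the PR restores.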