Stop removing GpuCoalesceBatches from non-AQE queries when AQE is enabled #720
Conversation
… AQE is enabled Signed-off-by: Andy Grove <andygrove@nvidia.com>
The Scala tests failed this time against Spark 3.0.1 and I cannot reproduce locally.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
docs/configs.md
@@ -57,7 +57,7 @@ Name | Description | Default Value
<a name="sql.format.parquet.enabled"></a>spark.rapids.sql.format.parquet.enabled|When set to false disables all parquet input and output acceleration|true
<a name="sql.format.parquet.multiThreadedRead.enabled"></a>spark.rapids.sql.format.parquet.multiThreadedRead.enabled|When set to true, reads multiple small files within a partition more efficiently by reading each file in a separate thread in parallel on the CPU side before sending to the GPU. Limited by spark.rapids.sql.format.parquet.multiThreadedRead.numThreads and spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFileProcessed|true
<a name="sql.format.parquet.multiThreadedRead.maxNumFilesParallel"></a>spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel|A limit on the maximum number of files per task processed in parallel on the CPU side before the file is sent to the GPU. This affects the amount of host memory used when reading the files in parallel.|2147483647
-<a name="sql.format.parquet.multiThreadedRead.numThreads"></a>spark.rapids.sql.format.parquet.multiThreadedRead.numThreads|The maximum number of threads, on the executor, to use for reading small parquet files in parallel.|20
+<a name="sql.format.parquet.multiThreadedRead.numThreads"></a>spark.rapids.sql.format.parquet.multiThreadedRead.numThreads|The maximum number of threads, on the executor, to use for reading small parquet files in parallel. This cannot be changed at runtime after the executor has started.|20
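Since `numThreads` cannot be changed after the executor has started, settings like these belong in `spark-defaults.conf` (or on the `spark-submit` command line) rather than being set per-query. A hedged illustration; the numeric values below are arbitrary examples, not tuning recommendations:

```
# spark-defaults.conf (illustrative values only)
spark.rapids.sql.format.parquet.enabled                               true
spark.rapids.sql.format.parquet.multiThreadedRead.enabled             true
spark.rapids.sql.format.parquet.multiThreadedRead.numThreads          32
spark.rapids.sql.format.parquet.multiThreadedRead.maxNumFilesParallel 64
```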
This indicates the branch needs to be upmerged/rebased on latest branch-0.2.
Signed-off-by: Andy Grove <andygrove@nvidia.com>
Status update: I have manually tested this by running TPC-DS benchmarks and confirming no regressions in performance (with our derived TPC-DS benchmarks, so no DPP involved).
…bled (NVIDIA#720) * Fix bug where GpuCoalesceBatches is removed from non-AQE queries when AQE is enabled Signed-off-by: Andy Grove <andygrove@nvidia.com>
[auto-merge] bot-auto-merge-branch-22.12 to branch-23.02 [skip ci] [bot]
Signed-off-by: Andy Grove andygrove@nvidia.com
`GpuTransitionOverrides` has special handling for adaptive queries, where `GpuCoalesceBatches` is removed from a `GpuShuffleExchangeExec` and re-inserted around the `GpuCustomShuffleReader`. See the comments in this PR for a more detailed explanation.

The bug was that we assumed all queries were adaptive whenever AQE is enabled. There are cases where queries are not adaptive even with AQE enabled (such as when dynamic partition pruning is used), and in those cases the plugin removed the `GpuCoalesceBatches` operator without re-inserting it (because there was no `GpuCustomShuffleReader`), resulting in poor performance.

This closes #698
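As a sketch of the fix described above: the plan rewrite should only strip `GpuCoalesceBatches` when the plan in question is actually adaptive, not merely because AQE is enabled in the session. The following is a self-contained toy model of that idea, not the real spark-rapids code; `PlanNode`, `ShuffleExchange`, `CustomShuffleReader`, `CoalesceBatches`, and the helper names are hypothetical stand-ins for the actual Catalyst/plugin classes.

```scala
// Toy plan model (hypothetical stand-ins, not the real spark-rapids classes).
sealed trait PlanNode { def children: Seq[PlanNode] }
case class ShuffleExchange(children: Seq[PlanNode]) extends PlanNode
case class CustomShuffleReader(children: Seq[PlanNode]) extends PlanNode
case class CoalesceBatches(children: Seq[PlanNode]) extends PlanNode
case class Leaf() extends PlanNode { def children: Seq[PlanNode] = Nil }

// A plan is treated as adaptive only if it actually contains an AQE shuffle
// reader; having AQE enabled in the session is not enough (that was the bug).
def isAdaptivePlan(plan: PlanNode): Boolean = plan match {
  case _: CustomShuffleReader => true
  case p                      => p.children.exists(isAdaptivePlan)
}

// Strip coalesce nodes sitting directly above a shuffle exchange. In the real
// plugin the coalesce is later re-inserted around the CustomShuffleReader.
def removeCoalesceAboveShuffle(plan: PlanNode): PlanNode = plan match {
  case CoalesceBatches(Seq(s: ShuffleExchange)) =>
    ShuffleExchange(s.children.map(removeCoalesceAboveShuffle))
  case ShuffleExchange(cs)     => ShuffleExchange(cs.map(removeCoalesceAboveShuffle))
  case CustomShuffleReader(cs) => CustomShuffleReader(cs.map(removeCoalesceAboveShuffle))
  case CoalesceBatches(cs)     => CoalesceBatches(cs.map(removeCoalesceAboveShuffle))
  case other                   => other
}

// The fix: non-adaptive plans (e.g. when DPP disables AQE for a query) are
// left untouched, so their GpuCoalesceBatches survives.
def maybeRemoveCoalesce(plan: PlanNode): PlanNode =
  if (isAdaptivePlan(plan)) removeCoalesceAboveShuffle(plan) else plan
```

In this toy model, a plan without a `CustomShuffleReader` keeps its `CoalesceBatches`, while an adaptive plan still has the coalesce above the exchange removed, mirroring the behavior the PR restores.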