Stop reporting totalTime metric for GpuShuffleExchangeExec #973

Merged
merged 1 commit into NVIDIA:branch-0.3 from remove-shuffle-total-time-metric on Oct 19, 2020

Conversation

andygrove
Contributor

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Spark does not report a totalTime metric for shuffle exchanges. The value we were reporting in the non-AQE case was misleading, and in the AQE case we reported zero.

This PR removes the totalTime metric.

This closes #952
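To illustrate the idea behind the change, here is a minimal, self-contained sketch of an operator that inherits a default metric set and a shuffle-exchange operator that opts out of `totalTime`. It is a simplified model, not the plugin's actual code: the class names `GpuExecModel` and `GpuShuffleExchangeModel`, the method names, and the other metric names are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Metric names used by this toy model (assumed for illustration only).
TOTAL_TIME = "totalTime"
NUM_OUTPUT_ROWS = "numOutputRows"
DATA_SIZE = "dataSize"


class GpuExecModel:
    """Hypothetical stand-in for a GPU operator with default metrics."""

    def default_metrics(self) -> Dict[str, int]:
        # Every operator in this model starts with these metrics.
        return {NUM_OUTPUT_ROWS: 0, TOTAL_TIME: 0}

    def additional_metrics(self) -> Dict[str, int]:
        # Subclasses add operator-specific metrics here.
        return {}

    def metrics(self) -> Dict[str, int]:
        # Merge defaults with operator-specific metrics.
        return {**self.default_metrics(), **self.additional_metrics()}


class GpuShuffleExchangeModel(GpuExecModel):
    """Hypothetical shuffle exchange that does not report totalTime."""

    def additional_metrics(self) -> Dict[str, int]:
        return {DATA_SIZE: 0}

    def metrics(self) -> Dict[str, int]:
        merged = super().metrics()
        # Drop the inherited metric rather than report a misleading
        # value (non-AQE case) or a constant zero (AQE case).
        merged.pop(TOTAL_TIME, None)
        return merged


if __name__ == "__main__":
    print(sorted(GpuExecModel().metrics()))
    print(sorted(GpuShuffleExchangeModel().metrics()))
```

In this model the metric is removed at the point where the metric map is assembled, so nothing downstream can display a stale or zero value for it.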

@andygrove andygrove self-assigned this Oct 16, 2020
@andygrove
Contributor Author

build

@sameerz sameerz added the bug Something isn't working label Oct 16, 2020
Signed-off-by: Andy Grove <andygrove@nvidia.com>
@andygrove andygrove force-pushed the remove-shuffle-total-time-metric branch from 6d0ce59 to 0530159 Compare October 19, 2020 19:38
@andygrove
Contributor Author

build

@jlowe jlowe merged commit 7e1ae30 into NVIDIA:branch-0.3 Oct 19, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Oct 20, 2020
Signed-off-by: Andy Grove <andygrove@nvidia.com>
tgravescs added a commit that referenced this pull request Oct 21, 2020
* Add some more checks to databricks build scripts

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* remove extra newline

* use the right -gt for bash

* Add new python file for databricks cluster utils

* Fix up scripts

* databricks scripts working

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Pass in sshkey

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* cluster creation script mods

* fix

* fix pub key

* fix missing quote

* fix $

* update public key to be param

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Add public key value

* clenaup

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* modify permissions

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* change loc cluster id file

* fix extra /

* quote public key

* try different setting cluster id

* debug

* try again

* try readfile

* try again

* try quotes

* cleanup

* Add option to control number of partitions when converting from CSV to Parquet (#915)

* Add command-line arguments for applying coalesce and repartition on a per-table basis

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Move command-line validation logic and address other feedback

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Update copyright years and fix import order

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Update docs/benchmarks.md

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

* Remove withPartitioning option from TPC-H and TPC-xBB file conversion

Signed-off-by: Andy Grove <andygrove@nvidia.com>

Co-authored-by: Jason Lowe <jlowe@nvidia.com>

* Benchmark runner script (#918)

* Benchmark runner script

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add argument for number of iterations

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Fix docs

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* add license

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* improve documentation for the configuration files

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add missing line-continuation symbol in example

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Remove hard-coded spark-submit-template.txt and add --template argument. Also make all arguments required.

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Update benchmarking guide to link to the benchmark python script

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add --template to example and fix markdown header

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add legacy config to clear active Spark 3.1.0 session in tests (#970)

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

* XFail tests until final fix can be put in (#968)

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

* Stop reporting totalTime metric for GpuShuffleExchangeExec (#973)

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* Add some more checks to databricks build scripts

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* Pass in sshkey

* Add create script, add more parameters, etc

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* add create script

* rework some scripts

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* fix is_cluster_running

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* put slack back in

* update text

* cleanup

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* remove datetime

* send output to stderr

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

Co-authored-by: Andy Grove <andygrove@users.noreply.github.com>
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Co-authored-by: Robert (Bobby) Evans <bobby@apache.org>
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
Signed-off-by: Andy Grove <andygrove@nvidia.com>
sperlingxx pushed a commit to sperlingxx/spark-rapids that referenced this pull request Nov 20, 2020
@andygrove andygrove deleted the remove-shuffle-total-time-metric branch December 17, 2020 15:26
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
Signed-off-by: Andy Grove <andygrove@nvidia.com>
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#973)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG]The metrics "total time“ of GpuColumnarExchange is strange
4 participants