
Show partition metrics for custom shuffler reader #1060

Merged: 7 commits merged into NVIDIA:branch-0.3 on Nov 20, 2020

Conversation

@andygrove (Contributor) commented on Nov 3, 2020

This PR adds metrics to GpuCustomShuffleReader so that we can see the number and size of the resulting partitions.

[Screenshot: shuffle-reader-stats]
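For context, here is a minimal sketch, not taken from this PR's diff, of how partition count and size metrics can be declared and updated through Spark's SQLMetrics, the same accumulator-backed mechanism SparkPlan nodes report metrics with. The object name and the hard-coded partition sizes are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.metric.SQLMetrics

object PartitionMetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-metrics-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Declare the two metrics much as a SparkPlan node would in its `metrics` map.
    val numPartitions = SQLMetrics.createMetric(sc, "number of partitions")
    val partitionSize = SQLMetrics.createSizeMetric(sc, "partition data size")

    // Hypothetical per-partition byte sizes; a real reader would obtain these
    // from the shuffle map output statistics rather than hard-coding them.
    val partitionBytes = Seq(1024L, 4096L, 16384L)
    numPartitions.add(partitionBytes.length)
    partitionBytes.foreach(partitionSize.add)

    println(s"partitions=${numPartitions.value}, totalBytes=${partitionSize.value}")
    spark.stop()
  }
}
```

In a real exec node the metric values would be filled from the reader's partition specs and the map output sizes rather than from a literal sequence.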

andygrove added the SQL (part of the SQL/Dataframe plugin) label on Nov 3, 2020
andygrove self-assigned this on Nov 3, 2020
andygrove changed the title from "[WIP] Show metrics for custom shuffler reader" to "[WIP] Show # input/outpout partitions for custom shuffler reader" on Nov 3, 2020
andygrove changed the title from "[WIP] Show # input/outpout partitions for custom shuffler reader" to "[WIP] Show # input/output partitions for custom shuffler reader" on Nov 3, 2020
andygrove changed the title from "[WIP] Show # input/output partitions for custom shuffler reader" to "[WIP] Show partitions metrics for custom shuffler reader" on Nov 4, 2020
andygrove changed the title from "[WIP] Show partitions metrics for custom shuffler reader" to "[WIP] Show partition metrics for custom shuffler reader" on Nov 4, 2020
andygrove changed the title from "[WIP] Show partition metrics for custom shuffler reader" to "Show partition metrics for custom shuffler reader" on Nov 4, 2020
jlowe previously approved these changes on Nov 10, 2020

@jlowe (Member) commented on Nov 10, 2020

build

abellina previously approved these changes on Nov 11, 2020

@sameerz (Collaborator) commented on Nov 15, 2020

build

@andygrove (Contributor, Author)

build

@andygrove (Contributor, Author)

Failed due to a timeout this time.

@andygrove (Contributor, Author)

build

@andygrove (Contributor, Author)

The ORC write test failed this time:

16:53:08  _________________________ test_buckets_write_fallback __________________________
16:53:08  [gw1] linux -- Python 3.6.12 /usr/bin/python
16:53:08  
16:53:08  spark_tmp_path = '/tmp/pyspark_tests//573108/'
16:53:08  spark_tmp_table_factory = <conftest.TmpTableFactory object at 0x7f3a20192240>
16:53:08  
16:53:08      @allow_non_gpu('DataWritingCommandExec')
16:53:08      def test_buckets_write_fallback(spark_tmp_path, spark_tmp_table_factory):
16:53:08          data_path = spark_tmp_path + '/ORC_DATA'
16:53:08          assert_gpu_fallback_write(
16:53:08                  lambda spark, path: spark.range(10e4).write.bucketBy(4, "id").sortBy("id").format('orc').mode('overwrite').option("path", path).saveAsTable(spark_tmp_table_factory.get()),
16:53:08                  lambda spark, path: spark.read.orc(path),
16:53:08                  data_path,
16:53:08  >               'DataWritingCommandExec')
16:53:08  
16:53:08  ../../src/main/python/orc_write_test.py:119: 
16:53:08  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
16:53:08  ../../src/main/python/asserts.py:275: in assert_gpu_fallback_write
16:53:08      assert_equal(from_cpu, from_gpu)
16:53:08  ../../src/main/python/asserts.py:86: in assert_equal
16:53:08      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
16:53:08  ../../src/main/python/asserts.py:38: in _assert_equal
16:53:08      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
16:53:08  ../../src/main/python/asserts.py:31: in _assert_equal
16:53:08      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
16:53:08  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
16:53:08  
16:53:08  cpu = 6250, gpu = 75003

@jlowe (Member) commented on Nov 18, 2020

Need to upmerge this PR. That test was recently fixed.

@andygrove (Contributor, Author)

build

jlowe previously approved these changes on Nov 18, 2020

@jlowe (Member) left a comment:

Protip: If you merge to the latest on branch-0.3 rather than rebasing and force-pushing, GitHub is smart enough to not lose the PR approval. See #1145 as an example.

@andygrove (Contributor, Author)

Now failing to build against Spark 3.1.0 due to changes to the MapOutputTracker API. I may need to do some shim work. I will look at this tomorrow.
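For illustration, a rough sketch of the shim-layer idea mentioned above, assuming a small trait hides the version-specific MapOutputTracker differences behind a single method. The trait, class, and method names here are hypothetical and are not the plugin's actual shim API.

```scala
import org.apache.spark.SparkEnv

// Hypothetical shim interface: each supported Spark version supplies its own
// implementation so the reader code never touches a version-specific
// MapOutputTracker API directly.
trait ShuffleStatsShim {
  /** Size in bytes of each reduce partition in the given range, if available. */
  def getPartitionSizes(shuffleId: Int, startPartition: Int, endPartition: Int): Option[Array[Long]]
}

// Illustrative implementation for one Spark release.
class Spark310ShuffleStatsShim extends ShuffleStatsShim {
  override def getPartitionSizes(
      shuffleId: Int,
      startPartition: Int,
      endPartition: Int): Option[Array[Long]] = {
    val tracker = SparkEnv.get.mapOutputTracker
    // Placeholder result; a real shim would call the version-appropriate
    // MapOutputTracker method here to obtain per-partition output sizes.
    if (tracker == null) None else Some(Array.fill(endPartition - startPartition)(0L))
  }
}
```

The point of the pattern is that only the per-version class changes when a new Spark release alters the tracker API; the reader itself only depends on the trait.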

@andygrove (Contributor, Author)

build

@andygrove (Contributor, Author)

build

@andygrove (Contributor, Author)

Failed due to test_broadcast_nested_loop_join_special_case, which is now fixed in master. I will merge the fix here and rebuild.

@andygrove (Contributor, Author)

build

andygrove merged commit f66b41d into NVIDIA:branch-0.3 on Nov 20, 2020
andygrove deleted the shuffle-reader-metrics branch on Nov 20, 2020 at 03:38
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
* Add partition metrics to custom shuffle reader

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* revert change

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* refactor to combine metrics and reader in single match statement

Signed-off-by: Andy Grove <andygrove@nvidia.com>

* use shim layer to get map output sizes

Signed-off-by: Andy Grove <andygrove@nvidia.com>
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#1060)

Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
Labels: SQL (part of the SQL/Dataframe plugin)
4 participants