Add Qualification tool support #2574

tgravescs · 2021-06-03T18:04:40Z

This adds support for the Qualification tool, which ranks applications by how good a fit they are for the plugin. It currently ranks based on SQL DataFrame time / application time. It also reports potential problems (UDFs) that we find, and it OPTIONALLY reports the percent executor CPU time. With a lot of apps, computing the percent executor CPU time can take a very long time, so I made it off by default.
The potential problems and the percent executor CPU time are only reported to the user as information and are not used in the rankings.

I also changed the default output format to CSV. Users can also output to text, and I made both work with HDFS.

I split the qualification tool into its own Main function since these seem like distinct tools with different audiences; we can discuss if people have other opinions. If we make it one tool for both qualification and profiling, we need to come up with a good generic name and then some options.

I tried to remove calls to things that aren't used for qualification, so you will see options added around that. I also had to change a few tables so that I didn't have to join across so many of them: the query with 100 TPC-DS apps becomes huge, and the Spark analyzer takes forever to run over it because we have so many tables.

This also contains various bug fixes to handle truncated files and missing data.

This has very minimal doc changes - those will come later.

Fixed a bug with dropping tables and then removed caching.

I added more tests and manually ran the tool over the TPC-DS logs.

Output of the tool looks like:

### Qualification ###
+--------------------------------------+-------------------+-----+------------------+----------------------+------------+-------------------------+
|App Name                              |App ID             |Rank |Potential Problems|SQL Dataframe Duration|App Duration|Executor CPU Time Percent|
+--------------------------------------+-------------------+-----+------------------+----------------------+------------+-------------------------+
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|68.19|                  |11128                 |16319       |70.91                    |
|Rapids Spark Profiling Tool Unit Tests|local-1621969619749|14.34|UDF               |1560                  |10880       |43.79                    |
|Rapids Spark Profiling Tool Unit Tests|local-1621966649543|0.0  |                  |0                     |10650       |26.65                    |
|Rapids Spark Profiling Tool Unit Tests|local-1621955976602|0.0  |                  |0                     |10419       |25.8                     |
+--------------------------------------+-------------------+-----+------------------+----------------------+------------+-------------------------+
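As a rough illustration of the ranking described above (not the tool's actual code), the Rank column in this sample output is consistent with SQL Dataframe Duration divided by App Duration, expressed as a percentage and rounded to two decimals. The sketch below assumes exactly that; the names `AppSummary`, `rank_score`, and `rank_apps` are hypothetical and exist only for this example.

```python
from typing import List, NamedTuple


class AppSummary(NamedTuple):
    """Hypothetical per-application record; field names are illustrative only."""
    app_id: str
    sql_df_duration: int  # "SQL Dataframe Duration" column
    app_duration: int     # "App Duration" column


def rank_score(sql_df_duration: int, app_duration: int) -> float:
    """Assumed score: percent of application time spent in SQL/DataFrame work."""
    if app_duration <= 0:
        # Guard against missing or truncated data (assumption: score such apps as 0).
        return 0.0
    return round(100.0 * sql_df_duration / app_duration, 2)


def rank_apps(apps: List[AppSummary]) -> List[AppSummary]:
    """Order applications from best to worst candidate by the assumed score."""
    return sorted(
        apps,
        key=lambda a: rank_score(a.sql_df_duration, a.app_duration),
        reverse=True,
    )


# Values copied from the sample output above.
SAMPLE = [
    AppSummary("local-1621966649543", 0, 10650),
    AppSummary("local-1621969619749", 1560, 10880),
    AppSummary("local-1622043423018", 11128, 16319),
]

for app in rank_apps(SAMPLE):
    print(app.app_id, rank_score(app.sql_df_duration, app.app_duration))
```

Potential problems and the executor CPU time percent are deliberately left out of the score here, matching the note that they are informational only.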

Signed-off-by: Thomas Graves <tgraves@apache.org>

This reverts commit 6f5271c.

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

tgravescs · 2021-06-03T18:05:25Z

build

...ark-tools/src/test/scala/com/nvidia/spark/rapids/tool/profiling/QualificationInfoUtils.scala

tgravescs · 2021-06-03T18:11:34Z

build

tgravescs · 2021-06-03T18:29:37Z

build

tgravescs · 2021-06-03T21:43:56Z

The tests timed out for some reason.

tgravescs · 2021-06-03T21:44:34Z

Looks like it didn't get nodes for a long time: 14:00:01 [Warning][sw-gpu-spark/premerge-test-jenkins-rapids-premerge-github-1776-xpf3l-tw3jd][FailedScheduling] 0/428 nodes are available: 137 Insufficient memory, 182 Insufficient nvidia.com/gpu, 23 node(s) were unschedulable, 402 node(s) didn't match node selector, 93 Insufficient cpu.

tgravescs · 2021-06-03T21:44:40Z

build

* Qualification tool
  Signed-off-by: Thomas Graves <tgraves@apache.org>
* remove unused func
* Add missing files
* Add checks for format option
* cast columsn to string to write to text
* Revert "Add checks for format option"
  This reverts commit 6f5271c.
* cleanup
  Signed-off-by: Thomas Graves <tgraves@nvidia.com>
* update output dir
* formating
* Update help messages
* update app name
* cleanup
* put test functions back
* fix typo

tgravescs and others added 12 commits June 3, 2021 10:56

Qualification tool

9ffe53e

Signed-off-by: Thomas Graves <tgraves@apache.org>

remove unused func

56c8cc2

Add missing files

ef5cf4c

Add checks for format option

6f5271c

cast columsn to string to write to text

81dc6a4

Revert "Add checks for format option"

ee8c1dd

This reverts commit 6f5271c.

cleanup

bc0acd1

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

update output dir

e9dc1ce

formating

aa298b2

Update help messages

f3f35f4

update app name

10f6678

cleanup

9e0d104

tgravescs added the feature request New feature or request label Jun 3, 2021

tgravescs added this to the May 24 - Jun 4 milestone Jun 3, 2021

tgravescs self-assigned this Jun 3, 2021

tgravescs requested review from GaryShen2008, jlowe, NvTimLiu and revans2 as code owners June 3, 2021 18:04

put test functions back

50b4c29

fix typo

f2162f1

tgravescs mentioned this pull request Jun 3, 2021

[FEA] Profiling and qualification tool #2483

Closed

17 tasks

nartal1 mentioned this pull request Jun 3, 2021

Add filter support for qualification and profiling tool. #2576

Merged

andygrove approved these changes Jun 3, 2021

nartal1 approved these changes Jun 3, 2021

tgravescs merged commit 1129641 into NVIDIA:branch-21.06 Jun 4, 2021
