Add Qualification tool support #2574
Merged
Conversation
Signed-off-by: Thomas Graves <tgraves@apache.org>
This reverts commit 6f5271c.
tgravescs requested review from GaryShen2008, jlowe, NvTimLiu, and revans2 as code owners on June 3, 2021 18:04
tgravescs commented on Jun 3, 2021
...ark-tools/src/test/scala/com/nvidia/spark/rapids/tool/profiling/QualificationInfoUtils.scala (outdated; resolved)
andygrove approved these changes on Jun 3, 2021
nartal1 approved these changes on Jun 3, 2021
test timed out for some reason.
Looks like it didn't get nodes for a long time: 14:00:01 [Warning][sw-gpu-spark/premerge-test-jenkins-rapids-premerge-github-1776-xpf3l-tw3jd][FailedScheduling] 0/428 nodes are available: 137 Insufficient memory, 182 Insufficient nvidia.com/gpu, 23 node(s) were unschedulable, 402 node(s) didn't match node selector, 93 Insufficient cpu.
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request on Jun 9, 2021:

* Qualification tool

Signed-off-by: Thomas Graves <tgraves@apache.org>

* remove unused func
* Add missing files
* Add checks for format option
* cast columns to string to write to text
* Revert "Add checks for format option"

This reverts commit 6f5271c.

* cleanup

Signed-off-by: Thomas Graves <tgraves@nvidia.com>

* update output dir
* formatting
* Update help messages
* update app name
* cleanup
* put test functions back
* fix typo
This adds support for the Qualification tool, which ranks applications based on whether they are a good fit for the plugin. It currently ranks by SQL DataFrame time / application time. It reports potential problems (UDFs) that we find, and it optionally reports the percent of executor CPU time. With a lot of apps, adding the percent executor CPU time can take a very long time, so I made it off by default. These latter items are reported to the user as information only and are not used in the rankings.
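The ranking idea above can be sketched roughly as follows. This is a minimal illustration of scoring by the SQL/DataFrame-time-to-application-time ratio, not the tool's actual API; the names (AppInfo, score, rank) are hypothetical.

```scala
object Qualify {
  // Hypothetical per-application summary pulled from an event log.
  case class AppInfo(appId: String, appDurationMs: Long, sqlDataframeDurationMs: Long)

  // Higher score = a larger share of the app's wall time is SQL/DataFrame
  // work, i.e. a better candidate for the plugin.
  def score(app: AppInfo): Double =
    if (app.appDurationMs <= 0) 0.0
    else app.sqlDataframeDurationMs.toDouble / app.appDurationMs

  // Rank a batch of apps, best fit first.
  def rank(apps: Seq[AppInfo]): Seq[(AppInfo, Double)] =
    apps.map(a => (a, score(a))).sortBy(-_._2)
}
```

Extra signals such as UDF detection and percent executor CPU time would be reported alongside the score but, as described above, would not feed into it.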
I also changed the default output format to CSV. Users can also output to text, and I made both work with HDFS.
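A rough sketch of the two output formats, assuming a simple header-plus-rows report. This writes to the local filesystem only; the actual tool's HDFS support would go through Hadoop's FileSystem API, and the column layout here is hypothetical.

```scala
import java.io.{File, PrintWriter}

object ReportWriter {
  // Write report rows as CSV (the new default) or plain text.
  def write(header: Seq[String], rows: Seq[Seq[String]],
            dest: File, format: String = "csv"): Unit = {
    val pw = new PrintWriter(dest)
    try {
      format match {
        case "csv"  => (header +: rows).foreach(r => pw.println(r.mkString(",")))
        case "text" => (header +: rows).foreach(r => pw.println(r.mkString("|")))
        case other  => throw new IllegalArgumentException(s"unsupported format: $other")
      }
    } finally pw.close()
  }
}
```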
I split the qualification tool into its own main function, since these seem like distinct tools with different audiences; we can discuss if people have other opinions. If we make it a single tool for both qualification and profiling, we need to come up with a good generic name and then some options.
I tried to remove calls to things that aren't used for qualification, so you will see options added around that. I also had to change a few tables so I didn't have to join across so many of them: the query over 100 TPC-DS apps becomes huge, and the Spark analyzer takes forever to run over it because there are so many tables.
This also contains various bug fixes to handle truncated files and missing data.
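The truncated-file handling can be sketched as follows, assuming line-delimited event entries where a truncated file typically ends mid-line: keep the well-formed prefix instead of failing the whole file. parseLine is a stand-in for real event parsing, not the tool's code.

```scala
object TruncatedLog {
  // Stand-in for real JSON event parsing: treat a line as well-formed only
  // if it looks like a complete JSON object.
  def parseLine(line: String): Option[String] =
    if (line.startsWith("{") && line.endsWith("}")) Some(line) else None

  // Parse line by line and stop gracefully at the first malformed
  // (cut-off) entry rather than throwing away the entire log.
  def parseEvents(lines: Seq[String]): Seq[String] =
    lines.iterator.map(parseLine).takeWhile(_.isDefined).map(_.get).toSeq
}
```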
This has very minimal doc changes; those will come later.
I fixed a bug with dropping tables and then removed caching.
I added more tests and manually ran the tool over the TPC-DS logs.
Output of the tool looks like: