Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Path key/subscript to save GPU memory usage to improve perf #1987

Merged
merged 3 commits into from
Apr 29, 2024

Conversation

res-life
Copy link
Collaborator

@res-life res-life commented Apr 22, 2024

Remove Path key/subscript to save GPU memory usage to improve perf.

I found less GPU stack memory usage will help on perf.
I decreased the max path depth from 16 to 8, and get perf improvement.

constexpr int max_path_depth = 8;

And I optimized the main flow logic to reduced query paths:

  • key and named to one path => named
  • subscript and index/wildcard to one path => index/wildcard
    From the definition of JSON query path, key and named always ocurr in the same time, and also subscript and index/wildcard are always occur in the same time. So it's safe to reduce (key, named) to (named), and it's safe to reduce (subscript, index/wildcard) to (index/wildcard)

Note: If query path is a single wildcard , e.g.: $.* , Spark returns null, so the path is equivalent to invalid path.

performance test

Customer Base line(1st, 2nd, 3rd) cudf legacy After this PR speed
customer 1 231500 ms, 232760 ms, 232647 ms 275719 ms, 272898 ms, 274407 ms about 84%

Signed-off-by: Chong Gao <res_life@163.com>
@res-life
Copy link
Collaborator Author

res-life commented Apr 22, 2024

Some IT cases are not passed:

./integration_tests/run_pyspark_from_build.sh -s -k test_get_json_object
========= 9 failed, 58 passed, 27086 deselected, 9 warnings in 18.72s ==========

Chong Gao added 2 commits April 22, 2024 21:22
@res-life
Copy link
Collaborator Author

All IT cases passed after applying two PRs:

@res-life res-life changed the title Remove Path key/subscribe to save GPU memory usage to improve perf [WIP] Remove Path key/subscribe to save GPU memory usage to improve perf Apr 23, 2024
@@ -696,7 +400,7 @@ __device__ bool evaluate_path(json_parser& p,
// There is a same constant in JSONUtil.java, keep them consistent when changing
// Note: Spark-Rapids should guarantee the path depth is less or equal to this limit,
// or GPU reports cudaErrorIllegalAddress
constexpr int max_path_depth = 16;
constexpr int max_path_depth = 8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please split up these two changes so that we can evaluate the performance of each of them separately? I would really like to see max_path_depth be turned into a template parameter for the kernel. That way we can have a few different kernels and select the right one depending on the path done. That or we can do like @nvdbaranec did on the original version and set the amount of shared memory that would be needed as part of launching the kernel. But that can be follow on work.

Copy link
Collaborator Author

@res-life res-life Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please split up these two changes so that we can evaluate the performance of each of them separately?

Now, we combine [Key, Named] 2 paths (e.g.: .name1) to a single [Named] path, so the change from 16 to 8 is not a tuning.
And:
[Subscribe, Index] e.g.: [1] => [Index]
[Subscribe, Wildcard] e.g.: [*] => [Wildcard]
No need to separate to two PRs.

That or we can do like @nvdbaranec did on the original version and set the amount of shared memory that would be needed as part of launching the kernel. But that can be follow on work.

OK, will add in a follow-up.

@res-life res-life changed the title [WIP] Remove Path key/subscribe to save GPU memory usage to improve perf [WIP] Remove Path key/subscript to save GPU memory usage to improve perf Apr 26, 2024
@res-life
Copy link
Collaborator Author

build

@res-life res-life changed the title [WIP] Remove Path key/subscript to save GPU memory usage to improve perf Remove Path key/subscript to save GPU memory usage to improve perf Apr 28, 2024
@res-life res-life marked this pull request as ready for review April 29, 2024 05:53
@res-life res-life merged commit 2ca29f7 into NVIDIA:branch-24.06 Apr 29, 2024
3 checks passed
@res-life res-life deleted the get-json-object-perf3 branch April 29, 2024 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants