
Get json object comments address #1924

Merged

Conversation

thirtiseven
Collaborator

@thirtiseven thirtiseven commented Apr 3, 2024

Closes #1906
Fixed #2021

This PR addresses some follow-up comments on the new getJsonObject kernel. After addressing them it shows about a 12% speedup in the microbenchmark, but no speedup is seen in the e2e tests; almost all of the performance improvement comes from the commit "Move 2 variables string_token_utf8_bytes, bytes_diff_for_e…".

Also fixed a bug:

Spark behaviors:

    // use unescaping logic
    get_json_object("['\\u4e2d\\u56FD']", "$")    : ["中国"]

    // use escaping logic
    get_json_object("'\\u4e2d\\u56FD'", "$")      : 中国

For the above two cases, the utf8_bytes needs to be updated in both ESCAPE and UNESCAPE modes.
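To make the byte accounting concrete, here is a minimal host-side sketch (illustrative function and values only, not the PR's kernel code) of why a consumed \uXXXX escape has to bump the byte counter in both modes: the escape collapses to the UTF-8 encoding of the code point, and since 中 and 国 need no JSON escaping, the escaped output contains exactly the same bytes as the unescaped output.

    #include <cstdint>
    #include <cstdio>

    // Bytes needed to UTF-8-encode a BMP code point parsed from a \uXXXX escape.
    // Illustrative only; surrogate pairs are ignored for brevity.
    int utf8_bytes_for_code_point(uint32_t cp)
    {
      if (cp < 0x80) return 1;
      if (cp < 0x800) return 2;
      return 3;  // BMP code points up to U+FFFF
    }

    int main()
    {
      int unescaped_bytes = 0;  // UNESCAPE mode counter
      int escaped_bytes   = 0;  // ESCAPE mode counter
      for (uint32_t cp : {0x4e2du, 0x56FDu}) {  // \u4e2d \u56FD -> 中 国
        int n = utf8_bytes_for_code_point(cp);
        unescaped_bytes += n;  // the raw UTF-8 bytes are written
        escaped_bytes   += n;  // the same bytes are written verbatim: CJK needs no escaping
      }
      std::printf("%d %d\n", unescaped_bytes, escaped_bytes);  // prints: 6 6
      return 0;
    }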

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

Will apply the performance-improvement changes from #1930 after it is merged into 24.04.

@YanxuanLiu
Collaborator

build

@thirtiseven thirtiseven self-assigned this Apr 24, 2024
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
…scape_writing.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@res-life
Collaborator

Does this PR also fix #1963?
Not sure if we need to split this PR: one only for refactor/comments, one for fixing #1963.

curr_token = json_token::ERROR;
}
// previous token is not INIT, meaning a token has already been parsed; stack is
// empty; successfully parsed. Note: ignore the trailing sub-string
Collaborator

Add a comment like:

/**
 * Allow tail useless sub-string in JSON, e.g.:
 * The following invalid JSON is allowed: {'k' : 'v'}_extra_tail_sub_string
 */

Collaborator Author

done

@@ -567,6 +508,116 @@ class json_parser {
}
}

__device__ inline std::pair<int, int> try_parse_quoted_string_size(char const* str_pos,
Collaborator

Add comments for this function.

Collaborator

Rename to: writeString?

Collaborator Author

done.

@thirtiseven
Collaborator Author

Does this PR also fix #1963? Not sure if we need to split this PR: one only for refactor/comments, one for fixing #1963.

OK, I will split it out.

// Records the UTF-8 byte size of a string/field-name token after unescaping
// e.g.: for the 4-char JSON string "\\n", after unescaping we get 1 char '\n'
// used when checking the max string length
int string_token_utf8_bytes = 0;
Collaborator

Rename to: unescped_string_utf8_bytes?

Collaborator Author

done

// e.g.: for the 4-char string "\\n", string_token_utf8_bytes is 1;
// `write_escaped_text` will write out 4 chars: " \ n ",
// so this diff will be 4 - 1 = 3
int bytes_diff_for_escape_writing = 0;
Collaborator

Rename to: escped_string_utf8_bytes?
Refactor to record the actual bytes instead of a diff? It will make the code more readable.
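A hedged illustration of the suggestion (the helper below is made up for illustration, not the PR's code): counting the escaped length directly yields an absolute number, so there is no diff to carry around next to the unescaped count.

    #include <string_view>

    // Illustrative only: record the escaped UTF-8 byte count directly instead of
    // a diff. Characters that JSON requires to be escaped are written as 2 chars.
    int escaped_utf8_bytes(std::string_view unescaped)
    {
      int bytes = 0;
      for (char c : unescaped) {
        switch (c) {
          case '"': case '\\': case '\n': case '\r':
          case '\t': case '\b': case '\f': bytes += 2; break;  // e.g. '\n' is written as backslash + n
          default: bytes += 1; break;                          // written verbatim
        }
      }
      return bytes;  // absolute count; no separate bytes_diff_for_escape_writing needed
    }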

Collaborator Author

done

@@ -660,8 +377,7 @@ __device__ inline thrust::tuple<bool, int> path_match_subscript_index_subscript_
__device__ bool evaluate_path(json_parser& p,
json_generator& root_g,
write_style root_style,
path_instruction const* root_path_ptr,
int root_path_size)
cudf::device_span<path_instruction const> root_path)
Collaborator

Not sure if using device_span will cause a perf regression, because the size in device_span is a 64-bit integer, which uses more memory than a 32-bit size.

Collaborator Author

No perf regression seen in the microbenchmark.

{
auto tid = cudf::detail::grid_1d::global_thread_id();
auto const stride = cudf::detail::grid_1d::grid_stride();

Collaborator

No need to sync.
__ballot_sync is used to generate a 32-bit mask across the 32 threads of a warp.
This get_json_object_size_kernel does not touch any mask, so there is no need to sync.
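For reference, a minimal sketch of the pattern under discussion (illustrative kernel and names, not the repository's code): every iteration of the grid-stride loop writes only its own row's size and never builds a warp-wide mask, so there is nothing for __ballot_sync to synchronize.

    #include <cstddef>

    // Illustrative size-only kernel: purely per-thread work, no warp-level
    // communication, hence no __ballot_sync (or any other sync) is needed.
    __global__ void size_only_kernel_sketch(int const* row_lengths, int* d_sizes, std::size_t num_rows)
    {
      auto tid          = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
      auto const stride = static_cast<std::size_t>(gridDim.x) * blockDim.x;
      while (tid < num_rows) {
        d_sizes[tid] = row_lengths[tid];  // placeholder per-row size computation
        tid += stride;                    // grid-stride loop; iterations are independent
      }
    }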

Collaborator Author

done.

d_sizes[tid] = 0;
}

tid += stride;
Collaborator

No need; refer to the previous comment.

Collaborator Author

done

@@ -1179,7 +921,7 @@ __device__ thrust::pair<bool, size_t> get_json_object_single(
json_generator generator((out_buf == nullptr || out_buf_size == 0) ? nullptr : out_buf);

bool const success = evaluate_path(
j_parser, generator, write_style::raw_style, path_commands_ptr, path_commands_size);
j_parser, generator, write_style::RAW, {path_commands.data(), path_commands.size()});

if (nullptr == out_buf && !success) {
Collaborator

Remove nullptr == out_buf?
In the first phase out_buf is null; in the second phase it is not null.
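For context, a hedged sketch of the two-phase call pattern referred to here (name and signature are illustrative): phase one passes a null output buffer and only measures, phase two passes the real buffer, so out_buf == nullptr by itself already identifies the sizing phase.

    #include <cstddef>
    #include <utility>

    // Illustrative two-pass helper: phase 1 (out_buf == nullptr) only sizes the
    // output; phase 2 writes into a buffer allocated from the phase-1 sizes.
    std::pair<bool, std::size_t> emit_sketch(char const* in, std::size_t in_len, char* out_buf)
    {
      std::size_t written = 0;
      for (std::size_t i = 0; i < in_len; ++i) {
        if (out_buf != nullptr) { out_buf[written] = in[i]; }  // write only in phase 2
        ++written;                                             // always count the size
      }
      return {true, written};
    }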

Collaborator Author

done.

@@ -624,14 +675,20 @@ class json_parser {
char const* to_match_str_pos,
char const* const to_match_str_end,
char* copy_destination,
write_style w_style)
escape_style w_style)
{
Collaborator

Extract a match_unescaped_string from this function, since try_parse_quoted_string is doing two things?
match_unescaped_string would match a valid string and would be used by match_current_field_name.

Collaborator Author

done

thirtiseven and others added 2 commits April 26, 2024 17:27
Co-authored-by: Chong Gao <gaochong.gc@qq.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

micro benchmark for current code:

before:

size_bytes max_depth Samples CPU Time Noise GPU Time Noise
1000000 2 2871x 5.206 ms 4.40% 5.203 ms 4.40%
10000000 2 87x 5.766 ms 0.43% 5.762 ms 0.42%
100000000 2 11x 116.033 ms 0.13% 116.030 ms 0.13%
1000000000 2 11x 1.358 s 0.05% 1.358 s 0.05%
1000000 4 1872x 5.279 ms 0.72% 5.275 ms 0.72%
10000000 4 86x 5.877 ms 0.44% 5.873 ms 0.43%
100000000 4 11x 117.015 ms 0.35% 117.012 ms 0.35%
1000000000 4 11x 1.359 s 0.07% 1.359 s 0.07%
1000000 6 1808x 5.294 ms 0.75% 5.290 ms 0.74%
10000000 6 100x 5.941 ms 0.50% 5.937 ms 0.50%
100000000 6 11x 117.163 ms 0.17% 117.160 ms 0.17%
1000000000 6 11x 1.361 s 0.05% 1.361 s 0.05%
1000000 8 1840x 5.382 ms 0.74% 5.379 ms 0.73%
10000000 8 83x 6.028 ms 0.41% 6.024 ms 0.41%
100000000 8 11x 117.987 ms 0.36% 117.983 ms 0.36%
1000000000 8 11x 1.363 s 0.07% 1.363 s 0.07%

after:

size_bytes max_depth Samples CPU Time Noise GPU Time Noise
1000000 2 2942x 5.080 ms 3.52% 5.076 ms 3.52%
10000000 2 91x 5.556 ms 0.40% 5.552 ms 0.39%
100000000 2 30x 102.939 ms 0.50% 102.936 ms 0.50%
1000000000 2 11x 1.197 s 0.04% 1.197 s 0.04%
1000000 4 1936x 5.154 ms 0.67% 5.151 ms 0.67%
10000000 4 88x 5.688 ms 0.36% 5.684 ms 0.36%
100000000 4 11x 103.090 ms 0.37% 103.086 ms 0.37%
1000000000 4 11x 1.199 s 0.04% 1.199 s 0.04%
1000000 6 1680x 5.171 ms 0.70% 5.168 ms 0.70%
10000000 6 87x 5.760 ms 0.49% 5.757 ms 0.48%
100000000 6 146x 103.382 ms 1.33% 103.379 ms 1.33%
1000000000 6 11x 1.199 s 0.04% 1.199 s 0.04%
1000000 8 1648x 5.259 ms 0.75% 5.256 ms 0.75%
10000000 8 86x 5.856 ms 0.48% 5.852 ms 0.47%
100000000 8 11x 104.066 ms 0.32% 104.063 ms 0.32%
1000000000 8 11x 1.200 s 0.04% 1.200 s 0.04%

It's about a 12% speedup; I will run some e2e tests and do a per-commit breakdown next.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven marked this pull request as ready for review April 30, 2024 03:01
@res-life
Collaborator

BTW, please remove #include <cudf/json/json.hpp> in get_json_object.cu

{
auto match = path_match_element(path_ptr, path_size, path_instruction_type::INDEX);
auto match =
path_match_elements(path, path_instruction_type::SUBSCRIPT, path_instruction_type::INDEX);
Collaborator

path_instruction_type::SUBSCRIPT is removed after this PR: #1987

@@ -1042,4 +1097,4 @@ std::unique_ptr<cudf::column> get_json_object(
return detail::get_json_object(input, instructions, stream, mr);
}

} // namespace spark_rapids_jni
} // namespace spark_rapids_jni
Collaborator

Add an empty line at the end of the file.

Collaborator Author

done.

if (copy_destination != nullptr) copy_destination += escape_chars;
escped_string_utf8_bytes += (escape_chars - 1);
}

Collaborator

        // check match if enabled
        const char* match_str_pos = nullptr;
        if (!try_match_char(match_str_pos, nullptr, *str_pos)) {
          return std::make_pair(unescped_string_utf8_bytes, escped_string_utf8_bytes);
        }

Please check this block. Should it be try_match_char(nullptr, nullptr, *str_pos)?

Collaborator Author

This is to work around the error: cannot bind non-const lvalue reference of type 'const char*&' to a value of type 'std::nullptr_t'.
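A minimal standalone reproduction of the constraint being worked around (the advance function is illustrative, not the PR's try_match_char): a parameter of type char const*& cannot bind to a nullptr literal, so a named pointer variable is declared and passed instead.

    void advance(char const*& pos)  // illustrative stand-in for a char const*& parameter
    {
      if (pos != nullptr) { ++pos; }
    }

    int main()
    {
      // advance(nullptr);  // error: cannot bind non-const lvalue reference of type
      //                    // 'const char*&' to a value of type 'std::nullptr_t'
      char const* match_str_pos = nullptr;  // workaround: pass a named lvalue instead
      advance(match_str_pos);
      return 0;
    }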

@@ -998,7 +985,8 @@ class json_parser {
__device__ bool try_skip_unicode(char const*& str_pos,
char const*& to_match_str_pos,
char const* const to_match_str_end,
char*& copy_dest)
char*& copy_dest,
int& unescped_string_utf8_bytes)
Collaborator

Please pass in escped_string_utf8_bytes and update it as well.
I found an existing bug that was not introduced by this PR.
Spark behaviors:

    // use unescaping logic
    get_json_object("['\\u4e2d\\u56FD']", "$")    : ["中国"]

    // use escaping logic
    get_json_object("'\\u4e2d\\u56FD'", "$")      : 中国

For the above two cases, the utf8_bytes needs to be updated in both ESCAPE and UNESCAPE modes.
Please update the test case getJsonObjectTest_Escape:

    String JSON6 = "['\\u4e2d\\u56FD\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\b']";
    String expectedStr6 = "中国\\\"'\\\\/\\b\\f\\n\\r\\t\\b";

And please update the PR description to describe this bug.

Collaborator Author

Nice catch! done.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

build

revans2
revans2 previously approved these changes Apr 30, 2024
@revans2
Collaborator

revans2 commented Apr 30, 2024

build

@revans2
Collaborator

revans2 commented Apr 30, 2024

@thirtiseven can you tell me how you built and ran the benchmarks? I get compile errors when I try to turn it on using the docker container.

./build/build-in-docker -DGPU_ARCHS="86" -DCPP_PARALLEL_LEVEL=32 -Dlibcudf.build.configure=false -Dlibcudf.clean.skip=true -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=ON clean install -Dsubmodule.check.skip=true -Dtest='GetJsonObjectTest,!CuFileTest,!CudaFatalTest,!ColumnViewNonEmptyNullsTest'
...
[INFO]      [exec] CMake Error at benchmarks/CMakeLists.txt:58 (target_link_libraries):
[INFO]      [exec]   Target "ROW_CONVERSION_BENCH" links to:
[INFO]      [exec] 
[INFO]      [exec]     cudf::cudftestutil
[INFO]      [exec] 
[INFO]      [exec]   but the target was not found.  Possible reasons include:
[INFO]      [exec] 
[INFO]      [exec]     * There is a typo in the target name.
[INFO]      [exec]     * A find_package call is missing for an IMPORTED target.
[INFO]      [exec]     * An ALIAS target is missing.
[INFO]      [exec] 

@thirtiseven
Collaborator Author

thirtiseven commented Apr 30, 2024

@revans2 I think setting -DBUILD_TESTS=ON in your command will make it work, but I don't know why.

My commands:
build:

./build/build-in-docker install -Dsubmodule.check.skip=true -DCPP_PARALLEL_LEVEL=15 -DBUILD_TESTS -DUSE_GDS=\!ON -DBUILD_BENCHMARKS=ON -DskipTests

run benchmark:

target/cmake-build/benchmarks/GET_JSON_OBJECT_BENCH --json latest.json

@ttnghia
Collaborator

ttnghia commented Apr 30, 2024

Because cudftestutil is used in the benchmark code but it is only built as part of the cudf tests, you have to enable the test module.

@ttnghia
Collaborator

ttnghia commented Apr 30, 2024

@thirtiseven When posting benchmark numbers, please use the output of the nvbench compare script so we can see the diff more easily:

python nvbench_compare.py before.json after.json

@revans2
Collaborator

revans2 commented May 1, 2024

@ttnghia have you had a chance to review this?

@ttnghia
Collaborator

ttnghia commented May 1, 2024

@ttnghia have you had a chance to review this?

I'll try to review it soon today.

@ttnghia
Collaborator

ttnghia commented May 3, 2024

@ttnghia have you had a chance to review this?

I'll try to review it soon today.

Sorry, I was sick for the last several days. Now reviewing...

Comment on lines 837 to 838
char const* input,
cudf::size_type input_len,
Collaborator

@ttnghia ttnghia May 3, 2024

How about device_span<char const>? The json_parser constructor can also be modified to take a device_span.
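A hedged sketch of what that could look like (an assumed shape only, not the actual json_parser interface): the constructor accepts a device_span<char const> and unpacks the pointer/length pair it currently takes separately.

    #include <cudf/types.hpp>
    #include <cudf/utilities/span.hpp>

    // Illustrative constructor shape only; class and member names are made up.
    class json_parser_sketch {
     public:
      explicit __device__ json_parser_sketch(cudf::device_span<char const> input)
        : _input{input.data()}, _input_len{static_cast<cudf::size_type>(input.size())}
      {
      }

     private:
      char const* _input;
      cudf::size_type _input_len;
    };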

Collaborator

@ttnghia ttnghia May 3, 2024

Oh wait, why do we have this overload of get_json_object_size_single? Isn't just one version (line 874) enough? I don't see any reason why we need this new overload.

Collaborator Author

Yes, it won't gain performance; reverted.

Comment on lines 921 to 923
template <int block_size>
__launch_bounds__(block_size) CUDF_KERNEL
void get_json_object_size_kernel(cudf::column_device_view col,
Collaborator

Similar to the other overload, it is not necessary to have this kernel; I don't think we will gain performance by doing this. We can just use the existing kernel like before, but rewrite it a bit to minimize the overhead of computing the output size. A few extra if instructions in the kernel should not cause any performance difference, since the kernel is limited by memory bandwidth, not compute.

Collaborator Author

done.

Collaborator

To be clear, the difference is about spilling and the number of registers. This is what I found when working on #2015.

By splitting them, the number of registers was reduced for the kernel that does not write out data. You are correct that it has essentially no impact on the performance.
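One common way to keep a single source while still getting two separate kernels (a sketch under that assumption; this is not the PR's actual code) is a compile-time flag, so the size-only instantiation drops the write path and the registers it needs.

    #include <cstddef>

    // Illustrative only: with WRITE_OUTPUT = false the compiler removes the store
    // path entirely, which is what lowers register usage/spilling for the sizing pass.
    template <bool WRITE_OUTPUT>
    __global__ void get_json_kernel_sketch(char const* in, std::size_t n, char* out, int* d_sizes)
    {
      auto tid          = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
      auto const stride = static_cast<std::size_t>(gridDim.x) * blockDim.x;
      while (tid < n) {
        int size = 1;  // placeholder: real code would parse row `tid` and accumulate its size
        if constexpr (WRITE_OUTPUT) { out[tid] = in[tid]; }  // only the write pass stores output
        d_sizes[tid] = size;
        tid += stride;
      }
    }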

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

Compare results:

size_bytes max_depth Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
1000000 2 5.180 ms 4.07% 5.043 ms 3.79% -137.399 us -2.65% PASS
10000000 2 5.713 ms 5.41% 5.491 ms 0.28% -221.066 us -3.87% FAIL
100000000 2 117.399 ms 0.12% 101.573 ms 0.28% -15825.535 us -13.48% FAIL
1000000000 2 1.361 s 0.02% 1.179 s 0.03% -182007.568 us -13.37% FAIL
1000000 4 5.239 ms 0.52% 5.070 ms 0.50% -168.354 us -3.21% FAIL
10000000 4 5.799 ms 0.50% 5.560 ms 0.14% -239.384 us -4.13% FAIL
100000000 4 117.154 ms 0.23% 101.380 ms 0.17% -15773.619 us -13.46% FAIL
1000000000 4 1.359 s 0.04% 1.177 s 0.05% -182614.813 us -13.44% FAIL
1000000 6 5.265 ms 0.65% 5.125 ms 0.50% -139.882 us -2.66% FAIL
10000000 6 5.855 ms 0.42% 5.649 ms 0.50% -206.381 us -3.52% FAIL
100000000 6 117.486 ms 0.11% 101.598 ms 0.45% -15888.577 us -13.52% FAIL
1000000000 6 1.359 s 0.04% 1.176 s 0.03% -183312.345 us -13.49% FAIL
1000000 8 5.340 ms 0.51% 5.157 ms 0.50% -182.679 us -3.42% FAIL
10000000 8 5.919 ms 0.22% 5.681 ms 0.32% -238.045 us -4.02% FAIL
100000000 8 118.285 ms 0.45% 101.470 ms 0.13% -16814.458 us -14.22% FAIL
1000000000 8 1.360 s 0.05% 1.173 s 0.04% -186663.718 us -13.73% FAIL
1000000 2 5.120 ms 3.82% 5.010 ms 4.47% -109.889 us -2.15% PASS
10000000 2 5.663 ms 11.81% 5.492 ms 6.23% -171.784 us -3.03% PASS
100000000 2 115.939 ms 0.43% 101.711 ms 0.50% -14228.493 us -12.27% FAIL
1000000000 2 1.344 s 0.04% 1.177 s 0.04% -167581.476 us -12.46% FAIL
1000000 4 5.163 ms 0.53% 5.042 ms 0.56% -121.094 us -2.35% FAIL
10000000 4 5.716 ms 0.17% 5.531 ms 0.16% -185.413 us -3.24% FAIL
100000000 4 115.807 ms 0.30% 101.563 ms 0.34% -14244.424 us -12.30% FAIL
1000000000 4 1.343 s 0.03% 1.176 s 0.04% -167630.571 us -12.48% FAIL
1000000 6 5.181 ms 0.58% 5.088 ms 0.50% -92.827 us -1.79% FAIL
10000000 6 5.775 ms 0.50% 5.602 ms 0.15% -172.636 us -2.99% FAIL
100000000 6 116.078 ms 0.23% 101.390 ms 0.19% -14688.013 us -12.65% FAIL
1000000000 6 1.343 s 0.04% 1.172 s 0.05% -170184.149 us -12.68% FAIL
1000000 8 5.256 ms 0.53% 5.161 ms 0.50% -94.969 us -1.81% FAIL
10000000 8 5.842 ms 0.32% 5.679 ms 0.14% -162.748 us -2.79% FAIL
100000000 8 117.036 ms 0.30% 101.646 ms 0.50% -15390.610 us -13.15% FAIL
1000000000 8 1.345 s 0.05% 1.174 s 0.05% -170411.432 us -12.67% FAIL

@thirtiseven
Collaborator Author

build

@thirtiseven thirtiseven merged commit bb07951 into NVIDIA:branch-24.06 May 6, 2024
3 checks passed