
Get json object comments address #1924

Merged

Conversation

thirtiseven
Collaborator

@thirtiseven thirtiseven commented Apr 3, 2024

Closes #1906
Fixed #2021

This PR addresses some follow-up comments on the new getJsonObject kernel. After addressing them it shows about a 12% speedup in the microbenchmark, but no speedup is seen in the e2e tests; almost all of the performance improvement comes from the commit "Move 2 variables string_token_utf8_bytes, bytes_diff_for_e…".

Also fixed a bug:

Spark behaviors:

    // use unescaping logic
    get_json_object("['\\u4e2d\\u56FD']", "$")    : ["中国"]

    // use escaping logic
    get_json_object("'\\u4e2d\\u56FD'", "$")      : 中国

For the above two cases, the utf8_bytes needs to be updated in both ESCAPE and UNESCAPE modes.
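To make the byte accounting concrete, here is a minimal host-side sketch (illustrative function and values only, not the PR's kernel code) of why a consumed \uXXXX escape has to bump the byte counter in both modes: the escape collapses to the UTF-8 encoding of the code point, and since 中 and 国 need no JSON escaping, the escaped output contains exactly the same bytes as the unescaped output.

    #include <cstdint>
    #include <cstdio>

    // Bytes needed to UTF-8-encode a BMP code point parsed from a \uXXXX escape.
    // Illustrative only; surrogate pairs are ignored for brevity.
    int utf8_bytes_for_code_point(uint32_t cp)
    {
      if (cp < 0x80) return 1;
      if (cp < 0x800) return 2;
      return 3;  // BMP code points up to U+FFFF
    }

    int main()
    {
      int unescaped_bytes = 0;  // UNESCAPE mode counter
      int escaped_bytes   = 0;  // ESCAPE mode counter
      for (uint32_t cp : {0x4e2du, 0x56FDu}) {  // \u4e2d \u56FD -> 中 国
        int n = utf8_bytes_for_code_point(cp);
        unescaped_bytes += n;  // the raw UTF-8 bytes are written
        escaped_bytes   += n;  // the same bytes are written verbatim: CJK needs no escaping
      }
      std::printf("%d %d\n", unescaped_bytes, escaped_bytes);  // prints: 6 6
      return 0;
    }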

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

Will apply the performance-improvement changes from #1930 after it is merged into 24.04.

@YanxuanLiu
Collaborator

build

@thirtiseven thirtiseven self-assigned this Apr 24, 2024
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
…scape_writing.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@res-life
Collaborator

Does this PR also fix #1963?
Not sure if we need to split this PR: one only for refactor/comments, one for fixing #1963.

curr_token = json_token::ERROR;
}
// previous token is not INIT, meaning a token has already been parsed; stack is
// empty; successfully parsed. Note: ignore the trailing sub-string
Collaborator

Add a comment like:

/**
 * Allow tail useless sub-string in JSON, e.g.:
 * The following invalid JSON is allowed: {'k' : 'v'}_extra_tail_sub_string
 */

Collaborator Author

done

@@ -567,6 +508,116 @@ class json_parser {
}
}

__device__ inline std::pair<int, int> try_parse_quoted_string_size(char const* str_pos,
Collaborator

Add comments for this function.

Collaborator

Rename to: writeString?

Collaborator Author

done.

@thirtiseven
Collaborator Author

Does this PR also fix #1963? Not sure if we need to split this PR: one only for refactor/comments, one for fixing #1963.

OK, I will split it out.

// Records the UTF-8 byte size of a string/field-name token after unescaping
// e.g.: for the 4-char JSON string "\\n", after unescaping we get 1 char '\n'
// used when checking the max string length
int string_token_utf8_bytes = 0;
Collaborator

Rename to: unescped_string_utf8_bytes?

Collaborator Author

done

// e.g.: for the 4-char string "\\n", string_token_utf8_bytes is 1;
// `write_escaped_text` will write out 4 chars: " \ n ",
// so this diff will be 4 - 1 = 3
int bytes_diff_for_escape_writing = 0;
Collaborator

Rename to: escped_string_utf8_bytes?
Refactor to record the actual bytes instead of a diff? It will make the code more readable.
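A hedged illustration of the suggestion (the helper below is made up for illustration, not the PR's code): counting the escaped length directly yields an absolute number, so there is no diff to carry around next to the unescaped count.

    #include <string_view>

    // Illustrative only: record the escaped UTF-8 byte count directly instead of
    // a diff. Characters that JSON requires to be escaped are written as 2 chars.
    int escaped_utf8_bytes(std::string_view unescaped)
    {
      int bytes = 0;
      for (char c : unescaped) {
        switch (c) {
          case '"': case '\\': case '\n': case '\r':
          case '\t': case '\b': case '\f': bytes += 2; break;  // e.g. '\n' is written as backslash + n
          default: bytes += 1; break;                          // written verbatim
        }
      }
      return bytes;  // absolute count; no separate bytes_diff_for_escape_writing needed
    }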

Collaborator Author

done

@@ -660,8 +377,7 @@ __device__ inline thrust::tuple<bool, int> path_match_subscript_index_subscript_
__device__ bool evaluate_path(json_parser& p,
json_generator& root_g,
write_style root_style,
path_instruction const* root_path_ptr,
int root_path_size)
cudf::device_span<path_instruction const> root_path)
Collaborator

Not sure if using device_span will cause a perf regression, because the size in device_span is a 64-bit integer, which uses more memory than a 32-bit size.

Collaborator Author

No perf regression seen in the microbenchmark.

{
auto tid = cudf::detail::grid_1d::global_thread_id();
auto const stride = cudf::detail::grid_1d::grid_stride();

Collaborator

No need to sync.
__ballot_sync is used to generate a 32-bit mask across the 32 threads of a warp.
This get_json_object_size_kernel does not touch any mask, so there is no need to sync.
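For reference, a minimal sketch of the pattern under discussion (illustrative kernel and names, not the repository's code): every iteration of the grid-stride loop writes only its own row's size and never builds a warp-wide mask, so there is nothing for __ballot_sync to synchronize.

    #include <cstddef>

    // Illustrative size-only kernel: purely per-thread work, no warp-level
    // communication, hence no __ballot_sync (or any other sync) is needed.
    __global__ void size_only_kernel_sketch(int const* row_lengths, int* d_sizes, std::size_t num_rows)
    {
      auto tid          = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
      auto const stride = static_cast<std::size_t>(gridDim.x) * blockDim.x;
      while (tid < num_rows) {
        d_sizes[tid] = row_lengths[tid];  // placeholder per-row size computation
        tid += stride;                    // grid-stride loop; iterations are independent
      }
    }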

Collaborator Author

done.

d_sizes[tid] = 0;
}

tid += stride;
Collaborator

No need; refer to the previous comment.

Collaborator Author

done

@@ -1179,7 +921,7 @@ __device__ thrust::pair<bool, size_t> get_json_object_single(
json_generator generator((out_buf == nullptr || out_buf_size == 0) ? nullptr : out_buf);

bool const success = evaluate_path(
j_parser, generator, write_style::raw_style, path_commands_ptr, path_commands_size);
j_parser, generator, write_style::RAW, {path_commands.data(), path_commands.size()});

if (nullptr == out_buf && !success) {
Collaborator

Remove nullptr == out_buf?
In the first phase out_buf is null; in the second phase it is not null.
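For context, a hedged sketch of the two-phase call pattern referred to here (name and signature are illustrative): phase one passes a null output buffer and only measures, phase two passes the real buffer, so out_buf == nullptr by itself already identifies the sizing phase.

    #include <cstddef>
    #include <utility>

    // Illustrative two-pass helper: phase 1 (out_buf == nullptr) only sizes the
    // output; phase 2 writes into a buffer allocated from the phase-1 sizes.
    std::pair<bool, std::size_t> emit_sketch(char const* in, std::size_t in_len, char* out_buf)
    {
      std::size_t written = 0;
      for (std::size_t i = 0; i < in_len; ++i) {
        if (out_buf != nullptr) { out_buf[written] = in[i]; }  // write only in phase 2
        ++written;                                             // always count the size
      }
      return {true, written};
    }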

Collaborator Author

done.

@@ -624,14 +675,20 @@ class json_parser {
char const* to_match_str_pos,
char const* const to_match_str_end,
char* copy_destination,
write_style w_style)
escape_style w_style)
{
Collaborator

Extract a match_unescaped_string from this function, since try_parse_quoted_string is doing two things?
match_unescaped_string would match a valid string and would be used by match_current_field_name.

Collaborator Author

done

thirtiseven and others added 2 commits April 26, 2024 17:27
Co-authored-by: Chong Gao <gaochong.gc@qq.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

micro benchmark for current code:

before:

size_bytes max_depth Samples CPU Time Noise GPU Time Noise
1000000 2 2871x 5.206 ms 4.40% 5.203 ms 4.40%
10000000 2 87x 5.766 ms 0.43% 5.762 ms 0.42%
100000000 2 11x 116.033 ms 0.13% 116.030 ms 0.13%
1000000000 2 11x 1.358 s 0.05% 1.358 s 0.05%
1000000 4 1872x 5.279 ms 0.72% 5.275 ms 0.72%
10000000 4 86x 5.877 ms 0.44% 5.873 ms 0.43%
100000000 4 11x 117.015 ms 0.35% 117.012 ms 0.35%
1000000000 4 11x 1.359 s 0.07% 1.359 s 0.07%
1000000 6 1808x 5.294 ms 0.75% 5.290 ms 0.74%
10000000 6 100x 5.941 ms 0.50% 5.937 ms 0.50%
100000000 6 11x 117.163 ms 0.17% 117.160 ms 0.17%
1000000000 6 11x 1.361 s 0.05% 1.361 s 0.05%
1000000 8 1840x 5.382 ms 0.74% 5.379 ms 0.73%
10000000 8 83x 6.028 ms 0.41% 6.024 ms 0.41%
100000000 8 11x 117.987 ms 0.36% 117.983 ms 0.36%
1000000000 8 11x 1.363 s 0.07% 1.363 s 0.07%

after:

size_bytes max_depth Samples CPU Time Noise GPU Time Noise
1000000 2 2942x 5.080 ms 3.52% 5.076 ms 3.52%
10000000 2 91x 5.556 ms 0.40% 5.552 ms 0.39%
100000000 2 30x 102.939 ms 0.50% 102.936 ms 0.50%
1000000000 2 11x 1.197 s 0.04% 1.197 s 0.04%
1000000 4 1936x 5.154 ms 0.67% 5.151 ms 0.67%
10000000 4 88x 5.688 ms 0.36% 5.684 ms 0.36%
100000000 4 11x 103.090 ms 0.37% 103.086 ms 0.37%
1000000000 4 11x 1.199 s 0.04% 1.199 s 0.04%
1000000 6 1680x 5.171 ms 0.70% 5.168 ms 0.70%
10000000 6 87x 5.760 ms 0.49% 5.757 ms 0.48%
100000000 6 146x 103.382 ms 1.33% 103.379 ms 1.33%
1000000000 6 11x 1.199 s 0.04% 1.199 s 0.04%
1000000 8 1648x 5.259 ms 0.75% 5.256 ms 0.75%
10000000 8 86x 5.856 ms 0.48% 5.852 ms 0.47%
100000000 8 11x 104.066 ms 0.32% 104.063 ms 0.32%
1000000000 8 11x 1.200 s 0.04% 1.200 s 0.04%

It's about a 12% speedup; I will run some e2e tests and do a per-commit breakdown next.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven marked this pull request as ready for review April 30, 2024 03:01
@res-life
Collaborator

BTW, please remove #include <cudf/json/json.hpp> in get_json_object.cu

{
auto match = path_match_element(path_ptr, path_size, path_instruction_type::INDEX);
auto match =
path_match_elements(path, path_instruction_type::SUBSCRIPT, path_instruction_type::INDEX);
Collaborator

path_instruction_type::SUBSCRIPT is removed after this PR: #1987

@@ -1042,4 +1097,4 @@ std::unique_ptr<cudf::column> get_json_object(
return detail::get_json_object(input, instructions, stream, mr);
}

} // namespace spark_rapids_jni
} // namespace spark_rapids_jni
Collaborator

Add an empty line at the end of the file.

Collaborator Author

done.

if (copy_destination != nullptr) copy_destination += escape_chars;
escped_string_utf8_bytes += (escape_chars - 1);
}

Collaborator

        // check match if enabled
        const char* match_str_pos = nullptr;
        if (!try_match_char(match_str_pos, nullptr, *str_pos)) {
          return std::make_pair(unescped_string_utf8_bytes, escped_string_utf8_bytes);
        }

Please check this block. Should it be try_match_char(nullptr, nullptr, *str_pos)?

Collaborator Author

This is to work around the error: cannot bind non-const lvalue reference of type 'const char*&' to a value of type 'std::nullptr_t'.
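A minimal standalone reproduction of the constraint being worked around (the advance function is illustrative, not the PR's try_match_char): a parameter of type char const*& cannot bind to a nullptr literal, so a named pointer variable is declared and passed instead.

    void advance(char const*& pos)  // illustrative stand-in for a char const*& parameter
    {
      if (pos != nullptr) { ++pos; }
    }

    int main()
    {
      // advance(nullptr);  // error: cannot bind non-const lvalue reference of type
      //                    // 'const char*&' to a value of type 'std::nullptr_t'
      char const* match_str_pos = nullptr;  // workaround: pass a named lvalue instead
      advance(match_str_pos);
      return 0;
    }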

@@ -998,7 +985,8 @@ class json_parser {
__device__ bool try_skip_unicode(char const*& str_pos,
char const*& to_match_str_pos,
char const* const to_match_str_end,
char*& copy_dest)
char*& copy_dest,
int& unescped_string_utf8_bytes)
Collaborator

Please pass in escped_string_utf8_bytes and update it as well.
I found an existing bug that was not introduced by this PR.
Spark behaviors:

    // use unescaping logic
    get_json_object("['\\u4e2d\\u56FD']", "$")    : ["中国"]

    // use escaping logic
    get_json_object("'\\u4e2d\\u56FD'", "$")      : 中国

For the above two cases, the utf8_bytes needs to be updated in both ESCAPE and UNESCAPE modes.
Please update the test case getJsonObjectTest_Escape:

    String JSON6 = "['\\u4e2d\\u56FD\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\b']";
    String expectedStr6 = "中国\\\"'\\\\/\\b\\f\\n\\r\\t\\b";

And please update the PR description to describe this bug.

Collaborator Author

Nice catch! done.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

build

revans2
revans2 previously approved these changes Apr 30, 2024
@revans2
Collaborator

revans2 commented Apr 30, 2024

build

@revans2
Collaborator

revans2 commented Apr 30, 2024

@thirtiseven can you tell me how you built and ran the benchmarks? I get compile errors when I try to turn it on using the docker container.

./build/build-in-docker -DGPU_ARCHS="86" -DCPP_PARALLEL_LEVEL=32 -Dlibcudf.build.configure=false -Dlibcudf.clean.skip=true -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=ON clean install -Dsubmodule.check.skip=true -Dtest='GetJsonObjectTest,!CuFileTest,!CudaFatalTest,!ColumnViewNonEmptyNullsTest'
...
[INFO]      [exec] CMake Error at benchmarks/CMakeLists.txt:58 (target_link_libraries):
[INFO]      [exec]   Target "ROW_CONVERSION_BENCH" links to:
[INFO]      [exec] 
[INFO]      [exec]     cudf::cudftestutil
[INFO]      [exec] 
[INFO]      [exec]   but the target was not found.  Possible reasons include:
[INFO]      [exec] 
[INFO]      [exec]     * There is a typo in the target name.
[INFO]      [exec]     * A find_package call is missing for an IMPORTED target.
[INFO]      [exec]     * An ALIAS target is missing.
[INFO]      [exec] 

@thirtiseven
Collaborator Author

thirtiseven commented Apr 30, 2024

@revans2 I think setting -DBUILD_TESTS=ON in your command will make it work, but I don't know why.

My commands:
build:

./build/build-in-docker install -Dsubmodule.check.skip=true -DCPP_PARALLEL_LEVEL=15 -DBUILD_TESTS -DUSE_GDS=\!ON -DBUILD_BENCHMARKS=ON -DskipTests

run benchmark:

target/cmake-build/benchmarks/GET_JSON_OBJECT_BENCH --json latest.json

@ttnghia
Collaborator

ttnghia commented Apr 30, 2024

Because cudftestutil is used in the benchmark code but it is only built as part of the cudf tests, you have to enable the test module.

@ttnghia
Collaborator

ttnghia commented Apr 30, 2024

@thirtiseven When posting benchmark numbers, please use the output of the nvbench compare script so we can see the diff more easily:

python nvbench_compare.py before.json after.json

@revans2
Collaborator

revans2 commented May 1, 2024

@ttnghia have you had a chance to review this?

@ttnghia
Collaborator

ttnghia commented May 1, 2024

@ttnghia have you had a chance to review this?

I'll try to review it soon today.

@ttnghia
Collaborator

ttnghia commented May 3, 2024

@ttnghia have you had a chance to review this?

I'll try to review it soon today.

Sorry, I was sick for the last several days. Now reviewing...

Comment on lines 837 to 838
char const* input,
cudf::size_type input_len,
Collaborator

@ttnghia ttnghia May 3, 2024

How about device_span<char const>? The json_parser constructor can also be modified to take a device_span.
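A hedged sketch of what that could look like (an assumed shape only, not the actual json_parser interface): the constructor accepts a device_span<char const> and unpacks the pointer/length pair it currently takes separately.

    #include <cudf/types.hpp>
    #include <cudf/utilities/span.hpp>

    // Illustrative constructor shape only; class and member names are made up.
    class json_parser_sketch {
     public:
      explicit __device__ json_parser_sketch(cudf::device_span<char const> input)
        : _input{input.data()}, _input_len{static_cast<cudf::size_type>(input.size())}
      {
      }

     private:
      char const* _input;
      cudf::size_type _input_len;
    };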

Collaborator

@ttnghia ttnghia May 3, 2024

Oh wait, why do we have this overload of get_json_object_size_single? Isn't just one version (line 874) enough? I don't see any reason why we need this new overload.

Collaborator Author

Yes, it won't gain performance; reverted.

Comment on lines 921 to 923
template <int block_size>
__launch_bounds__(block_size) CUDF_KERNEL
void get_json_object_size_kernel(cudf::column_device_view col,
Collaborator

Similar to the other overload, it is not necessary to have this kernel; I don't think we will gain performance by doing this. We can just use the existing kernel like before, but rewrite it a bit to minimize the overhead of computing the output size. A few extra if instructions in the kernel should not cause any performance difference, since the kernel is limited by memory bandwidth, not compute.

Collaborator Author

done.

Collaborator

To be clear, the difference is about spilling and the number of registers. This is what I found when working on #2015.

By splitting them, the number of registers was reduced for the kernel that does not write out data. You are correct that it has essentially no impact on the performance.
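One common way to keep a single source while still getting two separate kernels (a sketch under that assumption; this is not the PR's actual code) is a compile-time flag, so the size-only instantiation drops the write path and the registers it needs.

    #include <cstddef>

    // Illustrative only: with WRITE_OUTPUT = false the compiler removes the store
    // path entirely, which is what lowers register usage/spilling for the sizing pass.
    template <bool WRITE_OUTPUT>
    __global__ void get_json_kernel_sketch(char const* in, std::size_t n, char* out, int* d_sizes)
    {
      auto tid          = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
      auto const stride = static_cast<std::size_t>(gridDim.x) * blockDim.x;
      while (tid < n) {
        int size = 1;  // placeholder: real code would parse row `tid` and accumulate its size
        if constexpr (WRITE_OUTPUT) { out[tid] = in[tid]; }  // only the write pass stores output
        d_sizes[tid] = size;
        tid += stride;
      }
    }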

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

Compare results:

size_bytes max_depth Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
1000000 2 5.180 ms 4.07% 5.043 ms 3.79% -137.399 us -2.65% PASS
10000000 2 5.713 ms 5.41% 5.491 ms 0.28% -221.066 us -3.87% FAIL
100000000 2 117.399 ms 0.12% 101.573 ms 0.28% -15825.535 us -13.48% FAIL
1000000000 2 1.361 s 0.02% 1.179 s 0.03% -182007.568 us -13.37% FAIL
1000000 4 5.239 ms 0.52% 5.070 ms 0.50% -168.354 us -3.21% FAIL
10000000 4 5.799 ms 0.50% 5.560 ms 0.14% -239.384 us -4.13% FAIL
100000000 4 117.154 ms 0.23% 101.380 ms 0.17% -15773.619 us -13.46% FAIL
1000000000 4 1.359 s 0.04% 1.177 s 0.05% -182614.813 us -13.44% FAIL
1000000 6 5.265 ms 0.65% 5.125 ms 0.50% -139.882 us -2.66% FAIL
10000000 6 5.855 ms 0.42% 5.649 ms 0.50% -206.381 us -3.52% FAIL
100000000 6 117.486 ms 0.11% 101.598 ms 0.45% -15888.577 us -13.52% FAIL
1000000000 6 1.359 s 0.04% 1.176 s 0.03% -183312.345 us -13.49% FAIL
1000000 8 5.340 ms 0.51% 5.157 ms 0.50% -182.679 us -3.42% FAIL
10000000 8 5.919 ms 0.22% 5.681 ms 0.32% -238.045 us -4.02% FAIL
100000000 8 118.285 ms 0.45% 101.470 ms 0.13% -16814.458 us -14.22% FAIL
1000000000 8 1.360 s 0.05% 1.173 s 0.04% -186663.718 us -13.73% FAIL
1000000 2 5.120 ms 3.82% 5.010 ms 4.47% -109.889 us -2.15% PASS
10000000 2 5.663 ms 11.81% 5.492 ms 6.23% -171.784 us -3.03% PASS
100000000 2 115.939 ms 0.43% 101.711 ms 0.50% -14228.493 us -12.27% FAIL
1000000000 2 1.344 s 0.04% 1.177 s 0.04% -167581.476 us -12.46% FAIL
1000000 4 5.163 ms 0.53% 5.042 ms 0.56% -121.094 us -2.35% FAIL
10000000 4 5.716 ms 0.17% 5.531 ms 0.16% -185.413 us -3.24% FAIL
100000000 4 115.807 ms 0.30% 101.563 ms 0.34% -14244.424 us -12.30% FAIL
1000000000 4 1.343 s 0.03% 1.176 s 0.04% -167630.571 us -12.48% FAIL
1000000 6 5.181 ms 0.58% 5.088 ms 0.50% -92.827 us -1.79% FAIL
10000000 6 5.775 ms 0.50% 5.602 ms 0.15% -172.636 us -2.99% FAIL
100000000 6 116.078 ms 0.23% 101.390 ms 0.19% -14688.013 us -12.65% FAIL
1000000000 6 1.343 s 0.04% 1.172 s 0.05% -170184.149 us -12.68% FAIL
1000000 8 5.256 ms 0.53% 5.161 ms 0.50% -94.969 us -1.81% FAIL
10000000 8 5.842 ms 0.32% 5.679 ms 0.14% -162.748 us -2.79% FAIL
100000000 8 117.036 ms 0.30% 101.646 ms 0.50% -15390.610 us -13.15% FAIL
1000000000 8 1.345 s 0.05% 1.174 s 0.05% -170411.432 us -12.67% FAIL

@thirtiseven
Collaborator Author

build

@thirtiseven thirtiseven merged commit bb07951 into NVIDIA:branch-24.06 May 6, 2024
3 checks passed