Disable JsonTuple by default (#10420)
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
revans2 authored Feb 14, 2024
1 parent ed5e6b4 commit 28bb2b4
Showing 5 changed files with 16 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docs/additional-functionality/advanced_configs.md
@@ -268,7 +268,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.IsNotNull"></a>spark.rapids.sql.expression.IsNotNull|`isnotnull`|Checks if a value is not null|true|None|
<a name="sql.expression.IsNull"></a>spark.rapids.sql.expression.IsNull|`isnull`|Checks if a value is null|true|None|
<a name="sql.expression.JsonToStructs"></a>spark.rapids.sql.expression.JsonToStructs|`from_json`|Returns a struct value with the given `jsonStr` and `schema`|false|This is disabled by default because it is currently in beta and undergoes continuous enhancements. Please consult the [compatibility documentation](../compatibility.md#json-supporting-types) to determine whether you can enable this configuration for your use case|
<a name="sql.expression.JsonTuple"></a>spark.rapids.sql.expression.JsonTuple|`json_tuple`|Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.|true|None|
<a name="sql.expression.JsonTuple"></a>spark.rapids.sql.expression.JsonTuple|`json_tuple`|Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.|false|This is disabled by default because JsonTuple on the GPU does not support all of the normalization that the CPU supports.|
<a name="sql.expression.KnownFloatingPointNormalized"></a>spark.rapids.sql.expression.KnownFloatingPointNormalized| |Tag to prevent redundant normalization|true|None|
<a name="sql.expression.KnownNotNull"></a>spark.rapids.sql.expression.KnownNotNull| |Tag an expression as known to not be null|true|None|
<a name="sql.expression.Lag"></a>spark.rapids.sql.expression.Lag|`lag`|Window function that returns N entries behind this one|true|None|
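With the default now `false`, a job that still wants `json_tuple` to run on the GPU has to set `spark.rapids.sql.expression.JsonTuple` explicitly. The following is a minimal sketch, assuming the job is launched with `spark-submit` and its `--conf key=value` flag; the helper `to_conf_args` is a hypothetical name used for illustration and is not part of the plugin.

```python
def to_conf_args(conf):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


# Opt back in to the GPU implementation of json_tuple, which this commit
# disables by default. The key comes from the table above.
JSON_TUPLE_OPT_IN = {"spark.rapids.sql.expression.JsonTuple": "true"}

if __name__ == "__main__":
    print(" ".join(to_conf_args(JSON_TUPLE_OPT_IN)))
```

Enabling the key accepts the caveat in the Notes column: results may differ from the CPU where normalization is involved.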
2 changes: 1 addition & 1 deletion docs/supported_ops.md
@@ -8222,7 +8222,7 @@ are limited.
<td rowSpan="3">JsonTuple</td>
<td rowSpan="3">`json_tuple`</td>
<td rowSpan="3">Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.</td>
<td rowSpan="3">None</td>
<td rowSpan="3">This is disabled by default because JsonTuple on the GPU does not support all of the normalization that the CPU supports.</td>
<td rowSpan="3">project</td>
<td>json</td>
<td> </td>
14 changes: 9 additions & 5 deletions integration_tests/src/main/python/json_tuple_test.py
@@ -1,4 +1,4 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2023-2024, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -32,15 +32,17 @@ def test_json_tuple(json_str_pattern):
assert_gpu_and_cpu_are_equal_collect(
lambda spark: unary_op_df(spark, gen, length=10).selectExpr(
'json_tuple(a, "a", "email", "owner", "b", "b$", "b$$")'),
conf={'spark.sql.parser.escapedStringLiterals': 'true'})
conf={'spark.sql.parser.escapedStringLiterals': 'true',
'spark.rapids.sql.expression.JsonTuple': 'true'})

def test_json_tuple_select_non_generator_col():
gen = StringGen(pattern="{\"Zipcode\":\"abc\",\"ZipCodeType\":\"STANDARD\",\"City\":\"PARC PARQUE\",\"State\":\"PR\"}")
assert_gpu_and_cpu_are_equal_sql(
lambda spark : gen_df(spark, [('a', gen)]),
'table',
'select a, json_tuple(a, \"Zipcode\", \"ZipCodeType\", \"City\", \"State\") from table',
conf={'spark.sql.parser.escapedStringLiterals': 'true'})
conf={'spark.sql.parser.escapedStringLiterals': 'true',
'spark.rapids.sql.expression.JsonTuple': 'true'})

@allow_non_gpu('GenerateExec', 'JsonTuple')
@pytest.mark.parametrize('json_str_pattern', json_str_patterns, ids=idfn)
@@ -54,7 +56,8 @@ def test_json_tuple_with_large_number_of_fields_fallback(json_str_pattern):
"location", "city", "country", "zip", "code", "region", "state", "street", "block", "loc", \
"height", "h", "author", "title", "price", "isbn", "book", "rating", "score", "popular")'),
"JsonTuple",
conf={'spark.sql.parser.escapedStringLiterals': 'true'})
conf={'spark.sql.parser.escapedStringLiterals': 'true',
'spark.rapids.sql.expression.JsonTuple': 'true'})

@allow_non_gpu('GenerateExec', 'JsonTuple')
@pytest.mark.parametrize('json_str_pattern', json_str_patterns, ids=idfn)
@@ -66,4 +69,5 @@ def test_json_tuple_with_special_characters_fallback(json_str_pattern):
lambda spark: unary_op_df(spark, gen, length=10).selectExpr(
'json_tuple(a, "a", "a' + special_character + '")'),
"JsonTuple",
conf={'spark.sql.parser.escapedStringLiterals': 'true'})
conf={'spark.sql.parser.escapedStringLiterals': 'true',
'spark.rapids.sql.expression.JsonTuple': 'true'})
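All four tests now pass the same two settings. The commit repeats the dict inline; one possible alternative, shown here only as a sketch, is a shared constant with a small copy helper. The names `JSON_TUPLE_CONF` and `json_tuple_conf` are hypothetical and do not appear in the test file.

```python
# Hypothetical shared settings; the commit itself repeats this dict in each test.
JSON_TUPLE_CONF = {
    "spark.sql.parser.escapedStringLiterals": "true",
    "spark.rapids.sql.expression.JsonTuple": "true",
}


def json_tuple_conf(extra=None):
    """Return a fresh copy of the shared settings, optionally extended.

    Copying keeps one test from mutating the dict another test relies on.
    """
    conf = dict(JSON_TUPLE_CONF)
    if extra:
        conf.update(extra)
    return conf
```

A test would then pass `conf=json_tuple_conf()` instead of spelling out both keys.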
Original file line number Diff line number Diff line change
@@ -3733,7 +3733,8 @@ object GpuOverrides extends Logging {
}
override def convertToGpu(): GpuExpression = GpuJsonTuple(childExprs.map(_.convertToGpu()))
}
),
).disabledByDefault("JsonTuple on the GPU does not support all of the normalization " +
"that the CPU supports."),
expr[org.apache.spark.sql.execution.ScalarSubquery](
"Subquery that will return only one row and one column",
ExprChecks.projectOnly(
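The reason string does not say which normalization is missing on the GPU. As a generic illustration only, and not a description of what Spark or the plugin actually does, the sketch below shows what normalizing JSON text can mean: parsing it and writing it back out in one canonical form, so that inputs differing in whitespace or number formatting produce the same string. The function `normalize` is hypothetical.

```python
import json


def normalize(raw):
    """Parse JSON text and re-serialize it compactly; None for invalid input."""
    try:
        return json.dumps(json.loads(raw), separators=(",", ":"))
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    # Whitespace is dropped and 1.50 is rewritten as 1.5.
    print(normalize('{ "b" : 1.50 ,  "c" : [ 1, 2 ] }'))
```

An implementation that returned the raw substring instead would disagree with one that normalizes, which is the kind of CPU/GPU mismatch a disabled-by-default flag guards against.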
6 changes: 3 additions & 3 deletions tools/generated_files/supportedExprs.csv
@@ -279,9 +279,9 @@ IsNull,S,`isnull`,None,project,input,S,S,S,S,S,S,S,S,PS,S,S,S,S,NS,PS,PS,PS,NS
IsNull,S,`isnull`,None,project,result,S,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
JsonToStructs,NS,`from_json`,This is disabled by default because it is currently in beta and undergoes continuous enhancements. Please consult the [compatibility documentation](../compatibility.md#json-supporting-types) to determine whether you can enable this configuration for your use case,project,jsonStr,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA
JsonToStructs,NS,`from_json`,This is disabled by default because it is currently in beta and undergoes continuous enhancements. Please consult the [compatibility documentation](../compatibility.md#json-supporting-types) to determine whether you can enable this configuration for your use case,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NS,PS,PS,NA
JsonTuple,S,`json_tuple`,None,project,json,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA
JsonTuple,S,`json_tuple`,None,project,field,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA,NA,NA,NA
JsonTuple,S,`json_tuple`,None,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA
JsonTuple,NS,`json_tuple`,This is disabled by default because JsonTuple on the GPU does not support all of the normalization that the CPU supports.,project,json,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA,NA,NA,NA,NA,NA
JsonTuple,NS,`json_tuple`,This is disabled by default because JsonTuple on the GPU does not support all of the normalization that the CPU supports.,project,field,NA,NA,NA,NA,NA,NA,NA,NA,NA,PS,NA,NA,NA,NA,NA,NA,NA,NA
JsonTuple,NS,`json_tuple`,This is disabled by default because JsonTuple on the GPU does not support all of the normalization that the CPU supports.,project,result,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,S,NA,NA,NA
KnownFloatingPointNormalized,S, ,None,project,input,S,S,S,S,S,S,S,S,PS,S,S,S,S,S,PS,PS,PS,S
KnownFloatingPointNormalized,S, ,None,project,result,S,S,S,S,S,S,S,S,PS,S,S,S,S,S,PS,PS,PS,S
KnownNotNull,S, ,None,project,input,S,S,S,S,S,S,S,S,PS,S,S,NS,S,S,PS,PS,PS,NS
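The rows above can be scanned to list which expressions are flagged `NS` and why. This is a minimal sketch, assuming the first column is the expression name, the second is the `S`/`NS` flag, and the fourth is the note, as the rows in this diff suggest; the file's header row is not shown here, so the columns are read by position. The sample rows are shortened to two type columns for brevity, and `not_supported_notes` is a hypothetical helper.

```python
import csv
import io

NOTE = ("This is disabled by default because JsonTuple on the GPU does not "
        "support all of the normalization that the CPU supports.")

# Shortened sample rows modeled on the diff; the real file has more type columns.
SAMPLE = "\n".join([
    "IsNull,S,`isnull`,None,project,result,S,NA",
    f"JsonTuple,NS,`json_tuple`,{NOTE},project,json,NA,NA",
    f"JsonTuple,NS,`json_tuple`,{NOTE},project,field,NA,NA",
    "KnownNotNull,S, ,None,project,input,S,S",
])


def not_supported_notes(text):
    """Map each expression whose second column is NS to its note (fourth column)."""
    notes = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 4 and row[1] == "NS":
            notes[row[0]] = row[3]
    return notes


if __name__ == "__main__":
    for name, note in not_supported_notes(SAMPLE).items():
        print(f"{name}: {note}")
```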