[SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields #48190
base: master
cc: @cloud-fan @dbatomic to take a look.
cc @viirya
test("SPARK-49743: prune unnecessary columns from GetArrayStructFields does not change schema") {
  val options = Map.empty[String, String]
  val schema1 = ArrayType(StructType.fromDDL("a int, b int"), containsNull = true)
Maybe just schema? I don't see any schema2, schema3, etc.
  val field1 = StructField("A", IntegerType) // Instead of "a", use "A" to test case sensitivity.
  val query1 = testRelation2
    .select(GetArrayStructFields(
      JsonToStructs(schema1, options, $"json"), field1, 0, 2, true).as("a"))
  val optimized1 = Optimizer.execute(query1.analyze)

  val prunedSchema1 = ArrayType(StructType.fromDDL("a int"), containsNull = true)
  val expected1 = testRelation2
ditto
As you already have an e2e test in the PR description, maybe also add it as a unit test?
Sure. Will add.
What changes were proposed in this pull request?
When pruning GetArrayStructFields, rely on the existing StructType to obtain the pruned schema instead of using the accessed field.
Why are the changes needed?
Fixes a bug in the OptimizeCsvJsonExprs rule that would otherwise have changed the schema fields of the underlying struct to be extracted.
Does this PR introduce any user-facing change?
Yes. The output changes for queries of the following type:
Earlier, the result would have been:
The new result (verified through spark-shell) is:
How was this patch tested?
a to A:
Was this patch authored or co-authored using generative AI tooling?
No.
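
To illustrate the change described under "What changes were proposed in this pull request?", here is a minimal sketch in plain Scala. It does not use Spark: Field, Struct, PruneSketch, and both prune functions are hypothetical stand-ins invented for this illustration, and looking the field up by ordinal is an assumption based on the ordinal argument visible in the test above, not a copy of the actual patch. It only shows why building the pruned schema from the accessed field can change a field name (here "a" becomes "A"), while taking the field from the existing schema cannot.

```scala
// Hypothetical stand-ins for Spark's StructField / StructType (not the real classes).
final case class Field(name: String, dataType: String)
final case class Struct(fields: Vector[Field])

object PruneSketch {
  // Old behaviour (as described in the PR): the pruned schema is built from
  // the field the query accessed, so its spelling leaks into the schema.
  def pruneUsingAccessedField(accessed: Field): Struct =
    Struct(Vector(accessed))

  // New behaviour (sketch): the pruned schema reuses the field already present
  // in the existing schema, selected here by ordinal (an assumption).
  def pruneUsingExistingSchema(schema: Struct, ordinal: Int): Struct =
    Struct(Vector(schema.fields(ordinal)))

  def main(args: Array[String]): Unit = {
    val schema = Struct(Vector(Field("a", "int"), Field("b", "int")))
    // The query refers to the field as "A"; a case-insensitive analyzer
    // would resolve it to "a" at ordinal 0.
    val accessed = Field("A", "int")

    val before = pruneUsingAccessedField(accessed)
    val after = pruneUsingExistingSchema(schema, ordinal = 0)

    println(before.fields.map(_.name).mkString(",")) // prints "A"
    println(after.fields.map(_.name).mkString(","))  // prints "a"
  }
}
```

In this sketch the second variant keeps the original field name, which matches the intent of the unit test: the pruned schema is expected to be "a int" even though the accessed field is spelled "A".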