Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start working on a more complete json test matrix json #10490

Merged
merged 5 commits into from
Feb 28, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Feb 23, 2024

This starts to unify the tests for from_json/JsonToStructs, ScanJson, get_json_object, and json_tuples so it is more clear what features are supported for which commands and it helps to ensure that we test corner cases the same way for all of the operators.

I ported some from_json tests over from spark to help verify things, but many of these will also need to be pulled into the test matrix to hopefully detect more corner cases.

This is not complete and there will be follow on issues to try and flush it out more, and there will be follow in issues to eventually remove the duplicate tests once we have a more complete test matrix for JSON. I stopped where I did because there is enough work to really keep us busy already with just basic types.

This fixes #10455
This fixes #10452

Just because they were bugging me enough that I wanted to fix them instead of doing a separate patch.

…lowSingleQuote checks

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2
Copy link
Collaborator Author

revans2 commented Feb 23, 2024

build

@revans2 revans2 mentioned this pull request Feb 23, 2024
41 tasks
jlowe
jlowe previously approved these changes Feb 23, 2024
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit that's not must-fix.

Comment on lines 513 to 521
TEST_FILES = ["int_formatted.json",
"float_formatted.json",
"sci_formatted.json",
"int_formatted_strings.json",
"float_formatted_strings.json",
"sci_formatted_strings.json",
"decimal_locale_formatted_strings.json",
"single_quoted_strings.json",
"boolean_formatted.json"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

Nit: On a related node, could reduce some boilerplate in these tests that want to use these test files by creating a method that takes a tuple of (filename, reason) and will make a copy of this base version with the corresponding filename marked xfail with the corresponding reason.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was hoping that one of the tests would use it, but all of them eventually had some failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the plan is to leave this around despite being unused? A utility method to parameterize this with xfail overrides like I mentioned above would reduce the copying and reference this list, but I'm also fine with it as-is.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 23, 2024

Some of the tests failed for 3.1.1. Investigating if Spark is different or if we are.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 23, 2024

Looking deeper it looks like https://issues.apache.org/jira/browse/SPARK-38060 is the culprit. Thanks @andygrove for fixing this is Spark. For now I xfail the tests if the version is before 3.3.0, and I'll file an issue about it.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 23, 2024

build

@sameerz sameerz added test Only impacts tests tech debt labels Feb 24, 2024
@revans2
Copy link
Collaborator Author

revans2 commented Feb 26, 2024

@jlowe the tests pass now if you could take another look that would be great.

jlowe
jlowe previously approved these changes Feb 26, 2024
Comment on lines 513 to 521
TEST_FILES = ["int_formatted.json",
"float_formatted.json",
"sci_formatted.json",
"int_formatted_strings.json",
"float_formatted_strings.json",
"sci_formatted_strings.json",
"decimal_locale_formatted_strings.json",
"single_quoted_strings.json",
"boolean_formatted.json"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the plan is to leave this around despite being unused? A utility method to parameterize this with xfail overrides like I mentioned above would reduce the copying and reference this list, but I'm also fine with it as-is.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 26, 2024

I forgot about the TEST_FILES. I am happy to remove it. I still hope that we can get there, but I'll remove it for now.

@revans2
Copy link
Collaborator Author

revans2 commented Feb 26, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Feb 27, 2024

build

@revans2 revans2 merged commit ac993f7 into NVIDIA:branch-24.04 Feb 28, 2024
38 of 40 checks passed
@revans2 revans2 deleted the json_testing branch February 28, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tech debt test Only impacts tests
Projects
None yet
3 participants