Is your feature request related to a problem? Please describe.
#10490 adds the beginnings of a test matrix for all JSON processing. It is a good start, but it is not complete; we are still missing a number of things.
Confs/Options
maxNestingDepth - added in 3.5.0; sets the maximum nesting depth of JSON before it is considered invalid. The default value is 1000, but in my tests with from_json it started to fail at a depth of 255. I don't know what the limit is for get_json_object, nor whether we want to go for the full 1000 or are okay with a smaller depth.
maxNumLen - also added in 3.5.0; this is the maximum length of a number. It looks like we go well beyond the default of 1000, so we are probably okay, but it would still be good to have some kind of test.
maxStringLen - also added in 3.5.0. The default here is 20,000,000 (and I think that is chars, not bytes). In my tests we go well beyond this, but it would be good to add some tests here too.
locale - the locale appears to impact date, timestamp, and decimal parsing. We have tests written and issues filed around decimals. We have not done the same for date/timestamp because there are issues with the date/timestamp format settings which, when fixed, should hopefully mask this (it really only impacts formats that we do not support). But tests that verify we fall back would still be good.
parseMode - we only support PERMISSIVE, but we should have unified tests for this.
corruptedColumnName - this is for capturing input that did not parse properly. It only matters if a column with this name shows up in the read schema. We have some tests, but we should have unified tests.
timeZone - we have some tests, but we should look at putting them into the test matrix.
dateFormat - again, we have some tests, but they are not unified.
timestampFormat - again, we have some tests, but they are not unified.
timestampNTZFormat - again, we have some tests, but they are not unified.
enableDateTimeParsingFallback - this also needs to be combined with testing of the LEGACY parsing configs.
lineSep/encoding - we have very little testing here, but for the most part these are ignored except for ScanJson, so it is probably okay.
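The limit-related options above (maxNestingDepth, maxNumLen, maxStringLen) all need inputs right at and past the boundary. A minimal sketch of generators for such inputs is below; the helper names are hypothetical, not part of any existing test suite.

```python
import json

def nested_json(depth):
    """A JSON object nested `depth` levels deep, e.g. {"a":{"a":1}} for depth 2."""
    return '{"a":' * depth + '1' + '}' * depth

def long_number_json(num_len):
    """A JSON object whose single value is a number with `num_len` digits."""
    return '{"a":' + '9' * num_len + '}'

def long_string_json(str_len):
    """A JSON object whose single value is a string of `str_len` characters."""
    return '{"a":"' + 'x' * str_len + '"}'

# sanity check: the generated inputs are valid JSON at the depth where
# from_json reportedly started to fail (255)
json.loads(nested_json(255))
```

Parametrizing a test over these generators at depths/lengths just below and above each configured limit would pin down where the GPU and CPU behavior diverge.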
Input Data Types/Formats:
dictionary of double quoted strings
dictionary of single quoted strings
dictionary of ints
array of double quoted strings i.e. {"data":[...]}
array of single quoted strings
array of integer format
top level array of double quoted strings i.e. [...]
escaped chars in strings (we need to test chars that can be encoded three ways: a \uXXXX escape, a short backslash escape, and the literal character)
deeply nested/mixed data. These should include things that would map to the various map types
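The "three encodings" case above can be illustrated with a character that has all three spellings in JSON. A minimal sketch (the choice of '/' is just an example; '/' is one of the few characters with a short escape that is also legal unescaped):

```python
import json

# the same character written as a \uXXXX escape, a short backslash
# escape, and a literal character in the JSON source
variants = ['"\\u002f"', '"\\/"', '"/"']
decoded = {json.loads(v) for v in variants}
assert decoded == {"/"}
```

Test inputs should cover all three spellings for each escapable character so that GPU and CPU parsers are checked against the same decoded value.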
Data Types (note that these only apply to from_json and ScanJson because the others don't take a full read schema):
All nested types should be tested both at the top level and as a child (data) column. If we don't support those types yet, then we should verify that we fall back to the CPU as expected.
date
timestamp
map<string,string>
map<string,int>
map<string,decimal(38,0)>
array<string>
array<long>
array<decimal(38,0)>
map<string,map<string,string>>
map<string,array<string>>
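The top-level vs. child-column requirement above can be expressed as a single parametrization list. A hedged sketch (the names NESTED_TYPES and READ_SCHEMAS are illustrative, and the "data" field name matches the convention used in this issue):

```python
# each nested type from the list above
NESTED_TYPES = [
    "map<string,string>",
    "map<string,int>",
    "map<string,decimal(38,0)>",
    "array<string>",
    "array<long>",
    "array<decimal(38,0)>",
    "map<string,map<string,string>>",
    "map<string,array<string>>",
]

# test each type both at the top level and as a child "data" column
READ_SCHEMAS = [schema for typ in NESTED_TYPES
                for schema in (typ, f"struct<data:{typ}>")]
```

Feeding READ_SCHEMAS into a pytest parametrize for from_json and ScanJson would give one unified matrix instead of ad hoc per-type tests.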