Describe the bug
The libcudf chunked Parquet reader can generate a LIST column that has invalid offsets where an offset goes back to zero after row 0 (e.g.: 0, 7, 23, 43, 53, 56, 0, 71). Loading without the chunked reader does not produce invalid offsets.
Steps/Code to reproduce bug
Load the attached file, 1418348638.parquet.gz, with the chunked Parquet reader and note that the offsets for the new_10 column (column index 11 in the resulting table, a LIST of TIMESTAMP_DAYS) are incorrect.
Expected behavior
LIST offset column values should never be less than the previous value in the offset column.
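The expected-behavior rule can be checked mechanically. The following is a minimal sketch in plain Python, not libcudf code: the helper name first_offset_violation is hypothetical, and it assumes the offsets have already been copied to the host as a list of integers.

```python
from typing import Optional, Sequence


def first_offset_violation(offsets: Sequence[int]) -> Optional[int]:
    """Return the index of the first offset that is smaller than its
    predecessor, or None if the offsets never decrease.

    Hypothetical helper for illustration; it is not part of libcudf.
    """
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            return i
    return None


# Offsets quoted in the bug description: the value at index 6 drops back to 0.
bad_offsets = [0, 7, 23, 43, 53, 56, 0, 71]
print(first_offset_violation(bad_offsets))  # 6

# Equal neighbours are allowed (they describe an empty list row).
print(first_offset_violation([0, 7, 7, 10]))  # None
```

Running a check like this on the offsets from both the chunked and the non-chunked reader would show whether only the chunked path produces a violation, as the report states.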
…nked reader. (#15342)
Fixes #15306
The core issue was that, under certain conditions, the chunked reader could generate invalid page indices for list columns, which corrupted the output of the decode kernels. The fix is fairly simple, but a decent amount of the delta in this PR is just name changes for clarity plus additional comments and docs.
The change also affected the number of chunks generated in some of the very (unrealistically) constrained tests.
Authors:
- https://github.com/nvdbaranec
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #15342