Closes #15775
The ORC writer encodes null mask bits in multiples of eight so that other readers never have to read partially encoded bytes. When row group boundaries do not fall on a multiple of eight, the null mask encode boundaries are moved so that they do. There was a bug in the alignment code that caused a pointless shift by 8 bits, which in turn caused issues in the encode. This PR fixes the unnecessary shift.
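The alignment described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the cudf implementation: the function names, the per-row-group size list, and the choice to round each boundary up and cap it at the total row count are all assumptions made for the example.

```python
def round_up_to_multiple_of_8(bit_index: int) -> int:
    # Leaves already-aligned values unchanged (16 stays 16, 17 becomes 24).
    return (bit_index + 7) // 8 * 8


def aligned_encode_ranges(row_group_sizes):
    """Return (start, end) null-mask bit ranges, one per row group.

    Hypothetical helper: every boundary except the final one is moved
    up to a multiple of eight so each encoded chunk covers whole bytes.
    """
    total = sum(row_group_sizes)
    ranges = []
    start = 0
    raw_end = 0
    for size in row_group_sizes:
        raw_end += size
        # Round the natural boundary up, but never past the last row.
        end = min(round_up_to_multiple_of_8(raw_end), total)
        # A tiny row group may already be covered by the previous range.
        end = max(end, start)
        ranges.append((start, end))
        start = end
    return ranges


print(aligned_encode_ranges([10, 10, 10]))  # [(0, 16), (16, 24), (24, 30)]
print(aligned_encode_ranges([16, 16]))      # [(0, 16), (16, 32)]
```

In this sketch, boundaries that are already multiples of eight are left where they are. Shifting such a boundary by a further 8 bits would be an unnecessary move of the kind the fix removes.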
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: #15789
For the attached parquet file, if we read it then write it out into an .orc file using cudf ORC writer then the output file cannot be read in Pandas or Spark.
For more details, the given file contains a map with content like this:
And code to read/write file in cudf:
Related: NVIDIA/spark-rapids#10806.