The cudf JSON reader has a large memory footprint: around 8x the input size in 23.12, and in testing for 24.08 it has exceeded 15x. This makes JSON reading very difficult in memory-constrained environments. Let's add a low-memory mode for the JSON lines reader based on byte-range support.
Here is an experiment that yields the same dataframe while reading in 100 MB chunks.
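The experiment's snippet isn't reproduced in this excerpt. As a rough sketch of the byte-range chunking it relies on, here is a pure-Python stand-in for `cudf.read_json(..., lines=True, byte_range=(offset, size))`; the helper name and the toy data are illustrative, not cudf API:

```python
import json

def read_jsonl_byte_range(buf: bytes, offset: int, size: int) -> list:
    """Parse only the JSON-lines records that start inside [offset, offset + size).

    Hypothetical stand-in mirroring byte_range semantics: a record whose first
    byte falls inside the range is read in full even if it ends past the range,
    while a record that began before the range is skipped, because the previous
    chunk already owns it.
    """
    start = offset
    if offset > 0 and buf[offset - 1:offset] != b"\n":
        # We landed mid-record: skip ahead to the next record start.
        nl = buf.find(b"\n", offset)
        if nl == -1:
            return []
        start = nl + 1
    end = min(offset + size, len(buf))
    if end < len(buf) and buf[end - 1:end] != b"\n":
        # The last record starting in this range spills past it: read it fully.
        nl = buf.find(b"\n", end)
        end = len(buf) if nl == -1 else nl + 1
    return [json.loads(line) for line in buf[start:end].splitlines() if line]

# Stitch the full dataset back together from fixed-size chunks (100 bytes
# here, standing in for the 100 MB chunks used in the experiment).
data = b"".join(json.dumps({"a": i, "b": "x" * 20}).encode() + b"\n" for i in range(50))
rows = []
for off in range(0, len(data), 100):
    rows.extend(read_jsonl_byte_range(data, off, 100))
assert rows == [json.loads(line) for line in data.splitlines()]
```

Because each record start falls in exactly one byte range, concatenating the per-chunk results reproduces the full dataset, which is why the chunked read can yield the same dataframe while only materializing one chunk's worth of input at a time.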
And surprisingly, the chunked reader is even faster!
Additional context
If we keep the chunk size under 2 GB, we also get large-strings support in the JSON reader. I believe we should consider making byte-range-based reading the default for cudf.pandas.