feat: Out of core CSV support using Apache Arrow CSV reader (fast 🔥!) #1028
Hi @maartenbreddels,
In this PR we will also try to support reading gzipped CSV files. Here are some relevant threads or comments:
Only lazy CSV reading is not supported.
This one hangs quite regularly.
@JovanVeljanoski we need to discuss how we expose this.
Questions
I want to move some input/output functions from __init__.py into io.py; let me know if you like it.

Stats on a 70GB CSV file (on nyx, 64-core AMD Ryzen):
$ time py.test tests/csv_test.py -v -k test_large_csv_count_array_lengths
$ time py.test tests/csv_test.py -v -k test_large_csv_count_array_lengths
TODO