Skip to content

blmarket/csv2parquet

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Csv2parquet

Converting csv 2 parquet as fast as I can, using rust.

Why?

Being forced to waste a lot of time just waiting my slow machine reading bunch of CSV files, I was just curious how much I can improve on this end.

Why Rust?

Modern C++ is getting harder to catch-up. My C/C++ code style is more like just C + STL libraries, not leveraging recent C++ standards. It was easier to learn Rust from the scratch, instead of improving my poor C style I'm familiar with more than 10 years.

Implementation details

As current rust parquet implementation does not have buffered column writer like cpp version, I had to implement my own version of buffered column writer, and some copy-and-paste version of row group writer etc.

Future improvements

Currently buffer is just using vec[u8] where a lot of copy will happen if we write more and more data. Would be nice if I can find some linked list buffer supporting append operation.

About

Convert CSV to parquet

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published