Add support for zstd-compression #1786

Merged: 1 commit merged on Sep 27, 2023


danielmitterdorfer (Member)

With this commit we add support for zstd-compressed corpora. Compared to bzip2, zstd produced compressed files that were roughly 40% smaller and decompressed in about a third of the time in our tests.

Closes #1781
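A minimal sketch of choosing a decompressor by file extension for corpus files, assuming Python and, for .zst files only, the third-party python-zstandard package (imported as zstandard). OPENERS, _open_zstd and decompress_corpus are hypothetical names for illustration, not the project's actual code.

```python
import bz2
import gzip
import os
import shutil
from typing import BinaryIO, Callable, Dict


def _open_zstd(path: str) -> BinaryIO:
    # Imported lazily so .bz2 and .gz still work when the optional
    # third-party "zstandard" package (python-zstandard) is not installed.
    import zstandard

    # read_across_frames=True keeps decompressing past the first zstd frame.
    return zstandard.ZstdDecompressor().stream_reader(
        open(path, "rb"), read_across_frames=True
    )


# Hypothetical table: compressed-file suffix -> function returning a
# readable, decompressing binary stream.
OPENERS: Dict[str, Callable[[str], BinaryIO]] = {
    ".bz2": lambda p: bz2.open(p, "rb"),
    ".gz": lambda p: gzip.open(p, "rb"),
    ".zst": _open_zstd,
}


def decompress_corpus(path: str, target_dir: str) -> str:
    """Decompress `path` into `target_dir` and return the output file path."""
    base, ext = os.path.splitext(os.path.basename(path))
    opener = OPENERS.get(ext)
    if opener is None:
        raise ValueError(f"unsupported compression format: {ext!r}")
    out_path = os.path.join(target_dir, base)
    with opener(path) as src, open(out_path, "wb") as dst:
        # Copy in 1 MiB chunks so a large corpus never has to fit in memory.
        shutil.copyfileobj(src, dst, 1024 * 1024)
    return out_path
```

With this layout, supporting a new format means adding one entry to the table; the streaming copy stays the same for every codec.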

danielmitterdorfer added the enhancement and :Track Management labels on Sep 27, 2023
danielmitterdorfer added this to the 2.10.0 milestone on Sep 27, 2023
danielmitterdorfer self-assigned this
danielmitterdorfer changed the title from "Add support for zst-compression" to "Add support for zstd-compression"
pquentin (Member) left a comment:

I haven't tested it, but I assume you have, and the code looks good to me.

I also checked that you're using the fastest python-zstandard API, which is also the only one that supports reading across zstd frames. (This was an issue in urllib3, which had to use the standard library abstraction instead; that one is slower, and there we need to pass read_across_frames=True explicitly!)
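The frame issue raised in the review can be illustrated with a short sketch, assuming the third-party python-zstandard package (imported as zstandard). A .zst payload may contain several complete frames back to back, and each ZstdCompressor.compress() call below emits one such frame. The reader passes read_across_frames=True, the flag mentioned above for urllib3, so data from every frame is returned. This sketch uses that flag rather than guessing which faster API the PR chose; make_multi_frame and read_all_frames are hypothetical helpers.

```python
import io


def make_multi_frame(*chunks: bytes) -> bytes:
    """Compress each chunk as its own zstd frame and concatenate the frames."""
    import zstandard  # third-party python-zstandard, imported lazily

    cctx = zstandard.ZstdCompressor()
    # Every compress() call emits one complete, self-contained frame.
    return b"".join(cctx.compress(chunk) for chunk in chunks)


def read_all_frames(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress every frame in `data`, not just the first one."""
    import zstandard

    dctx = zstandard.ZstdDecompressor()
    out = []
    # With read_across_frames=False the reader stops at the end of the
    # first frame; True makes it continue into the following frames.
    with dctx.stream_reader(io.BytesIO(data), read_across_frames=True) as reader:
        while True:
            block = reader.read(chunk_size)
            if not block:  # b"" signals the end of all input
                break
            out.append(block)
    return b"".join(out)
```

For example, read_all_frames(make_multi_frame(b"first\n", b"second\n")) should yield both lines, whereas a reader that stops at the first frame boundary would return only the first.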

danielmitterdorfer (Member, Author):

Thanks! Yes, I've tested both cases (decompressing with the Python library and with the native binary). Also, the library I've picked seems to be the most mature of the available options.
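One way the two code paths could fit together, as a sketch: prefer a zstd executable found on PATH and fall back to python-zstandard otherwise. The preference order, the use of the plain zstd CLI, and the names native_zstd_command and decompress_zst are assumptions for illustration, not taken from the PR. The CLI flags used (-d, -q, -f, -o) are standard zstd options.

```python
import shutil
import subprocess
from typing import List, Optional


def native_zstd_command(src: str, dst: str, binary: Optional[str] = None) -> Optional[List[str]]:
    """Build a zstd CLI invocation, or return None if no binary is available."""
    binary = binary or shutil.which("zstd")
    if binary is None:
        return None
    # -d: decompress, -q: quiet, -f: overwrite dst if it exists, -o: output file
    return [binary, "-d", "-q", "-f", src, "-o", dst]


def decompress_zst(src: str, dst: str) -> str:
    """Decompress src to dst; return "native" or "library" for the path taken."""
    cmd = native_zstd_command(src, dst)
    if cmd is not None:
        # check=True raises CalledProcessError if the binary fails.
        subprocess.run(cmd, check=True)
        return "native"

    # Fallback: the third-party python-zstandard package, imported lazily.
    import zstandard

    with open(src, "rb") as ifh, open(dst, "wb") as ofh:
        reader = zstandard.ZstdDecompressor().stream_reader(ifh, read_across_frames=True)
        shutil.copyfileobj(reader, ofh)
    return "library"
```

Returning the path taken makes each branch easy to exercise in a test, for instance by running once on a machine with the binary and once without it.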

danielmitterdorfer merged commit a230317 into elastic:master on Sep 27, 2023
15 checks passed
Development: merging this pull request closes issue #1781, "Add support for zstd-compressed corpora".
2 participants