
Add miniblock structural encoding #2859

Open
Tracked by #2856
westonpace opened this issue Sep 11, 2024 · 0 comments


The miniblock structural encoding is useful for narrow data types and can handle opaque compressive encodings (such as delta compression). The basic idea is that we store data in 4KiB blocks. Each block will contain a buffer of repetition levels, a buffer of definition levels, and a buffer of values.
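As a rough illustration of the block layout described above, here is a minimal Python sketch. The header format (three little-endian u16 byte lengths), the zero padding up to 4KiB, and the names `pack_block` and `unpack_block` are all assumptions made for the example, not a proposed on-disk format.

```python
import struct

BLOCK_SIZE = 4096  # 4KiB, as described above

# Hypothetical header: byte lengths of the rep, def, and values buffers (u16 each).
HEADER = struct.Struct("<HHH")


def pack_block(rep: bytes, def_: bytes, values: bytes) -> bytes:
    """Place the three buffers behind a small header and pad to BLOCK_SIZE."""
    payload = HEADER.pack(len(rep), len(def_), len(values)) + rep + def_ + values
    if len(payload) > BLOCK_SIZE:
        raise ValueError("buffers do not fit in one block")
    # Padding to a fixed size is an assumption of this sketch.
    return payload.ljust(BLOCK_SIZE, b"\x00")


def unpack_block(block: bytes) -> tuple[bytes, bytes, bytes]:
    """Split one block back into (rep, def, values) buffers."""
    rep_len, def_len, val_len = HEADER.unpack_from(block, 0)
    start = HEADER.size
    rep = block[start:start + rep_len]
    start += rep_len
    def_ = block[start:start + def_len]
    start += def_len
    values = block[start:start + val_len]
    return rep, def_, values
```

Because all three buffers live in the same block, a reader that has fetched the block has everything it needs to decode the values it contains.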

The advantage is that we only need to perform 1 IOP to retrieve the data, regardless of how many levels of repetition or definition we have.

Another advantage is that it enables opaque encodings since we always retrieve the full block.
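To see why retrieving the full block matters for an opaque encoding, consider a toy delta encoding. The functions below are illustrative only (`delta_encode` and `delta_decode` are hypothetical names): a value in the middle of the block cannot be recovered without every delta before it.

```python
from itertools import accumulate


def delta_encode(values: list[int]) -> list[int]:
    """Store the first value, then each difference from the previous value."""
    return [v - p for v, p in zip(values, [0] + values[:-1])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Recover the values with a running sum over the deltas."""
    return list(accumulate(deltas))


values = [100, 101, 103, 106]
deltas = delta_encode(values)  # [100, 1, 2, 3]
# Reading values[2] requires deltas[0..2], so a partial read of the block is not enough.
third = sum(deltas[:3])
```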

It can also enable sparse encodings since the number of values per block is allowed to vary.

This forces read amplification of up to 4KiB, so it may not be the right choice in all situations. However, this read amplification seems necessary when zipping in repetition and definition levels (the alternative is a separate buffer for each, which might make sense in some purely RAM-based scenarios).
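To put a number on the read amplification, the sketch below divides bytes fetched by bytes wanted. It assumes whole 4KiB blocks are always read and that the wanted bytes start at a block boundary; `read_amplification` is a hypothetical helper name.

```python
BLOCK_SIZE = 4096  # 4KiB


def read_amplification(bytes_wanted: int, block_size: int = BLOCK_SIZE) -> float:
    """Bytes fetched divided by bytes wanted when only whole blocks are read."""
    if bytes_wanted <= 0:
        raise ValueError("bytes_wanted must be positive")
    blocks = -(-bytes_wanted // block_size)  # ceiling division
    return blocks * block_size / bytes_wanted


single_value = read_amplification(4)   # one 4-byte value: 1024.0
full_block = read_amplification(4096)  # a full block: 1.0
```

The cost is highest for point lookups of a single narrow value and disappears for scans that consume whole blocks.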

The "values-per-block" counts should be compressed (see #2857) and stored as metadata.
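One way a reader might use this metadata is to map a value index to the block that holds it. The sketch below assumes the counts have already been decompressed into a plain list and that blocks are laid out back to back at a fixed 4KiB size; `locate` is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate

BLOCK_SIZE = 4096  # assumes fixed-size blocks stored contiguously


def locate(values_per_block: list[int], value_index: int) -> tuple[int, int, int]:
    """Return (block_index, byte_offset_of_block, offset_within_block)."""
    ends = list(accumulate(values_per_block))  # exclusive end index of each block
    total = ends[-1] if ends else 0
    if not 0 <= value_index < total:
        raise IndexError("value index out of range")
    block = bisect_right(ends, value_index)
    start = ends[block - 1] if block else 0
    return block, block * BLOCK_SIZE, value_index - start


# The number of values per block is allowed to vary.
counts = [1000, 250, 1024]
hit = locate(counts, 1000)  # first value of the second block
```

With the counts available up front, finding a value costs one lookup in the metadata followed by one block read.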
