The mini block structural encoding is useful for narrow data types and is capable of handling opaque compressive encodings (such as delta compression). The basic idea is that we store data in 4KiB blocks. Each block will contain a buffer of repetition levels, definition levels, and values.
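To make the block layout concrete, here is a minimal sketch of packing the three buffers into one fixed-size block. The header format (three little-endian `u16` lengths), the zero padding, and the names `pack_block` / `unpack_block` are assumptions made for illustration; they are not part of the proposal.

```python
import struct

BLOCK_SIZE = 4096  # 4KiB blocks, as described above

# Hypothetical header: byte lengths of the rep, def, and value buffers.
HEADER = struct.Struct("<HHH")


def pack_block(rep: bytes, def_: bytes, values: bytes) -> bytes:
    """Pack already-encoded buffers into a single zero-padded block."""
    used = HEADER.size + len(rep) + len(def_) + len(values)
    if used > BLOCK_SIZE:
        raise ValueError(f"buffers need {used} bytes, block holds {BLOCK_SIZE}")
    block = HEADER.pack(len(rep), len(def_), len(values)) + rep + def_ + values
    # Padding to a fixed size is an assumption of this sketch.
    return block.ljust(BLOCK_SIZE, b"\x00")


def unpack_block(block: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a block back into (rep, def, values) buffers."""
    rep_len, def_len, val_len = HEADER.unpack_from(block, 0)
    rep_end = HEADER.size + rep_len
    def_end = rep_end + def_len
    val_end = def_end + val_len
    return block[HEADER.size:rep_end], block[rep_end:def_end], block[def_end:val_end]
```

Because the value buffer is treated as opaque bytes, any compressive encoding can be applied before packing; the reader always gets the whole buffer back.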
One advantage is that we only need to perform a single IOP to retrieve the data, regardless of how many levels of repetition or definition we have.
Another advantage is that it enables opaque encodings since we always retrieve the full block.
It can also enable sparse encodings since the number of values per block is allowed to vary.
This forces read amplification of up to 4KiB, so it may not be the right choice in all situations. However, this read amplification seems necessary when zipping in repetition and definition levels (the alternative is a separate buffer for each, which might make sense in some purely RAM-based scenarios).
The "values-per-block" should be compressed (see #2857) and stored as metadata.
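As a rough illustration of why the "values-per-block" metadata matters, the sketch below uses it to find which block holds a given value, so that a single read fetches everything needed. It assumes the counts are already decompressed into a plain list and that block `i` starts at byte offset `i * BLOCK_SIZE`; the function name `locate_value` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

BLOCK_SIZE = 4096  # assumed fixed block size


def locate_value(value_index: int, values_per_block: list[int]) -> tuple[int, int, int]:
    """Return (block_index, byte_offset, index_within_block) for a value."""
    # ends[i] is the number of values stored in blocks 0..i inclusive.
    ends = list(accumulate(values_per_block))
    total = ends[-1] if ends else 0
    if not 0 <= value_index < total:
        raise IndexError(f"value {value_index} out of range (total {total})")
    # First block whose cumulative count exceeds value_index.
    block = bisect_right(ends, value_index)
    first_in_block = ends[block - 1] if block > 0 else 0
    return block, block * BLOCK_SIZE, value_index - first_in_block
```

For example, with `values_per_block = [100, 37, 250]`, value 100 is the first value of block 1, so the reader issues one 4KiB read at byte offset 4096. Since the counts are allowed to vary, blocks holding sparse or highly compressed data are located the same way.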