[Feature Request] Make remote snapshot local file_cache block size configurable #14990

Open
finnegancarroll opened this issue Jul 27, 2024 · 4 comments
Labels: enhancement, Search:Searchable Snapshots

Comments

@finnegancarroll
Contributor

Is your feature request related to a problem? Please describe

To perform a search on a remote snapshot we download only the specific blocks of the snapshot needed to complete the search. These blocks have a fixed 8MB size and are stored on disk in a local reference-counted file cache. While there are benefits to pulling down large blocks, namely taking advantage of spatial locality and reducing the overhead of accessing our remote store, we also risk overpopulating our cache with unneeded data.

The large block size is particularly noticeable when initializing a remote snapshot. For each segment, Lucene opens and holds onto file references to metadata. Lucene never closes these references, so the corresponding blocks must remain downloaded and present in our cache for the lifetime of the program. For these 'metadata' blocks in particular, 8MB is far more than is needed, so the baseline disk usage of our caches could be drastically reduced with a more conservative block size.
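To illustrate the granularity problem, here is a minimal sketch of block-granular fetching (the class and method names are illustrative, not the actual OpenSearch implementation): with a power-of-two block size, a read at any offset pulls the entire containing block into the local file cache, so even a few bytes of segment metadata pin a full block for as long as Lucene holds the file open.

```java
// Minimal sketch of block-granular fetching; names are illustrative.
// With a power-of-two block size, reading any byte range means downloading
// and caching the whole block(s) that contain it.
final class BlockMath {
    private final int blockSizeShift; // e.g. 23 for 8MB blocks, 19 for 512KB

    BlockMath(int blockSizeShift) {
        this.blockSizeShift = blockSizeShift;
    }

    long blockIndex(long fileOffset) {
        return fileOffset >>> blockSizeShift;             // which block to download
    }

    long blockStart(long fileOffset) {
        return blockIndex(fileOffset) << blockSizeShift;  // first byte of that block
    }

    long blockSizeBytes() {
        return 1L << blockSizeShift;                      // bytes cached per fetch
    }
}
```

A metadata read of a few hundred bytes therefore costs the same cache footprint as any other read: one full block.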

Describe the solution you'd like

Can this block size be a configurable setting for a remote snapshot repository?
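As a rough sketch of what this could look like, the block size might be exposed as a byte-size setting restricted to powers of two. The setting key, bounds, and scope below are assumptions on my part, not an existing API, and the imports assume the current 2.x package layout:

```java
import org.opensearch.common.settings.Setting;
import org.opensearch.core.common.unit.ByteSizeUnit;
import org.opensearch.core.common.unit.ByteSizeValue;

public final class SearchableSnapshotSettings {
    // Hypothetical setting; the key, bounds, and NodeScope are placeholders.
    // Keeping values to powers of two would preserve shift-based block addressing.
    public static final Setting<ByteSizeValue> SNAPSHOT_BLOCK_SIZE_SETTING =
        Setting.byteSizeSetting(
            "block_size",
            new ByteSizeValue(8, ByteSizeUnit.MB),    // current hard-coded default (2^23)
            new ByteSizeValue(512, ByteSizeUnit.KB),  // smallest size tried in the tests below (2^19)
            new ByteSizeValue(64, ByteSizeUnit.MB),   // arbitrary upper bound
            Setting.Property.NodeScope
        );
}
```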

Related component

Search:Searchable Snapshots

Describe alternatives you've considered

Alternatively, could a smaller default still improve performance? How was 8MB selected?

Additional context

Some short tests were run with 13GB of OSB Big5 data restored from a remote snapshot hosted locally to the cluster. This means there is very little overhead for accessing the remote snapshot; a more robust test should use an actual remote store to get a better idea of how the overhead of more frequent block downloads impacts performance.

file_cache capacity is 10MB so that we can easily populate our cache fully.

The OSB query-string-on-message workload was chosen because of the large number of block downloads it requires; a less expensive query might never access any doc fields.

Block size: 2^23 bytes
File cache baseline: 275 MB
Snapshot restore time: 843 ms
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | query-string-on-message | 1.99 | ops/s |
| Mean Throughput | query-string-on-message | 1.99 | ops/s |
| Median Throughput | query-string-on-message | 1.99 | ops/s |
| Max Throughput | query-string-on-message | 1.99 | ops/s |
| 50th percentile latency | query-string-on-message | 315.597 | ms |
| 90th percentile latency | query-string-on-message | 320.52 | ms |
| 99th percentile latency | query-string-on-message | 432.611 | ms |
| 100th percentile latency | query-string-on-message | 508.895 | ms |
| 50th percentile service time | query-string-on-message | 314.269 | ms |
| 90th percentile service time | query-string-on-message | 318.254 | ms |
| 99th percentile service time | query-string-on-message | 431.92 | ms |
| 100th percentile service time | query-string-on-message | 507.844 | ms |
| error rate | query-string-on-message | 0 | % |
Block size: 2^21 bytes
File cache baseline: 109 MB
Snapshot restore time: 342 ms
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | query-string-on-message | 2 | ops/s |
| Mean Throughput | query-string-on-message | 2 | ops/s |
| Median Throughput | query-string-on-message | 2 | ops/s |
| Max Throughput | query-string-on-message | 2 | ops/s |
| 50th percentile latency | query-string-on-message | 295.624 | ms |
| 90th percentile latency | query-string-on-message | 299.426 | ms |
| 99th percentile latency | query-string-on-message | 343.345 | ms |
| 100th percentile latency | query-string-on-message | 358.427 | ms |
| 50th percentile service time | query-string-on-message | 294.41 | ms |
| 90th percentile service time | query-string-on-message | 297.427 | ms |
| 99th percentile service time | query-string-on-message | 342.32 | ms |
| 100th percentile service time | query-string-on-message | 356.455 | ms |
| error rate | query-string-on-message | 0 | % |
Block size: 2^19 bytes
File cache baseline: 38 MB
Snapshot restore time: 235 ms
| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | query-string-on-message | 2 | ops/s |
| Mean Throughput | query-string-on-message | 2 | ops/s |
| Median Throughput | query-string-on-message | 2 | ops/s |
| Max Throughput | query-string-on-message | 2 | ops/s |
| 50th percentile latency | query-string-on-message | 327.288 | ms |
| 90th percentile latency | query-string-on-message | 338.82 | ms |
| 99th percentile latency | query-string-on-message | 377.653 | ms |
| 100th percentile latency | query-string-on-message | 387.042 | ms |
| 50th percentile service time | query-string-on-message | 326.218 | ms |
| 90th percentile service time | query-string-on-message | 337.726 | ms |
| 99th percentile service time | query-string-on-message | 376.914 | ms |
| 100th percentile service time | query-string-on-message | 385.839 | ms |
| error rate | query-string-on-message | 0 | % |
@bugmakerrrrrr
Contributor

One question: if we change the block size of an existing file_cache, how do we handle the old blocks that were written with a different block size? Clear them all and repopulate the cache, or split/combine the old blocks into new blocks?
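For illustration, one possible answer (a sketch, not something the project has settled on) is to make the block size part of the cache key, so entries written with the old size simply stop being hit and fall out through the normal eviction/refcount path rather than needing to be split or combined:

```java
// Hypothetical cache key; if blockSizeShift changes, old entries no longer
// match new lookups and are evicted normally instead of being rewritten in place.
record BlockCacheKey(String snapshotId, String fileName, long blockIndex, int blockSizeShift) { }
```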

@getsaurabh02
Member

@finnegancarroll are we proposing this setting to be static or dynamic?

@jed326
Collaborator

jed326 commented Aug 1, 2024

Thanks @finnegancarroll, this is an interesting feature request. I'm curious about this part:

> For these 'metadata' blocks in particular, 8MB is far more than is needed, so the baseline disk usage of our caches could be drastically reduced with a more conservative block size.

Have you done any measurements on how much baseline cache usage can be reduced with different block sizes? And with smaller block sizes, would there be additional data that has to be re-downloaded each time?

Coming at it from a different perspective: if the problem we are trying to solve is reducing the baseline cache usage, then is it viable to introduce some custom logic for handling the metadata blocks instead?
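For illustration, that custom logic could be as simple as picking a smaller block size for the small, long-lived metadata-style Lucene files while leaving data files at 8MB. The file name patterns and sizes below are assumptions, not an existing OpenSearch heuristic:

```java
// Illustrative only: choose a block size per file rather than globally.
final class MetadataAwareBlockSizing {
    static int blockSizeShiftFor(String fileName) {
        // Segment info, field info, and segments_N files are tiny but held open
        // for the lifetime of the reader, so give them small blocks.
        if (fileName.startsWith("segments_") || fileName.endsWith(".si") || fileName.endsWith(".fnm")) {
            return 19; // 512KB blocks
        }
        return 23;     // 8MB blocks for everything else
    }
}
```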

@lukas-vlcek
Contributor

Hi @finnegancarroll,
I have a noob question, if you don't mind. Could you elaborate a bit more on the following?

> For each segment, Lucene opens and holds onto file references to metadata. Lucene never closes these references, so the corresponding blocks must remain downloaded and present in our cache for the lifetime of the program.

I am interested in learning more details about this part.
