Move away from file watcher for releasing memory #1885

Open
jmazanec15 opened this issue Jul 25, 2024 · 18 comments

@jmazanec15
Member

Description

Right now, we free memory for segments whose files get deleted by having a resource watcher watch for those files:

  1. Init of resource watcher - https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/plugin/KNNPlugin.java#L196
  2. File change watcher initialization - https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/memory/NativeMemoryLoadStrategy.java#L84
  3. File watcher - https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/memory/NativeMemoryLoadStrategy.java#L97

This file watching can actually be removed entirely, which would help simplify the code (less static initialization, decoupling the cache manager from the native memory load strategy, moving away from the tight coupling to FSDirectory, etc.). Additionally, the way we do it now is buggy - see #1012.

To fix this, we just need to ensure that for both our DocValuesProducer and our KnnVectorsReader, when close is called, we evict any indices from memory (similar to what we do now with the file watcher). We would need to implement a really lightweight DocValuesProducer that delegates everything.
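A minimal sketch of what such a delegating producer might look like (the class name, the cache key, and the invalidate call are illustrative assumptions, not the plugin's actual API):

import java.io.IOException;

import org.apache.lucene.codecs.DocValuesProducer;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.index.SortedSetDocValues;

// Hypothetical sketch: delegate every read, and evict this segment's native
// indices from the cache when the producer is closed.
public class EvictingDocValuesProducer extends DocValuesProducer {
    private final DocValuesProducer delegate;
    private final String cacheKey; // hypothetical: whatever key the cache manager uses for this segment

    public EvictingDocValuesProducer(DocValuesProducer delegate, String cacheKey) {
        this.delegate = delegate;
        this.cacheKey = cacheKey;
    }

    @Override
    public NumericDocValues getNumeric(FieldInfo field) throws IOException {
        return delegate.getNumeric(field);
    }

    @Override
    public BinaryDocValues getBinary(FieldInfo field) throws IOException {
        return delegate.getBinary(field);
    }

    @Override
    public SortedDocValues getSorted(FieldInfo field) throws IOException {
        return delegate.getSorted(field);
    }

    @Override
    public SortedNumericDocValues getSortedNumeric(FieldInfo field) throws IOException {
        return delegate.getSortedNumeric(field);
    }

    @Override
    public SortedSetDocValues getSortedSet(FieldInfo field) throws IOException {
        return delegate.getSortedSet(field);
    }

    @Override
    public void checkIntegrity() throws IOException {
        delegate.checkIntegrity();
    }

    @Override
    public void close() throws IOException {
        try {
            delegate.close();
        } finally {
            // Hypothetical eviction call; the real cache manager API may differ.
            NativeMemoryCacheManager.getInstance().invalidate(cacheKey);
        }
    }
}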

Related issues

#1012

@luyuncheng
Collaborator

luyuncheng commented Jul 26, 2024

Hi @jmazanec15, I am writing a DocValuesProducer for KNN80Codec to save storage. I like this issue very much - it can release memory efficiently. I can contribute my DocValuesProducer if you like.

@heemin32
Collaborator

Is the lifecycle of the native memory (the native k-NN index) the same as the lifecycle of the DocValuesProducer or KnnVectorsReader? I thought the native memory stays loaded until either we clear the cache or the index becomes unusable (deleted or closed).

Is close called on either the DocValuesProducer or the KnnVectorsReader when the index is closed or deleted?

@jmazanec15
Member Author

> Hi @jmazanec15, I am writing a DocValuesProducer for KNN80Codec to save storage. I like this issue very much - it can release memory efficiently. I can contribute my DocValuesProducer if you like.

Awesome! What custom functionality did you add in DVProducer to reduce storage? I think the change for this issue would be quite small so I do not see it conflicting.

> Is the lifecycle of the native memory (the native k-NN index) the same as the lifecycle of the DocValuesProducer or KnnVectorsReader?

DocValuesProducer and KnnVectorsReader are both per-segment. The two producer/reader classes are opened (or shared) in the SegmentReader constructor: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentReader.java

On close, the close methods will eventually be called:

  1. https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentReader.java#L220-L231
  2. https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java#L172-L179

This reader will be required in the search path: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L206

@luyuncheng
Collaborator

> Awesome! What custom functionality did you add in DVProducer to reduce storage? I think the change for this issue would be quite small so I do not see it conflicting.

@jmazanec15

In #1571 we can save the storage for the _source stored field.

And I think we can save the docValues storage as described in #1571 (comment).

We create two types of docValues in the Lucene engine. We can also get docValues from the native engines: in Faiss we can use the reconstruct_n function to get the i-th vector values from the flat storage, which would keep them in memory after the index is opened.
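For illustration, a rough sketch of reading vectors back through a JNI binding over Faiss's reconstruct_n; the Java class and native method here are hypothetical, not an existing k-NN API:

// Hypothetical sketch: read docIds [first, first + count) back from a Faiss
// index that is already loaded in native memory, instead of keeping a
// separate flat-vector file in the segment.
public class FaissVectorReconstructor {
    // Assumed binding over faiss::Index::reconstruct_n(idx_t i0, idx_t ni, float* recons);
    // the real k-NN JNI layer would need a method like this added.
    private static native void reconstructN(long indexPointer, long first, long count, float[] out);

    public static float[] reconstruct(long indexPointer, long first, long count, int dimension) {
        float[] out = new float[(int) count * dimension];
        reconstructN(indexPointer, first, count, out);
        return out;
    }
}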

@heemin32
Collaborator

@jmazanec15 Right. However, are the producer/reader classes singletons that stay open until the index is closed or deleted?
Otherwise, the close method might get called even when we don't want the native memory cache to be evicted. Or do we want to reload the cache when new producer/reader classes are created?

@jmazanec15
Member Author

@luyuncheng that's interesting. I think @navneet1v may have been thinking about that. We are in the process of migrating away from custom doc values to KNNVectorFormat - see #1855. Would you be able to do it with the KNNVectorReader instead of doc values?

@jmazanec15
Member Author

@heemin32 Oh, I think I see what you are saying - will there be multiple readers/producers per segment/search? I think the purpose of SegmentCoreReaders is to ensure that they can be shared.

Worth noting that for Lucene, the input is closed as well (see https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java#L325).

@heemin32
Collaborator

> @heemin32 Oh, I think I see what you are saying - will there be multiple readers/producers per segment/search? I think the purpose of SegmentCoreReaders is to ensure that they can be shared.

Sharing is good, but the other question is whether the reader gets closed and opened repeatedly regardless of whether the segment file still exists.

@jmazanec15
Member Author

> closed and opened repeatedly regardless of whether the segment file still exists

I'm not sure why this would happen, but we need to be certain before making the change.

@heemin32
Collaborator

I don't know how it works internally, but based on this comment, it seems like multiple instances of SegmentReader can be created: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentReader.java#L128

/**
   * Create new SegmentReader sharing core from a previous SegmentReader and using the provided
   * liveDocs, and recording whether those liveDocs were carried in ram (isNRT=true).
   */

@navneet1v
Collaborator

@jmazanec15 thanks for opening the issue - I already see some great ideas on it. My take:

  1. @luyuncheng I would really like to see how we can use the vectors present in the Faiss index rather than having a separate file in the segment for flat vector storage, and also how much latency there would be in mapping those vectors back to the heap for, say, exact search. As @jmazanec15 already mentioned, we are in the process of moving to KNNVectorFormat by 2.17 (RFC: [RFC] Integrating KNNVectorsFormat in Native Vector Search Engine #1853), so please have a look at that reader code. The skeleton was added by the PR Reuse KNNVectorFieldData for reduce disk usage #1571; I will be adding more in the coming weeks.
  2. @heemin32 on whether there will be multiple instances of the DVP, I think the answer is no. Lucene maintains a readerPool on its end to ensure that it hands out the same reader to whoever asks for it (check the code here, here). But if that turns out not to be true, we can handle the case easily via refCount. We can run a small experiment to validate, since the Lucene code gives mixed signals.
  3. On opening and closing the DVP multiple times, the answer is again no. A DVP gets opened when its segment is created and closed when the segment is deleted; a DVP also gets opened/closed when you do a refresh.

@jmazanec15
Member Author

> But if that turns out not to be true, we can handle the case easily via refCount

Right, refCounting would prevent issues, but make removal a little trickier.
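For illustration, a sketch of the kind of ref-counting being discussed (all names here are hypothetical); the removal trickiness shows up in decRef, which must free the native memory only when the last reference is released:

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: guard a native allocation with a reference count so a
// concurrent search holding a reference keeps the memory alive across close().
public class RefCountedNativeIndex {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds the initial ref
    private final long nativeAddress;

    public RefCountedNativeIndex(long nativeAddress) {
        this.nativeAddress = nativeAddress;
    }

    public void incRef() {
        refCount.incrementAndGet();
    }

    public void decRef() {
        if (refCount.decrementAndGet() == 0) {
            // Last reference released: now it is safe to free the native memory.
            freeNativeMemory(nativeAddress); // hypothetical JNI call
        }
    }

    private static void freeNativeMemory(long address) {
        // would call into the k-NN JNI layer; omitted here
    }
}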

@kotwanikunal
Member

kotwanikunal commented Aug 12, 2024

Read through the issue here - refCount would be a solution, but it is not a direct solution to this problem.
Going through the code, it looks like we want the cache manager to process index lifecycle events.

My proposal would be to separate the concerns and instead follow this pattern to handle lifecycle events (the method names are hypothetical; we can see which events fit best for k-NN):

public void onIndexModule(IndexModule module) {
    module.addSettingsUpdateConsumer(INDEX_KNN_ALGO_PARAM_EF_SEARCH_SETTING, newVal -> {
        logger.debug("The value of [KNN] setting [{}] changed to [{}]", KNN_ALGO_PARAM_EF_SEARCH, newVal);
        // TODO: replace cache-rebuild with index reload into the cache
        NativeMemoryCacheManager.getInstance().rebuildCache();
    });
    module.addIndexEventListener(new NativeCacheIndexEventListener());
}

static class NativeCacheIndexEventListener implements IndexEventListener {
    @Override
    public void beforeIndexShardClosed(ShardId shardId, IndexShard indexShard, Settings indexSettings) {
        NativeMemoryCacheManager.getInstance().shardClosed();
    }

    @Override
    public void shardRoutingChanged(IndexShard indexShard, ShardRouting oldRouting, ShardRouting newRouting) {
        NativeMemoryCacheManager.getInstance().shardMovedAway();
    }

    @Override
    public void afterIndexRemoved(Index index, IndexSettings indexSettings, IndicesClusterStateService.AllocatedIndices.IndexRemovalReason reason) {
        NativeMemoryCacheManager.getInstance().indexCleanup();
    }
}

Thoughts?

@jmazanec15
Member Author

@kotwanikunal My issue with the index lifecycle approach is that we would also need to keep the file watcher code, which I don't like because it is a direct dependency on the file system and the Directory implementation. If we can properly ref-count and free in the segment DocValuesProducer code, then we can solve both issues in one go.

@kotwanikunal
Member

> @kotwanikunal My issue with the index lifecycle approach is that we would also need to keep the file watcher code, which I don't like because it is a direct dependency on the file system and the Directory implementation. If we can properly ref-count and free in the segment DocValuesProducer code, then we can solve both issues in one go.

I see the problem now. We have to track at the file level, not the shard level. I was wondering if we could simply filter cache entries on receiving shard lifecycle events and evict them (skipping the file watcher altogether) - but the cache operates at a more granular level.
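For what it's worth, a sketch of that filtering idea, assuming (hypothetically) that cache entries are keyed by segment file path and that a shard's data path prefix can be derived from the lifecycle event:

import com.google.common.cache.Cache;

// Hypothetical sketch: evict every cache entry belonging to a shard by
// filtering keys, assuming keys are absolute segment file paths.
public class ShardEvictionHelper {
    private final Cache<String, Object> cache; // stand-in for the native memory cache

    public ShardEvictionHelper(Cache<String, Object> cache) {
        this.cache = cache;
    }

    public void evictShard(String shardDataPathPrefix) {
        for (String key : cache.asMap().keySet()) {
            if (key.startsWith(shardDataPathPrefix)) {
                cache.invalidate(key);
            }
        }
    }
}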

@luyuncheng
Collaborator

> I see the problem now. We have to track at the file level, not the shard level. I was wondering if we could simply filter cache entries on receiving shard lifecycle events and evict them (skipping the file watcher altogether) - but the cache operates at a more granular level.

@kotwanikunal when we get shard close, file remove, or segment merge events, the memory can be released at the producer level by calling close. And I think we need refCount, as #1885 (comment) and #1885 (comment) say: when a shard event happens before a segment query in concurrent search scenarios, the refCount helps avoid a segmentation fault.

I think a DocValuesProducer is a simple way to do this, as the issue says:

> We would need to implement a really lightweight DocValuesProducer that delegates everything.

@0ctopus13prime
Contributor

Before cutting out the absolute file path dependency in KNN, #2033 should be resolved first, ahead of this issue.

@navneet1v added the Enhancements label (Increases software capabilities beyond original client specifications) on Sep 19, 2024
Labels: Enhancements (Increases software capabilities beyond original client specifications)
Projects: Backlog (Hot)
7 participants