-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move away from file watcher for releasing memory #1885
Comments
hi @jmazanec15 i am writing a DocValuesProducer for KNN80Codec for save store usage. i like this issues very much. it can release memory efficiently. i can help contribute my DocValuesProducer if you like |
Is the lifecycle of native memory(native knn index) same as the lifecycle of DocValuesProducer or KnnVectorsReader? Does |
Awesome! What custom functionality did you add in DVProducer to reduce storage? I think the change for this issue would be quite small so I do not see it conflicting.
DocValuesProducer or KnnVectorsReader are both per segment. They two producer/reader classes are opened (or shared) in https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentReader.java ctor. On close, the close methods will eventually be called:
This reader will be required in the search path: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/KNNWeight.java#L206 |
in #1571 we can save the storage for and i think we can save the docValues storage like #1571 (comment) we create 2 types docValues in Lucene engine. also we can get docValues from native engines, in faiss we can use |
@jmazanec15 Right. However, does producer/reader classes are singleton and stay open until index is closed or deleted? |
@luyuncheng thats interesting. I think @navneet1v may have been thinking around that. We are in process of migrating away from Custom Doc Values to KNNVectorFormat. See #1855. Would you be able to do it with the KNNVectorReader instead of doc values? |
@heemin32 Oh I think I see what you are saying. Will there be multiple readers/producers per segment/search. I think the purpose of SegmentCoreReaders is to ensure that they can be shared. Worth noting that for lucene, the input is closed as well (see https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java#L325). |
Sharing is good but one other thing is whether the reader get closed and opened repetitively regardless the existence of segment file. |
Im not sure why this would happen. Need to be sure though before changing. |
I am don't know how it works internally but based on this comment, it seems like multiple class of SegmentReader can be created. https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentReader.java#L128
|
@jmazanec15 thanks for opening up the issue and I see already some great ideas on the gh issue. Here is my true sense here:
|
Right, refCounting would prevent issues, but make removal a little trickier. |
Read through the issue here - My proposal would be to separate the concerns and instead follow this pattern to handle lifecycle events
Thoughts? |
@kotwanikunal My issue with the index life cycle is that we would also need to keep the file watcher code, which I dont like because its a direct dependency on file system and directory implementation. Thus, if we are able to properly ref-count and free on segment docvalues producer code, then we will be able to solve both issues at one go. |
I see the problem now. We have to track at the file level and not the shard level. I was wondering if we could simply filter cache entries on receiving the shard lifecycle events and evict them (in a way - skip the file watcher all together) - but the cache operates at a more granular level. |
@kotwanikunal when we have i think
|
Before cutting out the absolute file path dependency in KNN, this should be resolved first before this issue : #2033 |
Description
Right now, we free memory for segments whose files get deleted by having a resource watcher watch for those files:
This file watching can actually be completely removed and it would do some good in simplifying the code (less static initialization, decoupling between cache manager and native memory load strategy, moving away from tight coupling of FSDirectory, etc). And additionally, how we do it now is a bug - see #1012.
To fix this, we just need to ensure that for either our DocValuesProducer and our KnnVectorsReader, when close is called, we evict any indices from memory (similar to how we do it now with the filewatcher). We would actually need to implement a really light weight DocValuesProducer that delegates for everything.
Related issues
#1012
The text was updated successfully, but these errors were encountered: