-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] Add a cluster setting for memory threshold to pick OrdinalsCollector in Cardinality aggregation #15269
Comments
Thanks @rishabhmaurya seems like something we should do, am removing untriaged. |
@mch2 I would like to work on this. I am new to contributing to OpenSearch. So, would appreciate any guidance. |
Thanks @maitreya2954 for volunteering, assigned it to you! |
@maitreya2954 Appreciate you picking this up. Here is what we can do -
If you're blocked don't hesitate to ask for pointers. Thank you. |
@rishabhmaurya I added execution hint field to the cardinality agg. However, I am stuck at adding unit tests.
3 cases being: Here's my question: What type of query should I test after setting the execution hint. I can see many different type of queries being tested here. Should I just do |
@maitreya2954 That's great that you were able to add execution hint.
yes.
You can test for both match_all query and a filter using term query on a field other than cardinality aggregation field. |
@maitreya2954 how were you thinking about checking which collector is picked from tests? One way could add a method to this class to get the current collector picked for a given segment and add a logic here to store the name/class - OpenSearch/test/framework/src/main/java/org/opensearch/search/aggregations/AggregatorTestCase.java Line 1207 in 978d14e
|
@rishabhmaurya Here's the PR: #15764 Documentation PR: opensearch-project/documentation-website#8265 |
@maitreya2954 great, thanks! I will take a look shortly. Meanwhile you can work on fixing gradle checks which are failing. |
Is your feature request related to a problem? Please describe
Currently the logic to pick
Ordinals
vsDirectCollector
is dynamic and based on the number of ordinals to collect and memory overhead. If memory overhead is high forOrdinalsCollector
, thenDirectCollector
is used. Due to https://issues.apache.org/jira/browse/LUCENE-9663 few users are reporting regression in Cardinality aggregation because of slowerDirectCollector
after replacing prefix compression with LZ4 compression for terms dictionary in lucene 8.9.Fix proposed here is to use OrdinalsCollector more often which will collect the ordinals into a bitset first and then performs term lookup in
postCollect()
of segment and that's a lot faster.However, we don't have a way to control picking up
OrdinalsCollector
in OpenSearch.Describe the solution you'd like
Introduce a memory threshold dynamic setting which OrdinalsCollector can use and if its usage is under this threshold, always use OrdinalsCollector. This logic can be added here.
Related component
Search:Aggregations
Describe alternatives you've considered
Use of eager_global_ordinals or murmur hash, but its an index time setting and can have impact on indexing performance. Its discussed in more detail here.
Additional context
No response
The text was updated successfully, but these errors were encountered: