-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve include
/exclude
clause list speed and scalability
#11188
Conversation
…of clauses fail. The underlying automaton-backed implementation throws an error if there are too many states. In these circumstances this fix falls-back to using an implementation based on Set lookups for the excluded terms. If the global-ordinals execution mode is in effect this implementation also addresses the slowness identified in issue 11181 which is caused by traversing the TermsEnum - instead the excluded terms’ global ordinals are looked up individually and unset the bits of acceptable terms. This is significantly faster. Closes #11176
// so we let it try and fall-back to Set based lookups if too | ||
// complex. | ||
return new TermListBackedStringFilter(includeValues, excludeValues); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering that maybe we should directly use TermListBackedStringFilter in that case: I wouldn't expect automatons to speed things up a lot and having TermListBackedStringFilter only in this catch block makes it hard to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even it we remove that logic, it would be good though to keep the explanation that we don't use an automaton for the term list case in order not to blow up the maximum number of states.
Just left one question. Otherwise it looks good to me. |
…xclude clauses that include regexes. Otherwise the TermListBackedStringFilter is used for string-based terms
Thanks for review. Changed to only use AutomatonBackedStringFilter for regex cases. Performance still good. |
LGTM |
Pushed to master via caf7235 |
include
/exclude
clause list speed and scalability
In the terms aggregations' IncludeExclude class the underlying automaton-backed implementation throws an error if there are too many states.
In these circumstances this fix falls-back to using an implementation based on Set lookups for the excluded terms.
If the global-ordinals execution mode is in effect this implementation also addresses the slowness identified in issue 11181 which is caused by traversing the TermsEnum - instead the excluded terms’ global ordinals are looked up individually and unset the bits of acceptable terms. This is significantly faster for high cardinality fields.
Closes #11176