Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve include/exclude clause list speed and scalability #11188

Closed
wants to merge 2 commits into from
Closed

Improve include/exclude clause list speed and scalability #11188

wants to merge 2 commits into from

Conversation

markharwood
Copy link
Contributor

In the terms aggregations' IncludeExclude class the underlying automaton-backed implementation throws an error if there are too many states.
In these circumstances this fix falls-back to using an implementation based on Set lookups for the excluded terms.
If the global-ordinals execution mode is in effect this implementation also addresses the slowness identified in issue 11181 which is caused by traversing the TermsEnum - instead the excluded terms’ global ordinals are looked up individually and unset the bits of acceptable terms. This is significantly faster for high cardinality fields.

Closes #11176

…of clauses fail.

The underlying automaton-backed implementation throws an error if there are too many states.

In these circumstances this fix falls-back to using an implementation based on Set lookups for the excluded terms.
If the global-ordinals execution mode is in effect this implementation also addresses the slowness identified in issue 11181 which is caused by traversing the TermsEnum - instead the excluded terms’ global ordinals are looked up individually and unset the bits of acceptable terms. This is significantly faster.

Closes #11176
@markharwood
Copy link
Contributor Author

@jpountz This PR addresses both #11181 and #11176

// so we let it try and fall-back to Set based lookups if too
// complex.
return new TermListBackedStringFilter(includeValues, excludeValues);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering that maybe we should directly use TermListBackedStringFilter in that case: I wouldn't expect automatons to speed things up a lot and having TermListBackedStringFilter only in this catch block makes it hard to test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even it we remove that logic, it would be good though to keep the explanation that we don't use an automaton for the term list case in order not to blow up the maximum number of states.

@jpountz
Copy link
Contributor

jpountz commented May 15, 2015

Just left one question. Otherwise it looks good to me.

…xclude clauses that include regexes.

Otherwise the TermListBackedStringFilter is used for string-based terms
@markharwood
Copy link
Contributor Author

Thanks for review. Changed to only use AutomatonBackedStringFilter for regex cases. Performance still good.

@jpountz
Copy link
Contributor

jpountz commented May 15, 2015

LGTM

@markharwood
Copy link
Contributor Author

Pushed to master via caf7235

@kevinkluge kevinkluge removed the review label May 15, 2015
@clintongormley clintongormley changed the title Aggregations improvement: include/exclude clause list speed and scalability Improve include/exclude clause list speed and scalability Jun 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terms agg fails with medium-large "exclude" clause lists
4 participants