A `terms` agg with an `exclude` array of medium size (in my case, 86 strings) was sufficient to cause this error:
```
Caused by: org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton would result in more than 10000 states.
	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:743)
	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:138)
	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:27)
	at org.elasticsearch.search.aggregations.bucket.terms.support.IncludeExclude$StringFilter.<init>(IncludeExclude.java:88)
```
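A query of roughly this shape triggers it. This is a hypothetical sketch, not the original query: the field name, term values, and list length are made up (the real list had 86 strings):

```json
{
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 100 },
      "aggs": {
        "new_terms": {
          "terms": {
            "field": "tags",
            "exclude": ["seen_term_1", "seen_term_2", "seen_term_3"]
          }
        }
      }
    }
  }
}
```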
I can see how consulting large sets of terms can be expensive and we might want to cap it, but in the above case this risk is mitigated through the use of the `sampler` agg to consider a maximum of 100 top-matching docs.
The use case for this type of query is looking for new terms outside of a set that has already been gathered by the client, e.g. aiding graph exploration by looking for connections beyond what you have already collected.
These sorts of exclude lists can grow large, so a cap of around 85 terms seems low for this use case.
We currently handle lists of terms as an automaton just like we handle regular expressions for simplicity. I guess we could specialize the terms list case if we want large lists to work.
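The specialization suggested above could look like the following in spirit. This is a hedged Python sketch, not the actual Java/Lucene code: the point is that an exact-term exclude list needs no determinized automaton at all, since a hash-set membership test handles arbitrarily large lists. (Python's `re` engine does not determinize, so the first approach only stands in for Lucene's automaton path.)

```python
import re

# A medium-sized exclude list of exact terms (values are made up).
exclude = ["term%d" % i for i in range(86)]

# Current approach, in spirit: compile the terms into one union pattern.
# In Lucene this union is determinized into a DFA, and for large unions
# the state count can blow past the 10000-state limit seen in the error.
pattern = re.compile("|".join(map(re.escape, exclude)))

def excluded_via_pattern(term):
    return pattern.fullmatch(term) is not None

# Specialized approach: exact strings need no automaton; a set lookup
# is O(1) per term regardless of how large the exclude list grows.
exclude_set = set(exclude)

def excluded_via_set(term):
    return term in exclude_set
```

Both filters agree on every input; the set-based one simply sidesteps the determinization cost for the exact-value case, while regular expressions would still go through the automaton path.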
Per @jpountz, this error only affects the master branch. Just want to confirm that this will not get introduced into 1.6 when it is released, as we already have users using exact-value excludes with a large number of values :)