inner_hits are far not scalable out of the box #32818
Comments
If you have big documents then retrieving the |
Pinging @elastic/es-search-aggs |
NO! Not only faster, scalable! I need the full source of the matched nested documents. My position is: whatever the default behavior is, it should not be non-scalable. Do you agree? I have docs with 50 nested objects and it works fine, but with 250 nested objects performance drops ~10x (from approx. 20ms to 200ms). So it is like a y = 2 * x linear dependency. Elasticsearch is intended to be used in highly loaded, multi-user environments, so this linear algorithm seems awful. I don't see any sense in keeping a default behavior that can obviously wreck your production throughput at some point. In my case I also have another level of nested objects under the first level of nested objects, and such things can't be retrieved via doc values. Is it possible to store _source for each nested document? That would resolve the scalability issue. Btw, for now I decided to store the whole JSON inside each nested document as a string in order to get it using doc values. |
No need to yell ! I think there are ways to make it faster without changing how it is stored internally. For instance we parse the whole |
@jimczi Can you give tips where I need to look at in the code? |
You can start by looking at elasticsearch/server/src/main/java/org/elasticsearch/search/fetch/FetchPhase.java Line 288 in 15ff3da
|
Previously, if an inner_hits block required _source, we would reload and parse the root document's source for every hit. This PR adds a shared SourceLookup to the inner hits context that allows inner hits to reuse parsed source if it's already available. This matches our approach for sharing the root document ID. Relates to #32818.
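To make the difference described in the PR text concrete, here is a minimal Python sketch. It is not the Elasticsearch implementation: `CountingParser`, `SharedSourceLookup`, and both fetch functions are hypothetical names invented for illustration, and `json.loads` stands in for parsing the root document's _source. It only shows that reparsing per inner hit does one parse per hit, while a shared lookup parses once and reuses the result.

```python
import json


class CountingParser:
    """Wraps json.loads and counts how often the root source is parsed."""

    def __init__(self):
        self.parse_count = 0

    def parse(self, raw):
        self.parse_count += 1
        return json.loads(raw)


def fetch_inner_hits_reparse(raw_source, path, offsets, parser):
    # Behavior described as "previously": parse the root source for every hit.
    hits = []
    for offset in offsets:
        root = parser.parse(raw_source)
        hits.append(root[path][offset])
    return hits


class SharedSourceLookup:
    """Hypothetical stand-in for a shared lookup: parse lazily, then cache."""

    def __init__(self, raw_source, parser):
        self._raw = raw_source
        self._parser = parser
        self._parsed = None

    def source(self):
        if self._parsed is None:
            self._parsed = self._parser.parse(self._raw)
        return self._parsed


def fetch_inner_hits_shared(lookup, path, offsets):
    # Every inner hit reuses the already parsed root source.
    return [lookup.source()[path][offset] for offset in offsets]


def demo(nested_count):
    raw = json.dumps({"title": "root", "items": [{"id": i} for i in range(nested_count)]})
    offsets = range(nested_count)

    reparse_parser = CountingParser()
    reparsed = fetch_inner_hits_reparse(raw, "items", offsets, reparse_parser)

    shared_parser = CountingParser()
    shared = fetch_inner_hits_shared(SharedSourceLookup(raw, shared_parser), "items", offsets)

    # Same hits either way; only the number of parses differs.
    return reparsed == shared, reparse_parser.parse_count, shared_parser.parse_count
```

In this sketch, `demo(250)` parses the root 250 times on the reparse path and once on the shared path. Because the root document itself grows with the number of nested objects, the reparse path does work proportional to hits times document size.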
I merged #60494, which avoids reloading and reparsing the root document's _source for every inner hit. I suspect this would make a significant positive difference in your case. I'm going to close this issue for now, but please open a new one if you continue to see poor performance. Note there's already another issue about a high |
While developing search for my project, I found that specific tuning is needed to make inner_hits usable. Why do we parse the parent _source at all for inner_hits if it is clear that this does not scale?
If each nested object is a separate document, why not store its own _source for each one (if possible)?
Elasticsearch version (bin/elasticsearch --version): 5.6.4
Plugins installed: []
JVM version (java -version): java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
OS version (uname -a if on a Unix-like system): Linux new-es-cluster-test-master 4.13.0-1008-gcp #11-Ubuntu SMP Thu Jan 25 11:08:44 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
current behavior: inner_hits not usable without using _source: false and doc_values (or stored fields)
expected behavior: inner_hits usable & scalable out of the box
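The workaround mentioned under "current behavior" can be sketched as a request body built in Python. This is an illustration under assumptions: the index layout, the nested path `comments`, and the field names are hypothetical, and the listed fields are assumed to have doc values enabled (for example keyword or date fields, not analyzed text). It uses the `_source` and `docvalue_fields` options of `inner_hits`.

```python
import json


def nested_query_with_docvalue_inner_hits(path, match_field, match_value,
                                          docvalue_fields, size=10):
    """Build a nested query whose inner_hits skip _source and read doc values."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {match_field: match_value}},
                "inner_hits": {
                    # Avoid loading and parsing the root document's _source.
                    "_source": False,
                    # Return these fields from doc values instead.
                    "docvalue_fields": list(docvalue_fields),
                    "size": size,
                },
            }
        }
    }


# Hypothetical field names, for illustration only.
body = nested_query_with_docvalue_inner_hits(
    path="comments",
    match_field="comments.text",
    match_value="scalable",
    docvalue_fields=["comments.author", "comments.created_at"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request. As noted earlier in the thread, this approach cannot return a second level of nested objects, which is why the reporter resorted to storing the whole JSON as a string per nested document.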
Steps to reproduce: