
Elasticsearch inner_hits is very slow #56210

Closed · CSharpBender opened this issue May 5, 2020 · 3 comments

Labels: feedback_needed, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

Comments


CSharpBender commented May 5, 2020

Elasticsearch version (bin/elasticsearch --version):
Elasticsearch 7.6.1
Plugins installed: []
Client: NEST 7.6.1
JVM version (java -version):
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13.0.2+8, mixed mode, sharing)
OS version (uname -a if on a Unix-like system):
Linux c87e626860d1 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I have about 1 million documents with up to ~100 nested documents each, containing just a few boolean fields and a numeric key. The nested documents specify whether the parent document can be used in a specific scenario.
I always need just one nested document, so I used inner_hits to include only the required nested document (1 out of ~100) in the response.
I was expecting a performance improvement (less data, traffic, processing, etc.), but the execution time increased by at least 100 ms; it basically doubles. Even when I include a single property, or none at all, the request takes the same amount of time.
Without inner_hits it's very fast, but then I get all the nested documents and have to filter them in C#.

Steps to reproduce:

PUT test
{
  "mappings": {
    "properties": {
      "my_nested": {
        "type": "nested",
        "properties": {
          "regionId": {
            "type": "integer"
          }
        }
      }
    }
  }
}

// Create a representative number of documents (the real index has ~1 million)
PUT test/_doc/1?refresh
{
  "title": "Parent document",
  "my_nested": [
    {
      "regionId": "1",
      "licensed": true
    },
    {
      "regionId": "2",
      "licensed": false
    },
    {
      "regionId": "3",
      "licensed": true
    }
  ]
}

POST test/_search
{
  "query": {
    "nested": {
      "path": "my_nested",
      "query": {
        "match": {
          "my_nested.regionId": "1"
        }
      },
      "inner_hits": {
        "_source": false,
        "docvalue_fields": [
          "my_nested.licensed"
        ]
      }
    }
  },
  "_source": {
    "includes": [
      "title"
    ]
  }
}
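
For reference, with "_source": false and the docvalue field above, each matching hit in the response should carry only the one matching nested document. A sketch of the expected shape (abbreviated, not captured output):

{
  "_source": { "title": "Parent document" },
  "inner_hits": {
    "my_nested": {
      "hits": {
        "total": { "value": 1, "relation": "eq" },
        "hits": [
          {
            "_nested": { "field": "my_nested", "offset": 0 },
            "fields": { "my_nested.licensed": [ true ] }
          }
        ]
      }
    }
  }
}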

Provide logs (if relevant):

Some metrics using my real data (took values are in ms):

  • Disabled cache, all inner hits included ("inner_hits": { }), took: 205
  • Disabled cache, one docvalue field from the inner hits, took: 206
    "inner_hits": {
      "_source": false,
      "docvalue_fields": [
        "my_nested.licensed"
      ]
    }
  • Disabled cache, "inner_hits": { "_source": false }, took: 195
  • Disabled cache, "inner_hits": {} removed entirely, took: 86
  • Cached query, second run without inner_hits, took: 27
  • Cached query, second run with "inner_hits": {}, took: 147

Notice the constant ~120 ms overhead whenever inner_hits is included, unaffected by cache usage; that is huge compared to the 86 ms the whole query needs without it.
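
For anyone reproducing these timings: the exact method used to disable caching isn't stated above, but one way to get cold-cache numbers (an assumption, not confirmed by the reporter) is to clear the index caches before each run and then re-issue the search, reading the took value from the response:

POST /test/_cache/clear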
@jtibshirani jtibshirani added the :Search/Search Search-related issues that do not fall into other categories label May 5, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@ywelsch
Contributor

ywelsch commented May 17, 2022

@CSharpBender Is this still an issue in newer Elasticsearch versions (>= 7.17)? We have made some changes to inner hits loading (e.g. #60179) and would like to understand whether this is still a problem. I've done some basic benchmarking mimicking the scenario above and couldn't reproduce the issue. If you still see it, could you run the hot threads API while the query is running, to help us figure out where the extra time is spent in your cluster?
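
For anyone following along, a minimal hot threads request looks like this (the parameters are optional; the values here are only illustrative). Capturing its output a few times while the slow search executes should show where the extra time goes:

GET /_nodes/hot_threads?threads=5&interval=500ms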

@CSharpBender
Author

It's a 2-year-old question; I had forgotten about it.
Please use the reproduction steps above and check whether anything has changed.
