
Performance degradation in OpenSearch 1.1.0 due to Lucene 8.9.0 #2820

Closed
rtiruveedulaarkin opened this issue Apr 8, 2022 · 17 comments
Labels: bug (Something isn't working), Search:Aggregations

@rtiruveedulaarkin

Is your feature request related to a problem? Please describe.
We are planning to upgrade from Elasticsearch 7.7.1 to the OpenSearch 1.2.4 release. We compared the performance of OpenSearch 1.2.4 with Elasticsearch 7.7.1: for cardinality queries on keyword fields, performance is degraded by 50%, so we could not upgrade to OpenSearch.

The degradation is observed from the OpenSearch 1.1.0 release onwards.

Below is the code snippet that runs slowly in OpenSearch 1.1.0:

    // with opensearch 1.0.1: 240 requests/second
    // with opensearch 1.1.0:  97 requests/second
    public static SearchRequestBuilder getSearchRequest1(TransportClient client, String index, String randomValue) {
        QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("__id.keyword", randomValue));
        CardinalityAggregationBuilder agg = AggregationBuilders
                .cardinality("somename")
                .field("__id.keyword");
        return client.prepareSearch(index).setQuery(qb).addAggregation(agg);
    }
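
For anyone reproducing this without the Java client, an equivalent REST request would look roughly like this (index name and term value are placeholders):

GET /index1/_search
{
  "query": {
    "bool": {
      "must_not": {
        "term": { "__id.keyword": "<randomValue>" }
      }
    }
  },
  "aggs": {
    "somename": {
      "cardinality": {
        "field": "__id.keyword"
      }
    }
  }
}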

This degradation is caused by the Lucene upgrade from 8.8.2 to 8.9.0. The commit is e153629.

A Lucene developer said this is caused by an enhancement in Lucene and has to be addressed in OpenSearch.

Lucene ticket that I have filed: https://issues.apache.org/jira/browse/LUCENE-10509

Describe the solution you'd like

As per the Lucene developer (Adrien Grand):

The cardinality aggregation performs value lookups on each document. OpenSearch should change the way cardinality aggregations run to collect matching ordinals into a bitset first, and only look up values once the entire segment has been collected. This should address the performance problem and will likely make the cardinality aggregation faster than it was before Lucene 8.9.
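
For illustration, here is a minimal, hypothetical per-segment sketch of that idea against the Lucene 8.x doc-values API (class and method names are ours, not the eventual OpenSearch implementation; OpenSearch's existing OrdinalsCollector already works along these lines but, as shown later in this thread, is only selected when its memory overhead is low):

    import java.io.IOException;

    import org.apache.lucene.index.SortedSetDocValues;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.LongBitSet;

    // Hypothetical sketch: instead of calling lookupOrd() once per matching
    // document, record each document's ordinals in a bitset during collection,
    // then resolve every distinct ordinal to its value exactly once after the
    // whole segment has been collected.
    class PerSegmentOrdinalCollector {

        private final SortedSetDocValues ords;
        private final LongBitSet seenOrds;

        PerSegmentOrdinalCollector(SortedSetDocValues ords) {
            this.ords = ords;
            this.seenOrds = new LongBitSet(ords.getValueCount());
        }

        /** Called for every matching doc: only sets bits, no value lookup. */
        void collect(int doc) throws IOException {
            if (ords.advanceExact(doc)) {
                for (long ord = ords.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = ords.nextOrd()) {
                    seenOrds.set(ord);
                }
            }
        }

        /** Called once per segment: each distinct ordinal is looked up a single time. */
        void postCollect(ValueConsumer consumer) throws IOException {
            for (long ord = 0; ord < seenOrds.length(); ord++) {
                if (seenOrds.get(ord)) {
                    BytesRef value = ords.lookupOrd(ord);
                    consumer.accept(value); // e.g. hash the value into HyperLogLog++
                }
            }
        }

        interface ValueConsumer {
            void accept(BytesRef value);
        }
    }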

Describe alternatives you've considered
None

Additional context

rtiruveedulaarkin added the enhancement and untriaged labels on Apr 8, 2022
ryanbogan added the bug and Search:Aggregations labels and removed the enhancement and untriaged labels on Apr 12, 2022
@andrross (Member)

@rtiruveedulaarkin Have you tested against OpenSearch 1.2 that uses Lucene 8.10.1? We had a similar issue recently (#1647) that was mitigated in Lucene 8.10.1. The Lucene issue you linked doesn't show any tests against 8.10.1, so I'm curious if LUCENE-9917 also mitigates the regression seen here.

@rtiruveedulaarkin (Author)

@andrross I have tested against OpenSearch 1.2.4 and 1.3.1, both of which use Lucene 8.10.1. The performance issue is present in both releases.

@andrross (Member)

@rtiruveedulaarkin And can you confirm that you're using the default BEST_SPEED codec in your tests? The BEST_COMPRESSION codec is expected to be slower with newer versions of Lucene.

@rtiruveedulaarkin (Author)

@andrross I haven't specified a codec while creating the index, so it must be using the default BEST_SPEED.
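
For reference, the effective codec can be verified through the settings API (index name is a placeholder). An explicitly set codec shows up under settings; otherwise the default (“default”, i.e. BEST_SPEED) appears under defaults:

GET /index1/_settings?include_defaults=true&filter_path=**.codec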

rtiruveedulaarkin changed the title from "Performance degradation in OpenSearch 1.1.0 due to Lucene 8.8.2" to "Performance degradation in OpenSearch 1.1.0 due to Lucene 8.9.0" on Apr 13, 2022
@andrross (Member)

Thanks @rtiruveedulaarkin

We'll dig into this soon to reproduce it. The mitigation suggested in the Lucene ticket seems like a good optimization to implement in any case.

@rishabhmaurya (Contributor)

@rtiruveedulaarkin Can you set profile to true and post the following debug info for CardinalityAggregator?

public void collectDebugInfo(BiConsumer<String, Object> add) {
    super.collectDebugInfo(add);
    add.accept("empty_collectors_used", emptyCollectorsUsed);
    add.accept("numeric_collectors_used", numericCollectorsUsed);
    add.accept("ordinals_collectors_used", ordinalsCollectorsUsed);
    add.accept("ordinals_collectors_overhead_too_high", ordinalsCollectorsOverheadTooHigh);
    add.accept("string_hashing_collectors_used", stringHashingCollectorsUsed);
}

@rtiruveedulaarkin (Author)

@rishabhmaurya This is the debug info for CardinalityAggregator.

  "debug": {
      "empty_collectors_used": 0,
      "numeric_collectors_used": 0,
      "ordinals_collectors_overhead_too_high": 1,
      "ordinals_collectors_used": 0,
      "string_hashing_collectors_used": 1
  },

@rishabhmaurya (Contributor) commented May 4, 2022

@rtiruveedulaarkin In your case the DirectCollector is being used, which is quite inefficient, as described in LUCENE-10509. Do you have a way to override the code here to force the use of OrdinalsCollector, just to compare performance? It would lead to more memory usage, though. You can try increasing precision_threshold to 4000 to force OrdinalsCollector.

adnapibar assigned rishabhmaurya and unassigned themselves on May 4, 2022
@rtiruveedulaarkin (Author)

I tried with the following query, but OrdinalsCollector is still not used.

rtiruveedula@rtiruveedul-a02 ~ % cat opensearch_debug.json
{
  "profile": true,
  "aggs": {
    "somename": {
      "cardinality": {
        "field": "__id.keyword",
        "precision_threshold": 4000
      }
    }
  },
  "size": 1
}
rtiruveedula@rtiruveedul-a02 ~ % http :9200/index1/_search < opensearch_debug.json | jq '.profile.shards[0].aggregations[0].debug'
{
  "ordinals_collectors_used": 0,
  "ordinals_collectors_overhead_too_high": 1,
  "string_hashing_collectors_used": 1,
  "numeric_collectors_used": 0,
  "empty_collectors_used": 0
}
rtiruveedula@rtiruveedul-a02 ~ %

@rishabhmaurya (Contributor)

Do you mind overriding the code to use OrdinalsCollector? This would confirm that the slowdown is caused by the per-document value lookup and not by some other underlying issue.

@rtiruveedulaarkin (Author)

I changed the code to force OrdinalsCollector, but the performance is the same.

rtiruveedula@rtiruveedul-a02 OpenSearch % git diff
diff --git a/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java b/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java
index 61cb08ba28a..1d39aef922e 100644
--- a/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java
+++ b/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java
@@ -128,7 +128,7 @@ public class CardinalityAggregator extends NumericMetricsAggregator.SingleValue
             final long ordinalsMemoryUsage = OrdinalsCollector.memoryOverhead(maxOrd);
             final long countsMemoryUsage = HyperLogLogPlusPlus.memoryUsage(precision);
             // only use ordinals if they don't increase memory usage by more than 25%
-            if (ordinalsMemoryUsage < countsMemoryUsage / 4) {
+            if (1 != 0) {
                 ordinalsCollectorsUsed++;
                 return new OrdinalsCollector(counts, ordinalValues, context.bigArrays());
             }
rtiruveedula@rtiruveedul-a02 OpenSearch %

rtiruveedula@rtiruveedul-a02 ~ % cat opensearch_debug.json
{
  "profile": true,
  "aggs": {
    "somename": {
      "cardinality": {
        "field": "__id.keyword",
        "precision_threshold": 4000
      }
    }
  },
  "size": 1
}
rtiruveedula@rtiruveedul-a02 ~ %  http :9200/index1/_search < opensearch_debug.json | jq '.profile.shards[0].aggregations[0].debug'
{
  "ordinals_collectors_used": 1,
  "ordinals_collectors_overhead_too_high": 0,
  "string_hashing_collectors_used": 0,
  "numeric_collectors_used": 0,
  "empty_collectors_used": 0
}

@rishabhmaurya (Contributor)

Thanks for confirming. I will try to reproduce it and dig deeper.

@rishabhmaurya (Contributor)

Can you try enabling eager_global_ordinals and compare performance, if that's acceptable for your use case?
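
For example, assuming __id has the usual text-plus-keyword-subfield mapping (an assumption; adjust to your actual mapping), eager_global_ordinals can be flipped on the existing index with a mapping update, since it is updatable in place:

PUT /index1/_mapping
{
  "properties": {
    "__id": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "eager_global_ordinals": true
        }
      }
    }
  }
}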

@rishabhmaurya (Contributor) commented May 23, 2022

I did a quick experiment using this script and found that using the map execution hint can be at least 10x slower for low-cardinality cases, and more than 15x slower for high-cardinality cases (script for high cardinality). When eager global ordinals are enabled, the map implementation is 45x slower.
Also, comparing the high-cardinality case between OpenSearch v1.0.0 and v1.3.2, v1.3.2 is 6x slower when maps are used, whereas there is no performance impact when ordinals are used, as expected.
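
The comparison queries have roughly this shape (index and field names are assumptions inferred from the output below; replacing "map" with "global_ordinals" switches to the ordinals implementation):

GET /test-index/_search
{
  "size": 0,
  "aggs": {
    "color_group": {
      "terms": {
        "field": "color",
        "execution_hint": "map"
      }
    }
  }
}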

version: "1.3.2"

Low cardinality field - 
----------------------
with map
--------
{'took': 1523, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'red', 'doc_count': 4061715}, {'key': 'yellow', 'doc_count': 4061614}, {'key': 'green', 'doc_count': 4046868}, {'key': 'blue', 'doc_count': 4023941}, {'key': 'black', 'doc_count': 4005862}]}}}
0:00:01.526411


with ordinals
-------------
{'took': 107, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'blue', 'doc_count': 4075451}, {'key': 'red', 'doc_count': 4056463}, {'key': 'green', 'doc_count': 4030304}, {'key': 'black', 'doc_count': 4024345}, {'key': 'yellow', 'doc_count': 4013437}]}}}
0:00:00.111376



High cardinality field
----------------------
with map
--------
{'took': 13058, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 240, 'sum_other_doc_count': 19798751, 'buckets': [{'key': 'black25246', 'doc_count': 129}, {'key': 'red58997', 'doc_count': 129}, {'key': 'black47002', 'doc_count': 125}, {'key': 'blue51194', 'doc_count': 125}, {'key': 'blue31304', 'doc_count': 124}, {'key': 'blue59033', 'doc_count': 124}, {'key': 'red93007', 'doc_count': 124}, {'key': 'black33254', 'doc_count': 123}, {'key': 'black84188', 'doc_count': 123}, {'key': 'blue26323', 'doc_count': 123}]}}}
0:00:13.073847


with ordinals 
-------------
{'took': 863, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 244, 'sum_other_doc_count': 20198728, 'buckets': [{'key': 'green39610', 'doc_count': 129}, {'key': 'yellow12715', 'doc_count': 129}, {'key': 'green49960', 'doc_count': 128}, {'key': 'black34451', 'doc_count': 127}, {'key': 'blue21828', 'doc_count': 127}, {'key': 'red1659', 'doc_count': 127}, {'key': 'red42873', 'doc_count': 127}, {'key': 'black62989', 'doc_count': 126}, {'key': 'blue51538', 'doc_count': 126}, {'key': 'green21326', 'doc_count': 126}]}}}
0:00:00.867180



With eager_global_ordinals - 
--------------------------
with map
--------
{'took': 13225, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 244, 'sum_other_doc_count': 20198724, 'buckets': [{'key': 'yellow49230', 'doc_count': 130}, {'key': 'black67354', 'doc_count': 128}, {'key': 'blue66964', 'doc_count': 128}, {'key': 'green77280', 'doc_count': 128}, {'key': 'yellow44885', 'doc_count': 128}, {'key': 'black60109', 'doc_count': 127}, {'key': 'blue12784', 'doc_count': 127}, {'key': 'green32715', 'doc_count': 127}, {'key': 'yellow71316', 'doc_count': 127}, {'key': 'black73015', 'doc_count': 126}]}}}
0:00:13.229019


with ordinals
-------------
{'took': 289, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 245, 'sum_other_doc_count': 20198733, 'buckets': [{'key': 'black8584', 'doc_count': 128}, {'key': 'blue71747', 'doc_count': 128}, {'key': 'red95343', 'doc_count': 128}, {'key': 'black99004', 'doc_count': 127}, {'key': 'black65579', 'doc_count': 126}, {'key': 'black84931', 'doc_count': 126}, {'key': 'black94373', 'doc_count': 126}, {'key': 'blue84922', 'doc_count': 126}, {'key': 'green39849', 'doc_count': 126}, {'key': 'green54011', 'doc_count': 126}]}}}
0:00:00.293075

@rishabhmaurya (Contributor)

Using the murmur3 plugin to precompute hashes and running the cardinality aggregation on the hash field significantly helped one user. I'll ask them to post more details on this thread.

@kcharikrish commented Aug 3, 2022

We were able to achieve a performance improvement by using the murmur3 plugin to precompute the hash value of a field in an index.

Scenario: We are migrating from Elasticsearch v7.10 to OpenSearch v1.2. Post-migration, we observed that a search query with a cardinality aggregation degraded in performance by 186%. A sample query is given below.

GET sample-index-20220721/_search?timeout=10m
{
  "aggregations": {
    "user_q": {
      "aggregations": {
        "account_name_total": {
          "cardinality": {
            "field": "account_name"
          }
        }
      },
      "terms": {
        "field": "user",
        "size": 10000
      }
    }
  },
  "query": {
    "query_string": {
      "query": "timestamp:[1657828498000||/m TO 1657914898000||/m] AND storage_id:302391 AND type:1",
      "allow_leading_wildcard": false
    }
  },
  "size": 0,
  "track_total_hits": true
}

Steps Performed for Triage with Observations:
We executed the long-running query in asynchronous mode with the Profile API option and observed the following:

  1. Identified that the “account_name” field is of type “keyword” and has high cardinality.
  2. The aggregations profile showed that the “CardinalityAggregator” type was taking more time than expected across multiple shards within the index where the data was located.
  3. Under the debug section of the aggregation, we saw that no global ordinals were used (ordinals_collectors_used = 0 and ordinals_collectors_overhead_too_high = 1):
    "debug" : {
      "ordinals_collectors_used" : 0,
      "ordinals_collectors_overhead_too_high" : 1,
      "string_hashing_collectors_used" : 1,
      "numeric_collectors_used" : 0,
      "empty_collectors_used" : 0
    }
  4. As per the recommendation from this article, if a string field has high cardinality, it is recommended to store the hash value of the field in the index. This can be done using the murmur3 plugin.
  5. We also observed that query execution takes time only on indices that have data matching the query.

Performance Tuning Changes:
Based on the observations, we performed the following steps:

  1. Created a new index “sample-index-20220721-updated” with the following two changes:
    a. Used the murmur3 plugin to store the hash value of the “account_name” field in the index (a sketch of the resulting mapping follows this list). Follow the steps in this link.
    b. Enabled “eager_global_ordinals”: true for the “user” field. We did this because we use the user field under “terms” within the query.
  2. Re-indexed the source index “sample-index-20220721” to “sample-index-20220721-updated” for the changes to take effect.
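
A sketch of what the updated index might look like (index and field names follow the example above; the exact mapping depends on the original index, and the murmur3 field type requires the mapper-murmur3 plugin to be installed):

PUT sample-index-20220721-updated
{
  "mappings": {
    "properties": {
      "account_name": {
        "type": "keyword",
        "fields": {
          "hash": {
            "type": "murmur3"
          }
        }
      },
      "user": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}

The cardinality aggregation is then pointed at the hash subfield ("field": "account_name.hash"), which routes the CardinalityAggregator to its numeric collector, consistent with the non-zero numeric_collectors_used in the updated debug output below.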

Once the above changes were implemented, we saw a good decrease in the execution time of the query. Although the response time didn't fall back to pre-Lucene-8.9.0 levels with the murmur3 plugin, this solution gives good response times in OpenSearch v1.2.

Updated debug section for the index in the Profile API output:

"debug" : {
  "ordinals_collectors_used" : 0,
  "ordinals_collectors_overhead_too_high" : 0,
  "string_hashing_collectors_used" : 0,
  "numeric_collectors_used" : 34,
  "empty_collectors_used" : 0
}

@rishabhmaurya (Contributor)

Thanks @kcharikrish for working on this and posting the analysis. The conclusion so far is to use the right index configuration and adjust query parameters to avoid the performance degradation. Closing this thread; feel free to reopen if the configuration suggested above doesn't work for you.
