
Performance degradation in OpenSearch 1.1.0 due to Lucene 8.9.0 #2820

Closed
rtiruveedulaarkin opened this issue Apr 8, 2022 · 17 comments
Labels: bug (Something isn't working), Search:Aggregations

@rtiruveedulaarkin

Is your feature request related to a problem? Please describe.
We are planning to upgrade from Elasticsearch 7.7.1 to the OpenSearch 1.2.4 release. We compared the performance of OpenSearch 1.2.4 with Elasticsearch 7.7.1: for cardinality queries on keyword fields, performance is degraded by 50%, so we could not upgrade to OpenSearch.

The degradation is observed from the OpenSearch 1.1.0 release onwards.

Below is the code snippet that runs slowly in OpenSearch 1.1.0:

    // with opensearch 1.0.1: 240 requests/second
    // with opensearch 1.1.0:  97 requests/second
    public static SearchRequestBuilder getSearchRequest1(TransportClient client, String index, String randomValue) {
        QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("__id.keyword", randomValue));
        CardinalityAggregationBuilder agg = AggregationBuilders
                .cardinality("somename")
                .field("__id.keyword");
        return client.prepareSearch(index).setQuery(qb).addAggregation(agg);
    }
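
For anyone reproducing this without the Java client, an equivalent REST request would look roughly like this (index name and term value are placeholders):

GET /index1/_search
{
  "query": {
    "bool": {
      "must_not": {
        "term": { "__id.keyword": "<randomValue>" }
      }
    }
  },
  "aggs": {
    "somename": {
      "cardinality": {
        "field": "__id.keyword"
      }
    }
  }
}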

This degradation is caused by the Lucene upgrade from 8.8.2 to 8.9.0. The commit is e153629.

A Lucene developer said this is caused by an enhancement in Lucene and has to be addressed in OpenSearch.

Lucene ticket that I have filed: https://issues.apache.org/jira/browse/LUCENE-10509

Describe the solution you'd like

As per the Lucene developer (Adrien Grand):

The cardinality aggregation performs value lookups on each document. OpenSearch should change the way cardinality aggregations run to collect matching ordinals into a bitset first, and only look up values once the entire segment has been collected. This should address the performance problem and will likely make the cardinality aggregation faster than it was before Lucene 8.9.
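
For illustration, here is a minimal, hypothetical per-segment sketch of that idea against the Lucene 8.x doc-values API (class and method names are ours, not the eventual OpenSearch implementation; OpenSearch's existing OrdinalsCollector already works along these lines but, as shown later in this thread, is only selected when its memory overhead is low):

    import java.io.IOException;

    import org.apache.lucene.index.SortedSetDocValues;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.LongBitSet;

    // Hypothetical sketch: instead of calling lookupOrd() once per matching
    // document, record each document's ordinals in a bitset during collection,
    // then resolve every distinct ordinal to its value exactly once after the
    // whole segment has been collected.
    class PerSegmentOrdinalCollector {

        private final SortedSetDocValues ords;
        private final LongBitSet seenOrds;

        PerSegmentOrdinalCollector(SortedSetDocValues ords) {
            this.ords = ords;
            this.seenOrds = new LongBitSet(ords.getValueCount());
        }

        /** Called for every matching doc: only sets bits, no value lookup. */
        void collect(int doc) throws IOException {
            if (ords.advanceExact(doc)) {
                for (long ord = ords.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = ords.nextOrd()) {
                    seenOrds.set(ord);
                }
            }
        }

        /** Called once per segment: each distinct ordinal is looked up a single time. */
        void postCollect(ValueConsumer consumer) throws IOException {
            for (long ord = 0; ord < seenOrds.length(); ord++) {
                if (seenOrds.get(ord)) {
                    BytesRef value = ords.lookupOrd(ord);
                    consumer.accept(value); // e.g. hash the value into HyperLogLog++
                }
            }
        }

        interface ValueConsumer {
            void accept(BytesRef value);
        }
    }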

Describe alternatives you've considered
None

Additional context

rtiruveedulaarkin added the enhancement and untriaged labels on Apr 8, 2022
ryanbogan added the bug and Search:Aggregations labels and removed the enhancement and untriaged labels on Apr 12, 2022
@andrross (Member)

@rtiruveedulaarkin Have you tested against OpenSearch 1.2 that uses Lucene 8.10.1? We had a similar issue recently (#1647) that was mitigated in Lucene 8.10.1. The Lucene issue you linked doesn't show any tests against 8.10.1, so I'm curious if LUCENE-9917 also mitigates the regression seen here.

@rtiruveedulaarkin (Author)

@andrross I have tested against OpenSearch 1.2.4 and 1.3.1, both of which use Lucene 8.10.1. The performance issue is present in both releases.

@andrross (Member)

@rtiruveedulaarkin And can you confirm that you're using the default BEST_SPEED codec in your tests? The BEST_COMPRESSION codec is expected to be slower with newer versions of Lucene.

@rtiruveedulaarkin (Author)

@andrross I haven't specified a codec while creating the index, so it must be using the default BEST_SPEED.
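
For reference, the effective codec can be verified through the settings API (index name is a placeholder). An explicitly set codec shows up under settings; otherwise the default (“default”, i.e. BEST_SPEED) appears under defaults:

GET /index1/_settings?include_defaults=true&filter_path=**.codec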

rtiruveedulaarkin changed the title from "Performance degradation in OpenSearch 1.1.0 due to Lucene 8.8.2" to "Performance degradation in OpenSearch 1.1.0 due to Lucene 8.9.0" on Apr 13, 2022
@andrross (Member)

Thanks @rtiruveedulaarkin

We'll dig into this soon to reproduce it. The mitigation suggested in the Lucene ticket seems like a good optimization to implement in any case.

@rishabhmaurya (Contributor)

@rtiruveedulaarkin Can you set profile to true and post the following debug info for CardinalityAggregator?

public void collectDebugInfo(BiConsumer<String, Object> add) {
    super.collectDebugInfo(add);
    add.accept("empty_collectors_used", emptyCollectorsUsed);
    add.accept("numeric_collectors_used", numericCollectorsUsed);
    add.accept("ordinals_collectors_used", ordinalsCollectorsUsed);
    add.accept("ordinals_collectors_overhead_too_high", ordinalsCollectorsOverheadTooHigh);
    add.accept("string_hashing_collectors_used", stringHashingCollectorsUsed);
}

@rtiruveedulaarkin (Author)

@rishabhmaurya This is the debug info for CardinalityAggregator.

  "debug": {
      "empty_collectors_used": 0,
      "numeric_collectors_used": 0,
      "ordinals_collectors_overhead_too_high": 1,
      "ordinals_collectors_used": 0,
      "string_hashing_collectors_used": 1
  },

@rishabhmaurya (Contributor) commented May 4, 2022

@rtiruveedulaarkin In your case the DirectCollector is being used, which is quite inefficient, as described in LUCENE-10509. Do you have a way to override the code here to force the use of OrdinalsCollector, just to compare performance? It would lead to more memory usage, though. You can try increasing precision_threshold to 4000 to force OrdinalsCollector.

adnapibar assigned rishabhmaurya and unassigned themselves on May 4, 2022
@rtiruveedulaarkin (Author)

I tried with the following query, but OrdinalsCollector is still not used.

rtiruveedula@rtiruveedul-a02 ~ % cat opensearch_debug.json
{
  "profile": true,
  "aggs": {
    "somename": {
      "cardinality": {
        "field": "__id.keyword",
        "precision_threshold": 4000
      }
    }
  },
  "size": 1
}
rtiruveedula@rtiruveedul-a02 ~ % http :9200/index1/_search < opensearch_debug.json | jq '.profile.shards[0].aggregations[0].debug'
{
  "ordinals_collectors_used": 0,
  "ordinals_collectors_overhead_too_high": 1,
  "string_hashing_collectors_used": 1,
  "numeric_collectors_used": 0,
  "empty_collectors_used": 0
}
rtiruveedula@rtiruveedul-a02 ~ %

@rishabhmaurya (Contributor)

Do you mind overriding the code to use OrdinalsCollector? This would confirm that the slowdown is caused by the per-document value lookup and not by some other underlying issue.

@rtiruveedulaarkin (Author)

I changed the code to force OrdinalsCollector, but the performance is the same.

rtiruveedula@rtiruveedul-a02 OpenSearch % git diff
diff --git a/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java b/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java
index 61cb08ba28a..1d39aef922e 100644
--- a/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java
+++ b/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java
@@ -128,7 +128,7 @@ public class CardinalityAggregator extends NumericMetricsAggregator.SingleValue
             final long ordinalsMemoryUsage = OrdinalsCollector.memoryOverhead(maxOrd);
             final long countsMemoryUsage = HyperLogLogPlusPlus.memoryUsage(precision);
             // only use ordinals if they don't increase memory usage by more than 25%
-            if (ordinalsMemoryUsage < countsMemoryUsage / 4) {
+            if (1 != 0) {
                 ordinalsCollectorsUsed++;
                 return new OrdinalsCollector(counts, ordinalValues, context.bigArrays());
             }
rtiruveedula@rtiruveedul-a02 OpenSearch %

rtiruveedula@rtiruveedul-a02 ~ % cat opensearch_debug.json
{
  "profile": true,
  "aggs": {
    "somename": {
      "cardinality": {
        "field": "__id.keyword",
        "precision_threshold": 4000
      }
    }
  },
  "size": 1
}
rtiruveedula@rtiruveedul-a02 ~ %  http :9200/index1/_search < opensearch_debug.json | jq '.profile.shards[0].aggregations[0].debug'
{
  "ordinals_collectors_used": 1,
  "ordinals_collectors_overhead_too_high": 0,
  "string_hashing_collectors_used": 0,
  "numeric_collectors_used": 0,
  "empty_collectors_used": 0
}

@rishabhmaurya (Contributor)

Thanks for confirming. I will try to reproduce it and dig deeper.

@rishabhmaurya (Contributor)

Can you try enabling eager_global_ordinals and compare performance, if that's acceptable for your use case?
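
For example, assuming __id has the usual text-plus-keyword-subfield mapping (an assumption; adjust to your actual mapping), eager_global_ordinals can be flipped on the existing index with a mapping update, since it is updatable in place:

PUT /index1/_mapping
{
  "properties": {
    "__id": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "eager_global_ordinals": true
        }
      }
    }
  }
}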

@rishabhmaurya (Contributor) commented May 23, 2022

I did a quick experiment using this script and found that using the map execution hint can be at least 10x slower for low-cardinality cases, and more than 15x slower for high-cardinality cases (script for high cardinality). When eager global ordinals are enabled, the map implementation is 45x slower.
Also, comparing the high-cardinality case between OpenSearch v1.0.0 and v1.3.2, v1.3.2 is 6x slower when maps are used, whereas there is no performance impact when ordinals are used, as expected.
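
The comparison queries have roughly this shape (index and field names are assumptions inferred from the output below; replacing "map" with "global_ordinals" switches to the ordinals implementation):

GET /test-index/_search
{
  "size": 0,
  "aggs": {
    "color_group": {
      "terms": {
        "field": "color",
        "execution_hint": "map"
      }
    }
  }
}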

version: "1.3.2"

Low cardinality field - 
----------------------
with map
--------
{'took': 1523, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'red', 'doc_count': 4061715}, {'key': 'yellow', 'doc_count': 4061614}, {'key': 'green', 'doc_count': 4046868}, {'key': 'blue', 'doc_count': 4023941}, {'key': 'black', 'doc_count': 4005862}]}}}
0:00:01.526411


with ordinals
-------------
{'took': 107, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'blue', 'doc_count': 4075451}, {'key': 'red', 'doc_count': 4056463}, {'key': 'green', 'doc_count': 4030304}, {'key': 'black', 'doc_count': 4024345}, {'key': 'yellow', 'doc_count': 4013437}]}}}
0:00:00.111376



High cardinality field
----------------------
with map
--------
{'took': 13058, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 240, 'sum_other_doc_count': 19798751, 'buckets': [{'key': 'black25246', 'doc_count': 129}, {'key': 'red58997', 'doc_count': 129}, {'key': 'black47002', 'doc_count': 125}, {'key': 'blue51194', 'doc_count': 125}, {'key': 'blue31304', 'doc_count': 124}, {'key': 'blue59033', 'doc_count': 124}, {'key': 'red93007', 'doc_count': 124}, {'key': 'black33254', 'doc_count': 123}, {'key': 'black84188', 'doc_count': 123}, {'key': 'blue26323', 'doc_count': 123}]}}}
0:00:13.073847


with ordinals 
-------------
{'took': 863, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 244, 'sum_other_doc_count': 20198728, 'buckets': [{'key': 'green39610', 'doc_count': 129}, {'key': 'yellow12715', 'doc_count': 129}, {'key': 'green49960', 'doc_count': 128}, {'key': 'black34451', 'doc_count': 127}, {'key': 'blue21828', 'doc_count': 127}, {'key': 'red1659', 'doc_count': 127}, {'key': 'red42873', 'doc_count': 127}, {'key': 'black62989', 'doc_count': 126}, {'key': 'blue51538', 'doc_count': 126}, {'key': 'green21326', 'doc_count': 126}]}}}
0:00:00.867180



With eager_global_ordinals - 
--------------------------
with map
--------
{'took': 13225, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 244, 'sum_other_doc_count': 20198724, 'buckets': [{'key': 'yellow49230', 'doc_count': 130}, {'key': 'black67354', 'doc_count': 128}, {'key': 'blue66964', 'doc_count': 128}, {'key': 'green77280', 'doc_count': 128}, {'key': 'yellow44885', 'doc_count': 128}, {'key': 'black60109', 'doc_count': 127}, {'key': 'blue12784', 'doc_count': 127}, {'key': 'green32715', 'doc_count': 127}, {'key': 'yellow71316', 'doc_count': 127}, {'key': 'black73015', 'doc_count': 126}]}}}
0:00:13.229019


with ordinals
-------------
{'took': 289, 'timed_out': False, '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': None, 'hits': []}, 'aggregations': {'color_group': {'doc_count_error_upper_bound': 245, 'sum_other_doc_count': 20198733, 'buckets': [{'key': 'black8584', 'doc_count': 128}, {'key': 'blue71747', 'doc_count': 128}, {'key': 'red95343', 'doc_count': 128}, {'key': 'black99004', 'doc_count': 127}, {'key': 'black65579', 'doc_count': 126}, {'key': 'black84931', 'doc_count': 126}, {'key': 'black94373', 'doc_count': 126}, {'key': 'blue84922', 'doc_count': 126}, {'key': 'green39849', 'doc_count': 126}, {'key': 'green54011', 'doc_count': 126}]}}}
0:00:00.293075

@rishabhmaurya (Contributor)

Using the murmur3 plugin to precompute hashes and running the cardinality aggregation on the hash field significantly helped one user. I'll ask them to post more details on this thread.

@kcharikrish commented Aug 3, 2022

We were able to achieve a performance improvement by using the murmur3 plugin to precompute the hash value of a field in an index.

Scenario: We are migrating from Elasticsearch v7.10 to OpenSearch v1.2. Post-migration, we observed that a search query with a cardinality aggregation degraded in performance by 186%. A sample query is given below.

GET sample-index-20220721/_search?timeout=10m
{
  "aggregations": {
    "user_q": {
      "aggregations": {
        "account_name_total": {
          "cardinality": {
            "field": "account_name"
          }
        }
      },
      "terms": {
        "field": "user",
        "size": 10000
      }
    }
  },
  "query": {
    "query_string": {
      "query": "timestamp:[1657828498000||/m TO 1657914898000||/m] AND storage_id:302391 AND type:1",
      "allow_leading_wildcard": false
    }
  },
  "size": 0,
  "track_total_hits": true
}

Steps Performed for Triage with Observations:
We executed the long-running query in asynchronous mode with the Profile API option and observed the following:

  1. Identified that the “account_name” field is of type “keyword” and has high cardinality.
  2. The aggregations profile showed that the “CardinalityAggregator” type was taking more time than expected across multiple shards within the index where the data was located.
  3. Under the debug section of the aggregation, we saw that no global ordinals were used (ordinals_collectors_used = 0 and ordinals_collectors_overhead_too_high = 1):
    "debug" : {
      "ordinals_collectors_used" : 0,
      "ordinals_collectors_overhead_too_high" : 1,
      "string_hashing_collectors_used" : 1,
      "numeric_collectors_used" : 0,
      "empty_collectors_used" : 0
    }
  4. As per the recommendation from this article, if a string field has high cardinality, it is recommended to store the hash value of the field in the index. This can be done using the murmur3 plugin.
  5. We also observed that query execution takes time only on indices that have data matching the query.

Performance Tuning Changes:
Based on the observations, we performed the following steps:

  1. Created a new index “sample-index-20220721-updated” with the following two changes:
    a. Used the murmur3 plugin to store the hash value of the “account_name” field in the index (a sketch of the resulting mapping follows this list). Follow the steps in this link.
    b. Enabled “eager_global_ordinals”: true for the “user” field. We did this because we use the user field under “terms” within the query.
  2. Re-indexed the source index “sample-index-20220721” to “sample-index-20220721-updated” for the changes to take effect.
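
A sketch of what the updated index might look like (index and field names follow the example above; the exact mapping depends on the original index, and the murmur3 field type requires the mapper-murmur3 plugin to be installed):

PUT sample-index-20220721-updated
{
  "mappings": {
    "properties": {
      "account_name": {
        "type": "keyword",
        "fields": {
          "hash": {
            "type": "murmur3"
          }
        }
      },
      "user": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}

The cardinality aggregation is then pointed at the hash subfield ("field": "account_name.hash"), which routes the CardinalityAggregator to its numeric collector, consistent with the non-zero numeric_collectors_used in the updated debug output below.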

Once the above changes were implemented, we saw a good decrease in the execution time of the query. Although the response time didn't fall back to pre-Lucene-8.9.0 levels with the murmur3 plugin, this solution gives good response times in OpenSearch v1.2.

Updated debug section for the index in the Profile API output:

"debug" : {
  "ordinals_collectors_used" : 0,
  "ordinals_collectors_overhead_too_high" : 0,
  "string_hashing_collectors_used" : 0,
  "numeric_collectors_used" : 34,
  "empty_collectors_used" : 0
}

@rishabhmaurya (Contributor)

Thanks @kcharikrish for working on this and posting the analysis. The conclusion so far is to use the right index configuration and adjust query parameters to avoid the performance degradation. Closing this thread; feel free to reopen if the configuration suggested above doesn't work for you.
