First step towards incremental reduction of query responses #23253
Conversation
Today all query results are buffered until we have received responses from all shards. This can hold on to a significant amount of memory if the number of shards is large. This commit adds a first step towards incrementally reducing aggregation results once a configurable (per search request) number of responses has been received. When enough query results have been received and buffered, all aggregation responses received so far are reduced and released to be GCed.
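To make the mechanism concrete, here is a minimal, self-contained sketch — not the actual Elasticsearch classes; `IncrementalReducer` and the `Map`-based "aggregation" are illustrative stand-ins. Shard responses are buffered and, once the buffer reaches the configured batch size, reduced into a single partial result so the buffered originals become eligible for garbage collection:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in (not Elasticsearch code): buffer per-shard results and
// perform an intermediate reduce whenever the buffer reaches the batch size.
class IncrementalReducer {
    private final int batchSize; // configurable per search request
    private final List<Map<String, Long>> buffer = new ArrayList<>();

    IncrementalReducer(int batchSize) {
        this.batchSize = batchSize;
    }

    void consume(Map<String, Long> shardAggs) {
        buffer.add(shardAggs);
        if (buffer.size() >= batchSize) {
            Map<String, Long> partial = reduce(buffer); // intermediate reduce
            buffer.clear();                             // release buffered responses
            buffer.add(partial);                        // keep only the partial result
        }
    }

    Map<String, Long> finalReduce() {
        return reduce(buffer);
    }

    private static Map<String, Long> reduce(List<Map<String, Long>> aggs) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> a : aggs) {
            a.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }
}
```

With a batch size of 2, the buffer never holds more than two results at once, which is the memory bound this change is after.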
@elasticmachine test this please
I like this change, I expected it to be more complex than that so this is a good surprise to me! I left some picky comments about naming and comments to make this change a bit easier to read. I think the interesting question is about how many buckets intermediate reduces for terms (or geo-hash) aggregations should produce.
     */
    public Stream<Result> stream() {
        return results.asList().stream().map(e -> e.value);
    }
Not sure how to address it but when I see both a size() and a stream() method on a class, I tend to expect that the stream wraps size elements. I wonder whether we should make naming a bit more explicit to avoid this potential confusion.
if (buffer != null) {
    InternalAggregations aggregations = (InternalAggregations) querySearchResult.consumeAggs();
    // once the size is incremented to the length of the buffer we know all elements are added
    // we also have happens before guarantees due to the memory barrier of the size write
is this comment outdated?
yeah I had a complex solution first with non-blocking concurrency etc. I didn't go with it apparently
}

/**
 * Returns <code>true</code> iff the current reduce phase is the final reduce phase. This indicated if operations like
s/indicated/indicates/
}
logger.info("test failed. trying to see if it recovers after 1m.", ae);
try {
    Thread.sleep(60000);
!!!
dude!
@@ -228,7 +228,7 @@ public InternalAggregation doReduce(List<InternalAggregation> aggregations, Redu
        }
    }

-   final int size = Math.min(requiredSize, buckets.size());
+   final int size = reduceContext.isFinalReduce() == false ? buckets.size() : Math.min(requiredSize, buckets.size());
Since one of the goals of this change is to limit memory usage, I wonder whether this should use getShardSize() rather than buckets.size()? This should be a good trade-off between accuracy and memory usage. cc @colings86
In theory I think this would be a good change to make, however I think we should do it in a separate PR as it may require the error calculations to be tweaked a bit to be correct.
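For illustration, the behavior of the changed line can be sketched in isolation (a simplified stand-in, not the actual InternalTerms code): intermediate, non-final reduces keep every bucket, and only the final reduce truncates to the requested size.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the size selection in doReduce: non-final reduces
// keep all buckets; only the final reduce truncates to the requested size.
class ReducePhaseSize {
    static int sizeFor(boolean isFinalReduce, int requiredSize, List<String> buckets) {
        return isFinalReduce == false ? buckets.size() : Math.min(requiredSize, buckets.size());
    }
}
```

The getShardSize() idea discussed above would replace buckets.size() in the non-final branch, trading some accuracy for a tighter memory bound.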
    final SearchProfileShardResults shardResults = profileResults.isEmpty() ? null : new SearchProfileShardResults(profileResults);
    return new ReducedQueryPhase(totalHits, fetchHits, maxScore, timedOut, terminatedEarly, firstResult, suggest, aggregations,
        shardResults);
}

private InternalAggregations reduceAggsOnly(List<InternalAggregations> aggregationsList) {
maybe update the name or add a comment to say that this method is about performing an intermediate reduce? (as opposed to final)
 /**
  * Reduces the given query results and consumes all aggregations and profile results.
  * @see QuerySearchResult#consumeAggs()
  * @see QuerySearchResult#consumeProfileResult()
  */
-public final ReducedQueryPhase reducedQueryPhase(List<? extends AtomicArray.Entry<? extends QuerySearchResultProvider>> queryResults) {
+public final ReducedQueryPhase reducedQueryPhase(List<? extends AtomicArray.Entry<? extends QuerySearchResultProvider>> queryResults,
+                                                 List<InternalAggregations> reducedAggs) {
Can you add a comment to explain that reducedAggs is the result from intermediate reduce operations?
    aggregationsList = new ArrayList<>(queryResults.size());
} else {
    aggregationsList = reducedAggs == null ? Collections.emptyList() : reducedAggs;
}
I think hasAggs is a confusing name given how this method was refactored. Maybe we should restructure the logic a bit with comments to explain what each case maps to? eg. something like
if (reducedAggs != null) {
// we already have results from intermediate reduces and just need to perform the final reduce
assert firstResult.hasAggs();
} else if (firstResult.hasAggs()) {
// the number of shards was less than the buffer size so we reduce agg results directly
} else {
// no aggregations
}
@elasticmachine would you bother to test this
@jpountz I pushed some changes
Looks great. Do you have any opinions about the size to use for intermediate reduces of terms aggs? I'm good with pulling this change in and making it a follow-up; this change is already a net improvement as-is.
LGTM, I left a couple of minor comments
@@ -46,6 +46,8 @@
  */
 public class SearchRequestBuilder extends ActionRequestBuilder<SearchRequest, SearchResponse, SearchRequestBuilder> {

+    private int reduceUpTo;
Is this used? It looks below like we set this directly on the request?
@jpountz I will open a follow-up for the following things:
I think we should just pull this one in without adding more stuff to it
yes
In elastic#23253 we added the ability to incrementally reduce search results. This change exposes the parameter to control the batch size and therefore the memory consumption of a large search request.
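If memory serves, this parameter ended up exposed on the search request as batched_reduce_size with a default of 512 and a minimum of 2; treat the field name, default, and validation below as assumptions rather than a quote of the real SearchRequest code. A self-contained sketch of the request-level setting:

```java
// Hypothetical stand-in for the request-level setting (the real field lives on
// SearchRequest); the name batchedReduceSize, the default of 512, and the
// minimum of 2 are assumptions based on this PR's follow-ups.
class SearchRequestSketch {
    private int batchedReduceSize = 512; // assumed default

    SearchRequestSketch setBatchedReduceSize(int batchedReduceSize) {
        if (batchedReduceSize < 2) {
            // fewer than 2 buffered results cannot be reduced incrementally
            throw new IllegalArgumentException("batchedReduceSize must be >= 2");
        }
        this.batchedReduceSize = batchedReduceSize;
        return this;
    }

    int getBatchedReduceSize() {
        return batchedReduceSize;
    }
}
```

A smaller value bounds how many per-shard responses sit in memory on the coordinating node at any time, at the cost of more intermediate reduce rounds.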
We can and should randomly reduce down to a single result before passing the aggs to the final reduce. This commit changes the logic to do that and ensures we don't trip the assertions the previous implementation tripped. Relates to #23253
Some randomization caused reduction of the same agg multiple times which causes issues on some aggregations. Relates to #23253
InternalTopHits uses "==" to compare hit scores and fails when the score is NaN. This commit changes the comparison to always use Double.compare. Relates #23253
Both PRs below have been backported to 5.4 such that we can enable BWC tests of this feature as well as remove version-dependent serialization for search requests / responses. Relates to elastic#23288 Relates to elastic#23253
If I understand it right, the motivation here is to make several small top-10 calculations on the coordinating node, instead of making a single large calculation at the end when all the responses are available? Does this change affect the accuracy of the terms aggregation, as opposed to the previous approach?
Yes, it potentially reduces the accuracy of the terms aggregation. Note that combined with #25658, this change only starts reducing the accuracy of the terms aggregation under certain conditions.
Okay, got it. Sounds cool. It means that this behavior is activated only in specific conditions? Or that this is the new behavior, and it will reduce accuracy only in some conditions? By the way, will the user see this inaccuracy in the error bounds? I understand that there's nothing to do when making a request to so many shards at once, but I don't like the approach of "scaling out, performance and latency are always much more important factors than accuracy". No one mentioned that less accurate results will come here, in the Elasticsearch 5.4.0 release blog post.
Yes.
I would agree it would be debatable if this was only about performance and latency, but to me this is mostly about cluster stability, which I consider more important than the accuracy of the terms aggregation. In my opinion the number of users that are affected by decreased accuracy of the terms aggregation is small.
@jpountz @IdanWo Actually I am pretty sure that we don't lose any accuracy on the terms aggregation because of incremental reduce, because we do not truncate the list of terms until we are doing the 'final' incremental reduction (see https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/InternalTerms.java#L284), so the accuracy should not be affected. We do have #23285 open to explore truncating the list during the other incremental reductions though, and this would indeed affect the accuracy of the terms aggregation if implemented.
@colings86 Oh right, I remember we discussed it but had forgotten which approach we took. Thanks for clarifying!
this PR really needs some reviews and potential discussions but it's a start and outlines what it takes to make this feature work