Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate agg order parameter at parse time #20003

Open
colings86 opened this issue Aug 16, 2016 · 12 comments
Open

Validate agg order parameter at parse time #20003

colings86 opened this issue Aug 16, 2016 · 12 comments
Labels
:Analytics/Aggregations Aggregations >bug stalled Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@colings86
Copy link
Contributor

At the moment we don't validate that the order parameter for the histogram, date_histogram and terms aggregations is valid at parse time. Instead we rely on an exception to be thrown when we try to sort the buckets at reduce time. Apart from the not being a nice user experience as we should fail quickly on bad requests this also causes a few other issues highlighted in #14771 and #19851:

  • No exception is thrown if an invalid order is defined but the aggregation returns either 0 or 1 bucket because we never actually sort the buckets. This can lead to confusion as it is unclear why one request works (i.e. one that returns multiple buckets) but another doesn't (one the returns either 0 or 1 bucket)
  • Because the exception is thrown in the reduce phase it is a SearchPhaseExecutionException which maps to a 503 Unavailable status code when in fact it should return a 400 Bad Request status code and ideal the exception should be a SearchParseException
@qwerty4030
Copy link
Contributor

qwerty4030 commented Aug 20, 2016

I think the terms aggregation validates the order during the search phase. I get a 500 response with a bad terms agg order vs a 503 from a bad (date) histogram order. Example.
It looks like validation code does exist in AggregationPath.java. The code gets a bit hairy but I think the reason this is called during the search phrase is that all the Aggregators need to be created before validation which happens in the AggregationPhase. Would it be possible to create the Aggregators during the parsing phase and then perform validation?
If that's too painful it seems it wouldn't be too hard to call the same validation logic the terms agg uses during the search phrase for the (date) histogram.

@qwerty4030
Copy link
Contributor

Also the terms agg supports multiple criteria for the order, but the (date) histogram agg does not: "order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]. I don't see a reason why the histogram agg order would have different behavior than the terms agg. The class org.elasticsearch.search.aggregations.bucket.terms.InternalOrder.CompoundOrder contains the code to support multiple criteria for the terms agg order.

@colings86
Copy link
Contributor Author

@qwerty4030 the parsing happens on the coordinating node (the node that receives the request from the client) but Aggregators are created on the shard so we can't create the Aggregators during the parsing phase and perform the validation with them. I think we will need to create code to do the validation using the AggregatorBuilder objects instead since this is what we have available on the coordinating node at parse time.

As for the discrepancy between order in the terms aggregation and the histogram aggregations, it would be nice to have them consistent. Maybe we could have both use the same Order classes rather than having separate implementations but I fear that this will be fairly tricky.

@qwerty4030
Copy link
Contributor

I can see this happening in two parts:

  1. Refactor histogram aggs to use the same order code/logic as the terms agg.
  2. Refactor order to be validated at parse time (use AggregatorBuilder instead of Aggregator).

1 doesn't seem too bad since the code already exists. 2 is more involved: I saw alot of instanceofs, casting, and tricky assumptions in the current order validation. Ideally the order validation can be cleaned up to be more interface friendly.

@javanna
Copy link
Member

javanna commented May 5, 2017

@colings86 does this issue need further discussion?

@colings86 colings86 removed the discuss label May 6, 2017
qwerty4030 added a commit to qwerty4030/elasticsearch that referenced this issue May 11, 2017
This commit adds support for histogram and date_histogram agg compound order by refactoring and reusing terms agg order code. The major change is that the Terms.Order and Histogram.Order classes have been replaced/refactored into a new class BucketOrder. This is a breaking change for the Java Transport API. For backward compatibility with previous ES versions the (date)histogram compound order will use the first order. Also the _term and _time aggregation order keys have been deprecated; replaced by _key.

Relates to elastic#20003: now that all these aggregations use the same order code, it should be easier to move validation to parse time (as a follow up PR).

Relates to elastic#14771: histogram and date_histogram aggregation order will now be validated at reduce time.

Closes elastic#23613: if a single BucketOrder that is not a tie-breaker is added with the Java Transport API, it will be converted into a CompoundOrder with a tie-breaker.
colings86 pushed a commit that referenced this issue May 11, 2017
This commit adds support for histogram and date_histogram agg compound order by refactoring and reusing terms agg order code. The major change is that the Terms.Order and Histogram.Order classes have been replaced/refactored into a new class BucketOrder. This is a breaking change for the Java Transport API. For backward compatibility with previous ES versions the (date)histogram compound order will use the first order. Also the _term and _time aggregation order keys have been deprecated; replaced by _key.

Relates to #20003: now that all these aggregations use the same order code, it should be easier to move validation to parse time (as a follow up PR).

Relates to #14771: histogram and date_histogram aggregation order will now be validated at reduce time.

Closes #23613: if a single BucketOrder that is not a tie-breaker is added with the Java Transport API, it will be converted into a CompoundOrder with a tie-breaker.
@qwerty4030
Copy link
Contributor

Well now that step 1 was done in #22343 we can give part 2 a shot. @colings86 @javanna Do you guys see any obvious roadblocks to validate terms and histogram aggs that are ordered by sub-aggregation, e.g. multi sub-agg levels, metrics, scripts, etc... by just looking at the AggregatorBuilder instance? Thanks

@colings86
Copy link
Contributor Author

@qwerty4030 I don't see any roadblocks. I think this should be possible since the pipeline aggregations do something a little similar when validating the buckets_path, however I'm not going to go so far as saying that its going to be easy. Maybe we should start with trying to add the validation on AggregatorBuilder's without modifying the existing logic with the Aggregator's (so essentially do both) and then once we know the validation using the AggregatorBuilder's is robust we can remove any unnecessary logic later int he execution?

@qwerty4030
Copy link
Contributor

Sounds good. Thanks for the tip about the pipeline aggregations. I found some relevant code here, but I'm not sure how robust this validation is.

@polyfractal
Copy link
Contributor

@elastic/es-search-aggs

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@wchaparro
Copy link
Member

Because the exception is thrown in the reduce phase it is a SearchPhaseExecutionException which maps to a 503 Unavailable status code when in fact it should return a 400 Bad Request status code and ideal the exception should be a SearchParseException

Changing this to a bug - to address that we are returning the correct error code.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug stalled Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

8 participants