Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip shards when querying constant keyword fields #96161

Conversation

salvatore-campagna
Copy link
Contributor

@salvatore-campagna salvatore-campagna commented May 16, 2023

When a query like a match phrase query or a wildcard query targets a constant keyword field we can skip the query execution on shards where the query builder is rewritten to a MatchNoneQueryBuilder.

This allows us to save time and resources executing queries on nodes that do not return any result. Constant keyword fields allow rewriting a MatchPhraseQueryBuider to TermsQueryBuilder which, in turn, is rewritten into a MatchNoneQueryBuilder in case a constant keyword field does not match the value specified by the query.

A match phrase query is often executed on constant keyword fields by different Elasticsearch integrations which target multiple indices by means of index patterns or data streams. In a real world scenario it is likely that the index pattern or data stream includes tens or hundreds of backing indices each of them with multiple shards involved. As a result, skipping shards in such scenario might result in better query performance and better cluster resource usage.

Note, anyway, that currently, execution of the pre-filter and the corresponding "can match" phase depends on the overall number of shards involved and on whether there is at least one of them returning a non-empty result.

We do the rewrite operation on the data node in the so called "can match" phase, taking advantage of the fact that, at that moment, we can access the index mapping and extract information about constant keyword fields and their value. Doing this on the coordinator node is not possible due to the unavailability of the index mapping.

Resolves #95541

When a query like a match phrase query or a wildcard query targets
a constant keyword field we can skip the query execution on shards
where the query builder is rewritten to a MatchNoneQueryBuilder.

This allows us to save time and resources executing queries on
nodes that do not return any result. Constant keyword fields allow
rewriting a MatchPhraseQueryBuider to TermsQueryBuilder which,
in turn, is rewritten into a MatchNoneQueryBuilder in case a
constant keyword field does not match the value specified by the
query.

A match phrase query is often executed on constnt keyword fields
by different Elasticsearch integrations which target multiple
indices by means of index patterns or data streams. In a real world
scenario it is likely that the index pattern or data stream includes
tens or hundreds of backing indices. As a result, skipping shards
in such scenario might result in better query prformance and better
cluster resource usage.

Note, anyway, that currently execution of the pre filter and the
corresponding "can match" phase depends on the voerall number of
shards involved and on whether there is at least one of them returning
a non-empty result.

We do the rewrite operation on the data node in the so called
"can match" phase, taking advantage of the fact that, at that
moment, we can access the index mapping and extract information
about constant keyword fields and their value.
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.9.0 labels May 16, 2023
@salvatore-campagna salvatore-campagna self-assigned this May 16, 2023
@salvatore-campagna salvatore-campagna added >enhancement :Search/Search Search-related issues that do not fall into other categories :StorageEngine/TSDB You know, for Metrics and removed needs:triage Requires assignment of a team area label labels May 16, 2023
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels May 16, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the PR because I thought it would be difficult to achieve but looks like we already handle the case where the SearchExecutionContext doesn't have a searcher. Nice one!
I left one comment to simplify further.
I also wonder why the optim is disabled when runtime fields are added in the request?
What are we protecting against?

if (rewritten.query() instanceof MatchNoneQueryBuilder) {
return new CanMatchShardResponse(false, null);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MappingAwareRewriteContext doesn't seem to be needed. Why not creating a new SearchExecutionContext when the index service is available.
You should also reuse queryStillMatchesAfterRewrite(request, context) to ensure that all cases are checked (alias filter and global aggregations).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MappingAwareRewriteContext doesn't seem to be needed. Why not creating a new SearchExecutionContext when the index service is available.

If we use SearchExecutionContext then also range queries (and maybe others?) could be resolved to MatchNoneQueryBuilder instance. But if there are pending refreshes we can't trust it. The idea was to have a special context that only rewrites with the mappings in mind (for constant keyword fields). If the main query gets rewritten to MatchNoneQueryBuilder then we can trust it even when there are pending refreshes.

You should also reuse queryStillMatchesAfterRewrite(request, context) to ensure that all cases are checked (alias filter and global aggregations).

Good point. This should be possible even with MappingAwareRewriteContext .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use SearchExecutionContext then also range queries (and maybe others?) could be resolved to MatchNoneQueryBuilder instance.

The initial change already used SearchExecutionContext but it sets the searcher to null. That's why I said that the special context is not needed. There's some logic to handle SearchExecutionContext with a null searcher, we should add some javadocs to explain the use case (rewriting with the mapping only) but it works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see this now. Yes, if we pass a null searcher then this should work. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add an additional null check in isFieldWithinQuery. That logic uses the reader which might be null in this case and triggers a NPE at some point.

We just need a search execution context that includes the index service
and the mappings to take advantage of query rewriting based on mappings.
@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented May 17, 2023

I looked at the PR because I thought it would be difficult to achieve but looks like we already handle the case where the SearchExecutionContext doesn't have a searcher. Nice one! I left one comment to simplify further. I also wonder why the optim is disabled when runtime fields are added in the request? What are we protecting against?

The issue with runtime mappings, I think, was happening because I used Collections.emptyMap() in the new SearchExecutionContext not passing the request.getRuntimeMappings() instead. So there was a mismatch between the actual runtime mappings and the ones used in the new SearchExecutionContext.

When rewriting a query before the searcher is available the search
execution context uses a null searcher which, later on, results in
a NPE.
@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine test this please

@salvatore-campagna
Copy link
Contributor Author

I have just create issue #96280 describing the pre-requisite refactoring needed for this.

@salvatore-campagna
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-1 please.

@salvatore-campagna
Copy link
Contributor Author

salvatore-campagna commented Jun 5, 2023

@javanna @romseygeek @martijnvg I adjusted this PR after merging #96353

Now we just use a QueryRewriteContext instead of a SearchExecutionContext with null IndexSearcher.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* making the shard search active and waiting for refreshes. As a result, we only wait for refreshes to happen on shards that have
* actual data.
*/
private static boolean canMatchAfterRewrite(final ShardSearchRequest request, final IndexService indexService) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reword: This allows us to avoid extra work other than making the shard search active and waiting for refreshes. to This allows us to avoid extra work for example making the shard search active and waiting for refreshes.?

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments, especially one on testing, LGTM otherwise.

* Creates a new {@link QueryRewriteContext}.
* This class is used to rewrite queries before we are able to get a valid {@link SearchExecutionContext} and
* allows us to anticipate rewriting queries before the query is executed on the data node. Not using a full
* SearhcExecutionContext allows us to save on query latency and IO operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchExecutionContext is mistyped. Shall we also clarify that this is for cases where access to the index is not required as we can make decisions based on mappings alone? Also, expand on what saves query latency and IO operations? e.g. not pulling a searcher removes the cost of pulling a searcher as well as the associated cost of refreshing idle shards.

* for instance if this rewrite context is used to index queries (percolation). */
* which happens in two cases:
* 1. if this rewrite context is used to index queries (percolation)
* 2. if we use mapping information to skip shards while doing shards pre-filtering in the 'can match' phase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the second still true given the recent improvements you made? Are there places where we do mappings based can match with a null searcher?

assertEquals(refreshStatsBefore.size(), refreshStatsAfter.size());
assertTrue(refreshStatsAfter.containsAll(refreshStatsBefore));
assertTrue(refreshStatsBefore.containsAll(refreshStatsAfter));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd that we don't have unit tests, yet I see that SearchServiceTests that already exists is also an ESSingleNodeTestCase. You may want to simplify your test taking inspiration from it, in that it pulls the search service from the node and calls canMatch directly against it.

@salvatore-campagna salvatore-campagna merged commit a732ecd into elastic:main Jun 9, 2023
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Jun 18, 2023
… rewrite without a SearchExecutionContext.

With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available.

The `TermQueryBuilder` can with this resolve to a `MatchAllQueryBuilder` with needing a `SearchExecutionContext`, which during the can_match phase means that no searcher needs to be acquired and therefor avoiding making a shard search active / potentially refresh.

The `AbstractQueryBuilder#doRewrite(...)` method is altered to by default attempt a coordination rewrite, then fall back to attempt a search rewrite, then finally fall back to do an index metadata aware rewrite.

This was forgotten as part of elastic#96161 and is needed to complete elastic#95776.
martijnvg added a commit that referenced this pull request Jun 19, 2023
…t a SearchExecutionContext. (#96905)

With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available.

The TermQueryBuilder can with this change resolve to a MatchNoneQueryBuilder without needing a SearchExecutionContext, which during the can_match phase means that no searcher needs to be acquired and therefor avoid making a shard search active and doing a potentially refresh.

The AbstractQueryBuilder#doRewrite(...) method is altered to by default attempt a coordination rewrite, then fall back to attempt a search rewrite, then finally fall back to do an index metadata aware rewrite. This is inline with what was discussed here: #96161 (comment)

This change was forgotten as part of #96161 and is needed to complete #95776.
@elasticsearchmachine
Copy link
Collaborator

@salvatore-campagna according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:

  • The PR is labelled release highlight but the changelog has no highlight section

elasticsearchmachine pushed a commit that referenced this pull request Jun 28, 2023
Update the changelog after making the original PR a `release highlight`.

See #96161
salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this pull request Jun 28, 2023
Update the changelog after making the original PR a `release highlight`.

See elastic#96161
elasticsearchmachine pushed a commit that referenced this pull request Jun 28, 2023
Update the changelog after making the original PR a `release highlight`.

See #96161
@costin costin mentioned this pull request Aug 31, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement release highlight :Search/Search Search-related issues that do not fall into other categories :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid refreshing search-idle shards that don't yield results after query rewrite
6 participants