Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search optimisation - add canMatch early aborts for queries on "_index" field #48681

Merged
merged 4 commits into from
Nov 15, 2019

Conversation

markharwood
Copy link
Contributor

Make queries on the “_index” field fast-fail if the target shard is an index name that doesn’t match the query expression. Part of the “canMatch” phase optimisations.

Closes #48473

@markharwood markharwood added the :Search/Search Search-related issues that do not fall into other categories label Oct 30, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic looks good to me. The original issue also mentions terms query and it should also be possible to add the logic to the prefix query ?

@markharwood markharwood force-pushed the fix/48473 branch 5 times, most recently from d7487dc to b8338e2 Compare November 1, 2019 09:17
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good to me in general, can you add tests with aliases too?

// Special-case optimisation for canMatch phase:
// We can skip querying this shard if the index name doesn't match the value of this query on the "_index" field.
QueryShardContext shardContext = queryRewriteContext.convertToShardContext();
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to use the indexNameMatcher so that it works with aliases too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did wonder about supporting aliases but wouldn't that give false negatives? I assume the indexed term in the _index field is only the physical index name so it would be a mistake to fail the canMatch phase if a query term happens to match an alias? My assumption was this issue is about optimising existing index name matching behaviour, not changing it to add alias support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Queries on _index already support matching aliases since #46640, so this wouldn't change the behavior. See #23306 for the motivation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be:

Suggested change
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) {
if (shardContext != null && shardContext.indexMatches(value + "*") == false) {

?

@markharwood
Copy link
Contributor Author

I've added tests but a couple of anomalies exist:

  1. Prefix and wildcard queries on "_index" field work with physical index names but not alias names (term and terms queries work with both physical and alias names). A known limitation?
  2. "Skipped" counting logic in results is a bit of a mystery - always looks to be one less than what I would expect to see. The 70_skip_shards yaml test index needed 2 shards before it would count just one as skipped. What is the expected behaviour?

@markharwood
Copy link
Contributor Author

test this please

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left more comments.

Prefix and wildcard queries on "_index" field work with physical index names but not alias names (term and terms queries work with both physical and alias names). A known limitation?

It should work the same. I added a comment on the prefix query rewrite, maybe that's the reason you see it not working ?

"Skipped" counting logic in results is a bit of a mystery - always looks to be one less than what I would expect to see. The 70_skip_shards yaml test index needed 2 shards before it would count just one as skipped. What is the expected behaviour?

When all shards are skipped we use one to produce an empty result. The logic is explained here:

// this is a special case where we have no hit but we need to get at least one search response in order

I guess we could avoid doing this if there are no aggregations but that's out of scope for this pr.

// Special-case optimisation for canMatch phase:
// We can skip querying this shard if the index name doesn't match the value of this query on the "_index" field.
QueryShardContext shardContext = queryRewriteContext.convertToShardContext();
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be:

Suggested change
if (shardContext != null && shardContext.index().getName().startsWith(value) == false) {
if (shardContext != null && shardContext.indexMatches(value + "*") == false) {

?

// Special-case optimisation for canMatch phase:
// We can skip querying this shard if the index name doesn't match the value of this query on the "_index" field.
QueryShardContext shardContext = queryRewriteContext.convertToShardContext();
if (shardContext != null && shardContext.indexMatches(BytesRefs.toString(value)) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BytesRefs.toString is not needed ?

Suggested change
if (shardContext != null && shardContext.indexMatches(BytesRefs.toString(value)) == false) {
if (shardContext != null && shardContext.indexMatches(value) == false) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike wildcardquery value is an object not a string. I remember having a test failure where I assumed String and it was a BytesRef so had to add this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix change suggestion looks to have fixed things, thanks

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two minor comments but it looks good to me otherwise.

- do:
search:
ccs_minimize_roundtrips: false
rest_total_hits_as_int: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter is supposed to be for bw compat of the response format only, can you use track_total_hits=true instead?

return index.getName().startsWith(pattern.substring(0, pattern.length()-1));
}
return pattern.equals(index.getName());
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to reuse Regex.simpleMatch?

@markharwood markharwood force-pushed the fix/48473 branch 2 times, most recently from 961344f to e21a561 Compare November 14, 2019 14:50
@markharwood
Copy link
Contributor Author

test this please

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@markharwood
Copy link
Contributor Author

test this please

@markharwood markharwood merged commit 82c00f0 into elastic:master Nov 15, 2019
markharwood added a commit to markharwood/elasticsearch that referenced this pull request Nov 15, 2019
…x" field (elastic#48681)

Make queries on the “_index” field fast-fail if the target shard is an index that doesn’t match the query expression. Part of the “canMatch” phase optimisations.

Closes elastic#48473
@jimczi jimczi added the v7.6.0 label Nov 15, 2019
mdaudali added a commit to mdaudali/elasticsearch that referenced this pull request Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v7.6.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TermQueryBuilder and WildcardQueryBuilder should rewrite queries on _index so that shards get skipped
5 participants