Cross-cluster search: preserve cluster alias in shard failures #32608

javanna · 2018-08-03T12:42:20Z

When some remote clusters return shard failures as part of a cross-cluster search request, the cluster alias currently gets lost. As a result, if the shard failures are all caused by the same error and hit indices that belong to different clusters but share the same index name, only one failure is returned in the search response: failures are grouped by index name, ignoring the cluster alias.

With this commit we make sure that `ShardSearchFailure` returns the cluster alias as part of the index name. We also set the fully qualified index name when creating a `QueryShardException`. That way shard failures are grouped by cluster:index. These fixes should cover at least most of the cases: 1) the shard target is set but the cause does not carry the index (previously we read the index only from the cause, which did not have the cluster alias); 2) the shard target is missing but the cause is a `QueryShardException`, in which case the cluster alias no longer gets lost.

We also prevent an NPE when the failure cause is not set, and test that scenario.
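
To illustrate the grouping problem described above, here is a minimal standalone sketch. It is not the Elasticsearch code: the `Failure` class, the `group` helper, and the cluster and index names are all hypothetical, chosen only to show how two failures with the same index name and the same reason collapse into one group unless the cluster alias is part of the key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardFailureGroupingSketch {

    // Hypothetical stand-in for a shard failure; not the Elasticsearch class.
    static final class Failure {
        final String clusterAlias; // null for the local cluster (assumption)
        final String indexName;
        final String reason;       // may be null when no cause is set

        Failure(String clusterAlias, String indexName, String reason) {
            this.clusterAlias = clusterAlias;
            this.indexName = indexName;
            this.reason = reason;
        }

        // "cluster:index" for a remote index, plain "index" otherwise.
        String fullyQualifiedIndexName() {
            return clusterAlias == null ? indexName : clusterAlias + ":" + indexName;
        }
    }

    // Groups failures by (index key, reason) and keeps the first failure of each group.
    static List<Failure> group(List<Failure> failures, boolean useClusterAlias) {
        Map<String, Failure> groups = new LinkedHashMap<>();
        for (Failure f : failures) {
            String indexKey = useClusterAlias ? f.fullyQualifiedIndexName() : f.indexName;
            // String concatenation tolerates a null reason, so a missing cause does not throw.
            groups.putIfAbsent(indexKey + "|" + f.reason, f);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<Failure> failures = new ArrayList<>();
        failures.add(new Failure("cluster_one", "logs", "parse error"));
        failures.add(new Failure("cluster_two", "logs", "parse error"));

        System.out.println(group(failures, false).size()); // prints 1: grouped by index name only
        System.out.println(group(failures, true).size());  // prints 2: grouped by cluster:index
    }
}
```

Keying on the fully qualified name keeps one representative failure per cluster, which is the behavior the description asks for.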


elasticmachine · 2018-08-03T12:42:22Z

Pinging @elastic/es-search-aggs

jimczi

LGTM

javanna · 2018-08-06T09:47:07Z

thanks @jimczi !



javanna · 2018-08-06T14:27:25Z

pending backport to 6.4


As part of elastic#32608 we made sure that the fully qualified index name is taken from the query shard context whenever a new `QueryShardException` is created. That change introduced a regression: instead of setting the entire `Index` object on the exception, which holds both the index name and the index uuid, we ended up setting only the index name (including the cluster alias). With this commit we make sure that the index uuid does not get lost, and we try to lower the chances that a similar bug slips in again. That's done by making `QueryShardContext` return the fully qualified `Index` (which also holds the uuid) rather than only the fully qualified index name.
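
The idea of returning a fully qualified `Index` instead of a bare name can be sketched as follows. This is an illustrative standalone example, not the actual Elasticsearch classes: the `Index` type here, the `fullyQualified` helper, and the sample values are hypothetical. The point is that qualifying the name on an object that also carries the uuid makes it hard to drop the uuid by accident.

```java
public class QualifiedIndexSketch {

    // Hypothetical value type holding both the index name and the index uuid.
    static final class Index {
        final String name;
        final String uuid;

        Index(String name, String uuid) {
            this.name = name;
            this.uuid = uuid;
        }
    }

    // Prefixes the name with the cluster alias (if any) while carrying the uuid along.
    static Index fullyQualified(String clusterAlias, Index index) {
        String name = clusterAlias == null ? index.name : clusterAlias + ":" + index.name;
        return new Index(name, index.uuid);
    }

    public static void main(String[] args) {
        Index local = new Index("logs", "abc123");
        Index qualified = fullyQualified("cluster_one", local);
        System.out.println(qualified.name + " " + qualified.uuid); // prints "cluster_one:logs abc123"
    }
}
```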


javanna added the >bug, review, :Search/Search, v7.0.0, and v6.4.1 labels Aug 3, 2018

javanna added 2 commits August 3, 2018 14:53

fix comment

d880374

added test for null index, added tests to ExceptionsHelperTests

4cac3c8

jimczi approved these changes Aug 6, 2018


javanna merged commit 826399f into elastic:master Aug 6, 2018

javanna added the backport pending label Aug 6, 2018

javanna removed the backport pending label Aug 6, 2018

javanna added v6.5.0 backport pending v6.4.1 and removed v6.4.1 labels Aug 6, 2018

javanna mentioned this pull request Aug 6, 2018

Prevent cause from being null in ShardOperationFailedException #32640

Merged

javanna mentioned this pull request Aug 7, 2018

Preserve index_uuid when creating QueryShardException #32677

Merged

javanna self-assigned this Aug 13, 2018

javanna removed the review label Aug 24, 2018

javanna removed the backport pending label Aug 24, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
