Fix cluster health when closing #61709

henningandersen · 2020-08-31T11:13:22Z

When the master shuts down its cluster service, a waiting cluster health request
would fail rather than fail over to a new master.

This fails around once a month in CI:
https://build-stats.elastic.co/goto/49d044995e1b636bfc3cb7e5e2371f3a
For instance:
https://gradle-enterprise.elastic.co/s/r2rocz2i6tah2
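The description does not spell out how the fix is implemented. To illustrate only the intended behavior, here is a minimal, self-contained Java sketch. It does not use Elasticsearch classes: `HealthFailoverSketch`, `healthWithFailover` and the local `NodeClosedException` are hypothetical stand-ins. A failure saying the master node is closing is treated as retryable, so the request moves on to the next master. Any other failure propagates to the caller, which roughly matches what a waiting health request did before this change.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not Elasticsearch's API: a client sends a cluster
// health request to each master in election order and fails over when the
// current master reports that it is closing.
public class HealthFailoverSketch {

    /** Hypothetical stand-in for a "node is shutting down" failure. */
    public static class NodeClosedException extends RuntimeException {
        public NodeClosedException(String node) {
            super("node closed: " + node);
        }
    }

    /**
     * A NodeClosedException is retryable (try the next master); any other
     * runtime exception escapes the loop unchanged.
     */
    public static String healthWithFailover(List<String> masters, Function<String, String> sendHealth) {
        RuntimeException last = null;
        for (String master : masters) {
            try {
                return sendHealth.apply(master);
            } catch (NodeClosedException e) {
                last = e; // master shut down its cluster service: fail over
            }
        }
        throw new IllegalStateException("no master available", last);
    }

    public static void main(String[] args) {
        List<String> masters = List.of("master-1", "master-2");
        String status = healthWithFailover(masters, node -> {
            if (node.equals("master-1")) {
                throw new NodeClosedException(node); // first master is closing
            }
            return "GREEN";
        });
        System.out.println(status); // prints GREEN
    }
}
```

Retrying only on the node-closed signal keeps genuine errors from being hidden behind repeated fail-over attempts.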

elasticmachine · 2020-08-31T11:13:23Z

Pinging @elastic/es-distributed (:Distributed/Distributed)

henningandersen · 2020-09-07T11:19:39Z

@elasticmachine run elasticsearch-ci/1

henningandersen · 2020-09-09T09:04:36Z

Failure should be fixed by #62061
@elasticmachine update branch

ywelsch

Change looks good. One question on the test.

ywelsch · 2020-09-09T12:50:48Z

server/src/internalClusterTest/java/org/elasticsearch/cluster/ClusterHealthIT.java

+        boolean withIndex = randomBoolean();
+        if (withIndex) {
+            // create index with many shards to provoke the health request to wait (for green) while master is being shut down.
+            createIndex("test", Settings.builder().put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, randomIntBetween(0, 10)).build());


I'm confused. How can the cluster ever be green with that many replicas? Should this be number_of_shards?

It ensures that the cluster is yellow when the health request is made, so the health request waits on the observer, which in turn triggers the call to onClusterServiceClose when the master is shut down.

The number of replicas is reset to 0 after all the async restarts have been fired and the master restarts are done. That ensures that all the requests respond with green status.
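The reasoning in this reply can be modeled with a toy cluster. In this hypothetical Java sketch, `ToyCluster`, `WaitingHealthRequest`, the `RETRY_ON_NEW_MASTER` outcome and the 3-data-node cluster are all assumptions, not Elasticsearch's API. An index with more replicas than the data nodes can hold stays YELLOW, so a wait-for-green request parks on a listener. Closing the cluster service invokes `onClusterServiceClose`, and on the new master, resetting replicas to 0 turns the state GREEN and completes the retried request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Elasticsearch's API: a toy cluster whose health
// depends on whether every replica can be allocated, and a health request
// that waits for GREEN through observer-style callbacks.
public class WaitForGreenSketch {

    public enum Status { GREEN, YELLOW }

    /** Callbacks a waiting request registers with the toy cluster service. */
    public interface Listener {
        void onNewClusterState(Status status);
        void onClusterServiceClose();
    }

    /** A request waiting for GREEN; outcome stays null while it waits. */
    public static class WaitingHealthRequest implements Listener {
        public String outcome;

        @Override
        public void onNewClusterState(Status status) {
            if (outcome == null && status == Status.GREEN) {
                outcome = "GREEN";
            }
        }

        @Override
        public void onClusterServiceClose() {
            // Desired behavior: ask the caller to retry on a new master
            // instead of failing the request outright.
            if (outcome == null) {
                outcome = "RETRY_ON_NEW_MASTER";
            }
        }
    }

    public static class ToyCluster {
        private final int dataNodes;
        private int replicas;
        private final List<Listener> waiting = new ArrayList<>();

        public ToyCluster(int dataNodes, int replicas) {
            this.dataNodes = dataNodes;
            this.replicas = replicas;
        }

        /** Assumes all primaries are assigned; a replica never shares a node with its primary. */
        public Status status() {
            return replicas <= dataNodes - 1 ? Status.GREEN : Status.YELLOW;
        }

        /** Completes immediately if already GREEN, otherwise parks the request. */
        public void waitForGreen(WaitingHealthRequest request) {
            request.onNewClusterState(status());
            if (request.outcome == null) {
                waiting.add(request);
            }
        }

        /** Publishes the new replica count to all waiting requests. */
        public void setReplicas(int replicas) {
            this.replicas = replicas;
            for (Listener l : waiting) {
                l.onNewClusterState(status());
            }
        }

        /** Shutting down the cluster service notifies every waiting request. */
        public void close() {
            for (Listener l : waiting) {
                l.onClusterServiceClose();
            }
            waiting.clear();
        }
    }

    public static void main(String[] args) {
        // Old master: 10 replicas on 3 data nodes keeps the index YELLOW, so the request waits.
        ToyCluster oldMaster = new ToyCluster(3, 10);
        WaitingHealthRequest first = new WaitingHealthRequest();
        oldMaster.waitForGreen(first);
        oldMaster.close();
        System.out.println(first.outcome); // RETRY_ON_NEW_MASTER

        // New master: the retried request waits again until replicas drop to 0.
        ToyCluster newMaster = new ToyCluster(3, 10);
        WaitingHealthRequest retried = new WaitingHealthRequest();
        newMaster.waitForGreen(retried);
        newMaster.setReplicas(0);
        System.out.println(retried.outcome); // GREEN
    }
}
```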

Ah ok, can you add a comment to that effect?

ywelsch

LGTM

henningandersen · 2020-09-18T18:42:30Z

@elasticmachine run elasticsearch-ci/packaging-sample-windows

Fix cluster health when closing

809c2d6

henningandersen added >bug :Distributed/Distributed v8.0.0 v7.10.0 labels Aug 31, 2020

elasticmachine added the Team:Distributed label Aug 31, 2020

henningandersen added 2 commits September 8, 2020 13:33

Merge remote-tracking branch 'origin/master' into fix_cluster_health_…

5687146

…when_master_closing

Better way to provoke wait in test case

cf46543

Merge remote-tracking branch 'origin/master' into fix_cluster_health_…

1d9c966

…when_master_closing

henningandersen requested a review from ywelsch September 9, 2020 10:45

ywelsch suggested changes Sep 9, 2020

ywelsch approved these changes Sep 9, 2020

henningandersen added 2 commits September 18, 2020 16:18

Merge remote-tracking branch 'origin/master' into fix_cluster_health_…

b89d999

…when_master_closing

Add comment

89cc97f

henningandersen merged commit db1a137 into elastic:master Sep 19, 2020

henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Sep 19, 2020

Fix cluster health when closing (elastic#61709)

9a77f41

henningandersen mentioned this pull request Oct 8, 2020

Fix test timeout for health on master failover #63455

Merged

Mpdreamz mentioned this pull request Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

stevejgordon mentioned this pull request Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021