[Zen2] Add lag detector #35685

DaveCTurner · 2018-11-19T06:55:35Z

A publication can succeed and complete before all nodes have applied the
published state and acknowledged it, thanks to the publication timeout; however
we need every node eventually either to apply the published state (or a later
state) or be removed from the cluster. This change introduces the LagDetector
which achieves this liveness property by removing any lagging nodes from the
cluster.
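
For illustration only, the liveness idea can be sketched as a small, single-threaded tracker. This is a minimal sketch, not the LagDetector added by this change: `SimpleLagTracker`, `trackNode`, and `checkForLag` are hypothetical names, nodes are identified by plain strings, and the timeout scheduling is assumed to happen elsewhere and is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: a simplified lag tracker. The names here are hypothetical
// and do not mirror the real LagDetector API.
final class SimpleLagTracker {
    // Highest cluster state version each tracked node has acknowledged applying.
    private final Map<String, Long> appliedVersionByNode = new HashMap<>();
    // Callback used to remove a lagging node from the cluster.
    private final Consumer<String> onLagDetected;

    SimpleLagTracker(Consumer<String> onLagDetected) {
        this.onLagDetected = onLagDetected;
    }

    // Start tracking a node; version 0 means "nothing applied yet".
    void trackNode(String nodeId) {
        appliedVersionByNode.putIfAbsent(nodeId, 0L);
    }

    // Record an acknowledgement. Versions only move forwards, and
    // acknowledgements from untracked nodes are ignored.
    void setAppliedVersion(String nodeId, long version) {
        appliedVersionByNode.computeIfPresent(nodeId, (id, current) -> Math.max(current, version));
    }

    // Assumed to be called once the lag timeout for publishedVersion has elapsed:
    // every node that has not applied that version (or a later one) is reported
    // for removal and is no longer tracked.
    List<String> checkForLag(long publishedVersion) {
        List<String> lagging = new ArrayList<>();
        for (Map.Entry<String, Long> entry : appliedVersionByNode.entrySet()) {
            if (entry.getValue() < publishedVersion) {
                lagging.add(entry.getKey());
            }
        }
        for (String nodeId : lagging) {
            appliedVersionByNode.remove(nodeId);
            onLagDetected.accept(nodeId);
        }
        return lagging;
    }
}
```

A node that has applied a later state than the one being checked is not treated as lagging, which matches the "(or a later state)" wording above.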

elasticmachine · 2018-11-19T06:55:36Z

Pinging @elastic/es-distributed

ywelsch

This looks very good. I've left one question which might have an impact on the tests, so I'm leaving the review of those for later.

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

server/src/main/java/org/elasticsearch/cluster/coordination/LagDetector.java

DaveCTurner · 2018-11-21T16:51:46Z

Ok, I've addressed all those points, thanks.

ywelsch

Some minor comments, and perhaps a simplification to avoid adding a hook into the Publication class.

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

server/src/main/java/org/elasticsearch/cluster/coordination/LagDetector.java

ywelsch · 2018-11-21T18:18:25Z

server/src/main/java/org/elasticsearch/cluster/coordination/Publication.java

@@ -251,6 +255,7 @@ void sendApplyCommit() {
        void setAppliedCommit() {
            assert state == PublicationTargetState.SENT_APPLY_COMMIT : state + " -> " + PublicationTargetState.APPLIED_COMMIT;
            state = PublicationTargetState.APPLIED_COMMIT;
+            onNodeApplicationAck.accept(discoveryNode, publishRequest.getAcceptedState().version());


I wonder if we can avoid hooking this in here, and whether there's a way to do this in Coordinator. We call ackOnce(null) here, which in turn calls AckListener.onNodeAck(DiscoveryNode, @Nullable Exception). We already hook into that AckListener in Coordinator, so we could also get those events there. And we also know the version of the cluster state we're publishing in Coordinator.

> And we also know the version of the cluster state we're publishing in Coordinator.

How do we know that? We start the lag detector after clearing currentPublication, and another publication could then start. It wouldn't be right to detect lag based on a newer publication.

My suggestion is something like the following (I had to revert the not-null assertion, because I think we don't have that guarantee, and CoordinatorTests were failing):

diff --git a/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java b/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
index f132998d6ba..64e9310402c 100644
--- a/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
+++ b/server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java
@@ -985,6 +985,9 @@ public class Coordinator extends AbstractLifecycleComponent implements Discovery
                 @Override
                 public void onNodeAck(DiscoveryNode node, Exception e) {
+                    if (e == null) {
+                        lagDetector.setAppliedVersion(node, publishRequest.getAcceptedState().version());
+                    }
                     // acking and cluster state application for local node is handled specially
                     if (node.equals(getLocalNode())) {
                         synchronized (mutex) {
@@ -999,7 +1002,7 @@ public class Coordinator extends AbstractLifecycleComponent implements Discovery
                         }
                     }
                 },
-                transportService.getThreadPool()::relativeTimeInMillis, lagDetector::setAppliedVersion);
+                transportService.getThreadPool()::relativeTimeInMillis);
             this.publishRequest = publishRequest;
             this.publicationContext = publicationContext;
             this.localNodeAckEvent = localNodeAckEvent;
diff --git a/server/src/main/java/org/elasticsearch/cluster/coordination/LagDetector.java b/server/src/main/java/org/elasticsearch/cluster/coordination/LagDetector.java
index 3180913a012..ea52ec95673 100644
--- a/server/src/main/java/org/elasticsearch/cluster/coordination/LagDetector.java
+++ b/server/src/main/java/org/elasticsearch/cluster/coordination/LagDetector.java
@@ -85,8 +85,11 @@ public class LagDetector {
         }
         final NodeAppliedStateTracker nodeAppliedStateTracker = appliedStateTrackersByNode.get(discoveryNode);
-        assert nodeAppliedStateTracker != null : "untracked node " + discoveryNode + " applied version " + appliedVersion;
-        nodeAppliedStateTracker.increaseAppliedVersion(appliedVersion);
+        if (nodeAppliedStateTracker == null) {
+            logger.trace("node {} applied version {} but this node's version is not being tracked", discoveryNode, appliedVersion);
+        } else {
+            nodeAppliedStateTracker.increaseAppliedVersion(appliedVersion);
+        }
     }
     public void startLagDetector(final long version) {
diff --git a/server/src/main/java/org/elasticsearch/cluster/coordination/Publication.java b/server/src/main/java/org/elasticsearch/cluster/coordination/Publication.java
index a602750bba8..9ec8d562b81 100644
--- a/server/src/main/java/org/elasticsearch/cluster/coordination/Publication.java
+++ b/server/src/main/java/org/elasticsearch/cluster/coordination/Publication.java
@@ -36,7 +36,6 @@ import java.util.List;
 import java.util.Optional;
 import java.util.Set;
 import java.util.function.LongSupplier;
-import java.util.function.ObjLongConsumer;
 public abstract class Publication {
@@ -47,19 +46,16 @@ public abstract class Publication {
     private final AckListener ackListener;
     private final LongSupplier currentTimeSupplier;
     private final long startTime;
-    private final ObjLongConsumer<DiscoveryNode> onNodeApplicationAck;
     private Optional<ApplyCommitRequest> applyCommitRequest; // set when state is committed
     private boolean isCompleted; // set when publication is completed
     private boolean timedOut; // set when publication timed out
-    public Publication(PublishRequest publishRequest, AckListener ackListener, LongSupplier currentTimeSupplier,
-                       ObjLongConsumer<DiscoveryNode> onNodeApplicationAck) {
+    public Publication(PublishRequest publishRequest, AckListener ackListener, LongSupplier currentTimeSupplier) {
         this.publishRequest = publishRequest;
         this.ackListener = ackListener;
         this.currentTimeSupplier = currentTimeSupplier;
         startTime = currentTimeSupplier.getAsLong();
-        this.onNodeApplicationAck = onNodeApplicationAck;
         applyCommitRequest = Optional.empty();
         publicationTargets = new ArrayList<>(publishRequest.getAcceptedState().getNodes().getNodes().size());
         publishRequest.getAcceptedState().getNodes().iterator().forEachRemaining(n -> publicationTargets.add(new PublicationTarget(n)));
@@ -255,7 +251,6 @@ public abstract class Publication {
         void setAppliedCommit() {
             assert state == PublicationTargetState.SENT_APPLY_COMMIT : state + " -> " + PublicationTargetState.APPLIED_COMMIT;
             state = PublicationTargetState.APPLIED_COMMIT;
-            onNodeApplicationAck.accept(discoveryNode, publishRequest.getAcceptedState().version());
             ackOnce(null);
         }
diff --git a/server/src/test/java/org/elasticsearch/cluster/coordination/PublicationTests.java b/server/src/test/java/org/elasticsearch/cluster/coordination/PublicationTests.java
index e17c77ce6dc..914ee1e95f7 100644
--- a/server/src/test/java/org/elasticsearch/cluster/coordination/PublicationTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/coordination/PublicationTests.java
@@ -103,8 +103,7 @@ public class PublicationTests extends ESTestCase {
         Set<DiscoveryNode> missingJoins = new HashSet<>();
         MockPublication(PublishRequest publishRequest, Discovery.AckListener ackListener, LongSupplier currentTimeSupplier) {
-            super(publishRequest, ackListener, currentTimeSupplier, (n, l) -> {
-            });
+            super(publishRequest, ackListener, currentTimeSupplier);
             this.publishRequest = publishRequest;
         }

I see, ok, I did that in 63b21ee.
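
The reasoning in this thread can be sketched in isolation. Each publication holds on to its own version, so an acknowledgement is always recorded against the state that was actually acknowledged, even if a newer publication has started in the meantime. This is a simplified illustration under stated assumptions, not the Coordinator code: `PublicationAckSketch` and its nested classes are hypothetical, and nodes are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-publication ack listener; not the real Coordinator code.
final class PublicationAckSketch {

    // Highest version each node is known to have applied.
    static final class AppliedVersions {
        private final Map<String, Long> byNode = new HashMap<>();

        // Versions only move forwards, so a late ack for an older state is harmless.
        void setAppliedVersion(String node, long version) {
            byNode.merge(node, version, Long::max);
        }

        long get(String node) {
            return byNode.getOrDefault(node, 0L);
        }
    }

    // Each publication captures the version it is publishing at construction time.
    static final class Publication {
        private final long version;
        private final AppliedVersions applied;

        Publication(long version, AppliedVersions applied) {
            this.version = version;
            this.applied = applied;
        }

        // Mirrors the shape of onNodeAck(node, exception): a null exception means success.
        void onNodeAck(String node, Exception e) {
            if (e == null) {
                applied.setAppliedVersion(node, version);
            }
        }
    }
}
```

Because the version is read from the publication that received the acknowledgement, a late ack for version 7 is never mistaken for an ack of version 8.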

server/src/test/java/org/elasticsearch/cluster/coordination/CoordinatorTests.java

ywelsch · 2018-11-21T18:24:36Z

server/src/test/java/org/elasticsearch/cluster/coordination/LagDetectorTests.java

+        deterministicTaskQueue = new DeterministicTaskQueue(Settings.builder().put(NODE_NAME_SETTING.getKey(), "node").build(), random());
+
+        failedNodes = new HashSet<>();
+        lagDetector = new LagDetector(Settings.EMPTY, deterministicTaskQueue.getThreadPool(), failedNodes::add, () -> localNode);


randomize the lag timeout here?

Sure, 3102f8c

ywelsch · 2018-11-23T13:36:21Z

I've left two more comments.

ywelsch

LGTM

DaveCTurner added the >enhancement, v7.0.0, and :Distributed/Cluster Coordination labels Nov 19, 2018

DaveCTurner requested a review from ywelsch November 19, 2018 06:55

ywelsch reviewed Nov 19, 2018

ywelsch mentioned this pull request Nov 20, 2018

A new cluster coordination layer #32006

Closed

61 tasks

DaveCTurner added 6 commits November 21, 2018 16:08

Merge branch 'zen2' into 2018-11-19-lag-detector

fa3f3fc

Less block

e6f3291

Ignore local node

b39750b

Rename setting

4fbd9a8

Only schedule the one task for lag detection

4cc00ac

Merge branch 'zen2' into 2018-11-19-lag-detector

dea2640

DaveCTurner requested a review from ywelsch November 21, 2018 16:51

Fix log format

213436f

ywelsch suggested changes Nov 21, 2018

DaveCTurner added 7 commits November 23, 2018 08:57

Merge branch 'zen2' into 2018-11-19-lag-detector

286eddd

No new hashset

409bf3f

Better comment

97578a5

Better logging

31218bc

Randomise timeout

3102f8c

Assert node is tracked

cdcc190

Test is now irrelevant

7150336

DaveCTurner added 2 commits November 25, 2018 09:31

Remove bogus assertion

f956db5

This reverts commit 7150336. This reverts commit cdcc190.

Notify lag detector from CoordinatorPublication

63b21ee

DaveCTurner requested a review from ywelsch November 26, 2018 09:42

ywelsch approved these changes Nov 26, 2018

DaveCTurner merged commit a68a464 into elastic:zen2 Nov 26, 2018

DaveCTurner deleted the 2018-11-19-lag-detector branch November 26, 2018 10:52

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
