
Seq Number based recovery should validate last lucene commit max seq# #22851

Conversation

@bleskes (Contributor) commented Jan 28, 2017

The seq#-based recovery logic relies on rolling back Lucene to remove any operations above the global checkpoint. That part of the plan is not implemented yet, but the recovery logic already has to have these guarantees. Instead, we should make the seq# logic validate that the last commit point (and the only one we have) maintains the invariant and, if not, fall back to file-based recovery.

This commit adds a test that creates a situation where rollback is needed (primary failover with ops in flight) and fixes another issue that was surfaced by it: if a primary can't serve a seq#-based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter.

Relates to #22484 & #10708
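To make the fallback concrete, here is a minimal sketch of the check described above (the stand-in `SeqNoStats` record and the `-2` sentinel are illustrative assumptions, not the exact Elasticsearch code):

```java
// Illustrative stand-in for org.elasticsearch.index.seqno.SeqNoStats.
record SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {}

// Sentinel meaning "no usable starting seq#": caller falls back to file-based recovery.
static final long UNASSIGNED_SEQ_NO = -2;

static long getStartingSeqNo(SeqNoStats stats) {
    if (stats.maxSeqNo() <= stats.globalCheckpoint()) {
        // The last commit contains no operations above the global checkpoint,
        // so an ops-based recovery can safely resume after the local checkpoint.
        return stats.localCheckpoint() + 1;
    }
    // The commit contains operations a Lucene rollback would have to remove;
    // rollback is not implemented yet, so signal a file-based recovery instead.
    return UNASSIGNED_SEQ_NO;
}
```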

@bleskes (Contributor, Author) commented Jan 29, 2017

test this please

@bleskes (Contributor, Author) commented Jan 29, 2017

Test this please

@ywelsch (Contributor) left a comment

Left some minor comments, but the change LGTM.

-            return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
+            final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
+            if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
+                assert seqNoStats.getLocalCheckpoint() <= seqNoStats.getGlobalCheckpoint() :
@ywelsch:

this would be a consequence of maxSeqNo >= localCheckpoint. Should we instead add the assertion maxSeqNo >= localCheckpoint to the constructor of SeqNoStats?

@bleskes:

yep. moved.
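A hedged sketch of what moving that assertion into the `SeqNoStats` constructor could look like (field order and message text are assumptions; only the assertion itself is taken from the discussion above):

```java
// Illustrative sketch; the real org.elasticsearch.index.seqno.SeqNoStats may differ.
public class SeqNoStats {
    private final long maxSeqNo;
    private final long localCheckpoint;
    private final long globalCheckpoint;

    public SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        // Every operation at or below the local checkpoint carries a seq#,
        // so the highest assigned seq# can never fall below it.
        assert maxSeqNo >= localCheckpoint :
            "max seq# [" + maxSeqNo + "] < local checkpoint [" + localCheckpoint + "]";
        this.maxSeqNo = maxSeqNo;
        this.localCheckpoint = localCheckpoint;
        this.globalCheckpoint = globalCheckpoint;
    }
    // getters omitted for brevity
}
```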

@@ -365,7 +365,14 @@ private StartRecoveryRequest getStartRecoveryRequest(final RecoveryTarget recove
     public static long getStartingSeqNo(final RecoveryTarget recoveryTarget) {
         try {
             final long globalCheckpoint = Translog.readGlobalCheckpoint(recoveryTarget.indexShard().shardPath().resolveTranslog());
-            return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
+            final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
+            if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
@ywelsch:

can you add a comment here why we check this here (i.e. essentially the first paragraph of the PR description).

@bleskes:

sure, added
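The requested comment would presumably read along these lines (illustrative wording, condensed from the PR description):

```java
// We need the last commit's max seq# to be at or below the global checkpoint:
// operations above it may not have survived the primary failover and would
// require a Lucene rollback, which is not implemented yet. If the invariant
// does not hold, return UNASSIGNED_SEQ_NO to force a file-based recovery.
```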

this.indexName = this.request.shardId().getIndex().getName();
this.shardId = this.request.shardId().id();
this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(),"recover to " + request.targetNode().getName());
@ywelsch:

space missing
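For reference, the corrected line would presumably read (space added after the comma):

```java
this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(), "recover to " + request.targetNode().getName());
```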

@@ -492,6 +493,7 @@ public void testAckedIndexing() throws Exception {
         .setSettings(Settings.builder()
             .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1 + randomInt(2))
             .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, randomInt(2))
+            .put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), "200ms")
@ywelsch:

randomize this a bit? Maybe we won't uncover other issues if this is too low?

@bleskes:

I added randomization between 5s and 200ms
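A sketch of how that randomization might look in the test (`randomIntBetween` is the standard ESTestCase helper; the exact expression used in the PR is an assumption):

```java
// Illustrative: pick a checkpoint sync interval anywhere between 200ms and 5s,
// so the test exercises both frequent and infrequent global checkpoint syncs.
final String syncInterval = randomIntBetween(200, 5000) + "ms";
settings.put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), syncInterval);
```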

@@ -50,6 +50,12 @@ public static ShardRouting reinitPrimary(ShardRouting routing) {
         return routing.reinitializePrimaryShard();
     }

+    public static ShardRouting promoteToPrimary(ShardRouting routing) {
+        return new ShardRouting(routing.shardId(), routing.currentNodeId(), routing.relocatingNodeId(), true, routing.state(),
@ywelsch:

why not use ShardRouting.moveActiveReplicaToPrimary?

@bleskes:

Because I didn't find it. Thanks for the tip.
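With the reviewer's suggestion, the helper collapses to a one-line delegation (a sketch; it assumes `moveActiveReplicaToPrimary` returns the promoted copy of the routing, as its name suggests):

```java
// Delegate to the existing routing API instead of hand-constructing a
// ShardRouting with the primary flag flipped.
public static ShardRouting promoteToPrimary(ShardRouting routing) {
    return routing.moveActiveReplicaToPrimary();
}
```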

// simulate docs that were inflight when primary failed, these will be rolled back
final int rollbackDocs = randomIntBetween(1, 5);
logger.info("--> indexing {} rollback docs", rollbackDocs);
for (int i = 0; i< rollbackDocs; i++) {
@ywelsch:

spaces
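The nit refers to the loop header above; with the spacing corrected it would read:

```java
for (int i = 0; i < rollbackDocs; i++) {
```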

replica.updateGlobalCheckpointOnReplica(maxSeqNo - 1);
replica.getTranslog().sync();

// commit is enough, global checkpoint is bellow max *committed* which is NO_OPS_PERFORMED
@ywelsch:

below

@bleskes:

thx.


replica.updateGlobalCheckpointOnReplica(maxSeqNo);
replica.getTranslog().sync();
// commit is enough, global checkpoint is bellow max
@ywelsch:

below

@bleskes merged commit eb36b82 into elastic:master on Jan 31, 2017
@bleskes deleted the recovery_start_seq_no_should_be_below_global_check_point branch on January 31, 2017 at 19:27
@bleskes (Contributor, Author) commented Jan 31, 2017

thx @ywelsch

@clintongormley added the :Distributed/Engine label (Anything around managing Lucene and the Translog in an open shard) and removed the :Sequence IDs label on Feb 14, 2018
Labels: >bug, :Distributed/Engine