
Seq Number based recovery should validate last lucene commit max seq# #22851

Conversation

@bleskes (Contributor) commented Jan 28, 2017

The seq#-based recovery logic relies on rolling back Lucene to remove any operations above the global checkpoint. That part of the plan is not implemented yet, but the recovery logic already has to have these guarantees. Instead, we should make the seq# logic validate that the last commit point (and the only one we have) maintains the invariant and, if not, fall back to file-based recovery.

This commit adds a test that creates a situation where rollback is needed (primary failover with ops in flight) and fixes another issue that was surfaced by it: if a primary can't serve a seq#-based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter.

Relates to #22484 & #10708
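To make the fallback concrete, here is a minimal sketch of the check described above (the stand-in `SeqNoStats` record and the `-2` sentinel are illustrative assumptions, not the exact Elasticsearch code):

```java
// Illustrative stand-in for org.elasticsearch.index.seqno.SeqNoStats.
record SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {}

// Sentinel meaning "no usable starting seq#": caller falls back to file-based recovery.
static final long UNASSIGNED_SEQ_NO = -2;

static long getStartingSeqNo(SeqNoStats stats) {
    if (stats.maxSeqNo() <= stats.globalCheckpoint()) {
        // The last commit contains no operations above the global checkpoint,
        // so an ops-based recovery can safely resume after the local checkpoint.
        return stats.localCheckpoint() + 1;
    }
    // The commit contains operations a Lucene rollback would have to remove;
    // rollback is not implemented yet, so signal a file-based recovery instead.
    return UNASSIGNED_SEQ_NO;
}
```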

@bleskes (Contributor, Author) commented Jan 29, 2017

test this please

@bleskes (Contributor, Author) commented Jan 29, 2017

Test this please

@ywelsch (Contributor) left a comment

Left some minor comments, but the change LGTM.

-            return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
+            final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
+            if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
+                assert seqNoStats.getLocalCheckpoint() <= seqNoStats.getGlobalCheckpoint() :
@ywelsch:

this would be a consequence of maxSeqNo >= localCheckpoint. Should we instead add the assertion maxSeqNo >= localCheckpoint to the constructor of SeqNoStats?

@bleskes:

yep. moved.
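A hedged sketch of what moving that assertion into the `SeqNoStats` constructor could look like (field order and message text are assumptions; only the assertion itself is taken from the discussion above):

```java
// Illustrative sketch; the real org.elasticsearch.index.seqno.SeqNoStats may differ.
public class SeqNoStats {
    private final long maxSeqNo;
    private final long localCheckpoint;
    private final long globalCheckpoint;

    public SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        // Every operation at or below the local checkpoint carries a seq#,
        // so the highest assigned seq# can never fall below it.
        assert maxSeqNo >= localCheckpoint :
            "max seq# [" + maxSeqNo + "] < local checkpoint [" + localCheckpoint + "]";
        this.maxSeqNo = maxSeqNo;
        this.localCheckpoint = localCheckpoint;
        this.globalCheckpoint = globalCheckpoint;
    }
    // getters omitted for brevity
}
```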

@@ -365,7 +365,14 @@ private StartRecoveryRequest getStartRecoveryRequest(final RecoveryTarget recove
     public static long getStartingSeqNo(final RecoveryTarget recoveryTarget) {
         try {
             final long globalCheckpoint = Translog.readGlobalCheckpoint(recoveryTarget.indexShard().shardPath().resolveTranslog());
-            return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
+            final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
+            if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
@ywelsch:

can you add a comment here why we check this here (i.e. essentially the first paragraph of the PR description).

@bleskes:

sure, added
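The requested comment would presumably read along these lines (illustrative wording, condensed from the PR description):

```java
// We need the last commit's max seq# to be at or below the global checkpoint:
// operations above it may not have survived the primary failover and would
// require a Lucene rollback, which is not implemented yet. If the invariant
// does not hold, return UNASSIGNED_SEQ_NO to force a file-based recovery.
```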

this.indexName = this.request.shardId().getIndex().getName();
this.shardId = this.request.shardId().id();
this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(),"recover to " + request.targetNode().getName());
@ywelsch:

space missing
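For reference, the corrected line would presumably read (space added after the comma):

```java
this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(), "recover to " + request.targetNode().getName());
```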

@@ -492,6 +493,7 @@ public void testAckedIndexing() throws Exception {
         .setSettings(Settings.builder()
             .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1 + randomInt(2))
             .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, randomInt(2))
+            .put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), "200ms")
@ywelsch:

randomize this a bit? Maybe we won't uncover other issues if this is too low?

@bleskes:

I added randomization between 5s and 200ms
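A sketch of how that randomization might look in the test (`randomIntBetween` is the standard ESTestCase helper; the exact expression used in the PR is an assumption):

```java
// Illustrative: pick a checkpoint sync interval anywhere between 200ms and 5s,
// so the test exercises both frequent and infrequent global checkpoint syncs.
final String syncInterval = randomIntBetween(200, 5000) + "ms";
settings.put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), syncInterval);
```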

@@ -50,6 +50,12 @@ public static ShardRouting reinitPrimary(ShardRouting routing) {
         return routing.reinitializePrimaryShard();
     }

+    public static ShardRouting promoteToPrimary(ShardRouting routing) {
+        return new ShardRouting(routing.shardId(), routing.currentNodeId(), routing.relocatingNodeId(), true, routing.state(),
@ywelsch:

why not use ShardRouting.moveActiveReplicaToPrimary?

@bleskes:

Because I didn't find it. Thanks for the tip.
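With the reviewer's suggestion, the helper collapses to a one-line delegation (a sketch; it assumes `moveActiveReplicaToPrimary` returns the promoted copy of the routing, as its name suggests):

```java
// Delegate to the existing routing API instead of hand-constructing a
// ShardRouting with the primary flag flipped.
public static ShardRouting promoteToPrimary(ShardRouting routing) {
    return routing.moveActiveReplicaToPrimary();
}
```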

// simulate docs that were inflight when primary failed, these will be rolled back
final int rollbackDocs = randomIntBetween(1, 5);
logger.info("--> indexing {} rollback docs", rollbackDocs);
for (int i = 0; i< rollbackDocs; i++) {
@ywelsch:

spaces
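The nit refers to the loop header above; with the spacing corrected it would read:

```java
for (int i = 0; i < rollbackDocs; i++) {
```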

replica.updateGlobalCheckpointOnReplica(maxSeqNo - 1);
replica.getTranslog().sync();

// commit is enough, global checkpoint is bellow max *committed* which is NO_OPS_PERFORMED
@ywelsch:

below

@bleskes:

thx.


replica.updateGlobalCheckpointOnReplica(maxSeqNo);
replica.getTranslog().sync();
// commit is enough, global checkpoint is bellow max
@ywelsch:

below

@bleskes merged commit eb36b82 into elastic:master on Jan 31, 2017
@bleskes deleted the recovery_start_seq_no_should_be_below_global_check_point branch on January 31, 2017 at 19:27
@bleskes (Contributor, Author) commented Jan 31, 2017

thx @ywelsch

@clintongormley added the :Distributed/Engine label (Anything around managing Lucene and the Translog in an open shard) and removed the :Sequence IDs label on Feb 14, 2018
Labels: >bug, :Distributed/Engine