Seq Number based recovery should validate last lucene commit max seq# #22851
Conversation
…q_no_should_be_below_global_check_point
The seq#-based recovery logic relies on rolling back Lucene to remove any operations above the global checkpoint. That part of the plan is not implemented yet, but we still have to uphold these guarantees. Instead, the seq# logic should validate that the last commit point (and the only one we have) maintains the invariant, and if not, fall back to file-based recovery. This commit adds a test that creates a situation where rollback is needed (primary failover with ops in flight) and fixes another issue that was surfaced by it: if a primary can't serve a seq#-based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter. Relates to elastic#22484 & elastic#10708
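The validation described above can be sketched as a small decision function. This is a hypothetical simplification, not the actual Elasticsearch code: the class, method, and sentinel names are assumptions, and the real implementation reads these values from the last Lucene commit and the translog checkpoint file.

```java
// Sketch of the invariant check: a seq#-based starting point is only safe when
// the last Lucene commit contains no operations above the global checkpoint.
public class RecoveryStartSketch {
    // hypothetical sentinel meaning "fall back to file-based recovery"
    static final long UNASSIGNED_SEQ_NO = -2;

    // maxSeqNo and localCheckpoint would come from the last (and only) commit;
    // globalCheckpoint would be read from the translog checkpoint file
    public static long startingSeqNo(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        if (maxSeqNo <= globalCheckpoint) {
            // the commit holds nothing above the global checkpoint, so every
            // local operation is safe; resume from the next sequence number
            return localCheckpoint + 1;
        }
        // the commit may contain operations that a rollback would have removed;
        // signal that a full file-based recovery is required instead
        return UNASSIGNED_SEQ_NO;
    }
}
```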
test this please
Test this please
Left some minor comments, but the change LGTM.
return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
    assert seqNoStats.getLocalCheckpoint() <= seqNoStats.getGlobalCheckpoint() :
This would be a consequence of maxSeqNo >= localCheckpoint. Should we instead add the assertion maxSeqNo >= localCheckpoint to the constructor of SeqNoStats?
yep. moved.
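The constructor-level assertion suggested above could look roughly like the following. This is a sketch, not the actual SeqNoStats class: the field set and the assertion message are assumptions (and Java `assert` statements only fire when assertions are enabled, e.g. with `-ea`).

```java
// Sketch of enforcing maxSeqNo >= localCheckpoint as a class invariant,
// so every consumer of SeqNoStats can rely on it (field names assumed).
public class SeqNoStatsSketch {
    private final long maxSeqNo;
    private final long localCheckpoint;
    private final long globalCheckpoint;

    public SeqNoStatsSketch(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        assert localCheckpoint <= maxSeqNo :
            "local checkpoint [" + localCheckpoint + "] is above max seq no [" + maxSeqNo + "]";
        this.maxSeqNo = maxSeqNo;
        this.localCheckpoint = localCheckpoint;
        this.globalCheckpoint = globalCheckpoint;
    }

    public long getMaxSeqNo() { return maxSeqNo; }
    public long getLocalCheckpoint() { return localCheckpoint; }
    public long getGlobalCheckpoint() { return globalCheckpoint; }
}
```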
@@ -365,7 +365,14 @@ private StartRecoveryRequest getStartRecoveryRequest(final RecoveryTarget recove
public static long getStartingSeqNo(final RecoveryTarget recoveryTarget) {
    try {
        final long globalCheckpoint = Translog.readGlobalCheckpoint(recoveryTarget.indexShard().shardPath().resolveTranslog());
        return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
        final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
        if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
Can you add a comment here explaining why we check this (i.e. essentially the first paragraph of the PR description)?
Sure, added.
this.indexName = this.request.shardId().getIndex().getName();
this.shardId = this.request.shardId().id();
this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(),"recover to " + request.targetNode().getName());
space missing
@@ -492,6 +493,7 @@ public void testAckedIndexing() throws Exception {
    .setSettings(Settings.builder()
        .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1 + randomInt(2))
        .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, randomInt(2))
        .put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), "200ms")
Randomize this a bit? We might not uncover other issues if this is always this low.
I added randomization between 200ms and 5s.
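The randomization mentioned above could be sketched as follows. This is not the actual test code: Elasticsearch tests would use a helper like ESTestCase's randomIntBetween, while this sketch uses plain java.util.Random to show the same idea.

```java
import java.util.Random;

// Sketch: pick a checkpoint sync interval uniformly from [200ms, 5000ms],
// so tests sometimes run with a slow sync and sometimes with a fast one.
public class RandomSyncInterval {
    public static String randomSyncInterval(Random random) {
        // nextInt(4801) yields [0, 4800]; shifting by 200 gives [200, 5000]
        int ms = 200 + random.nextInt(5000 - 200 + 1);
        return ms + "ms"; // formatted as an index setting value
    }
}
```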
@@ -50,6 +50,12 @@ public static ShardRouting reinitPrimary(ShardRouting routing) {
    return routing.reinitializePrimaryShard();
}

public static ShardRouting promoteToPrimary(ShardRouting routing) {
    return new ShardRouting(routing.shardId(), routing.currentNodeId(), routing.relocatingNodeId(), true, routing.state(),
Why not use ShardRouting.moveActiveReplicaToPrimary?
Because I didn't find it. Thanks for the tip.
// simulate docs that were inflight when primary failed, these will be rolled back
final int rollbackDocs = randomIntBetween(1, 5);
logger.info("--> indexing {} rollback docs", rollbackDocs);
for (int i = 0; i< rollbackDocs; i++) {
spaces
replica.updateGlobalCheckpointOnReplica(maxSeqNo - 1);
replica.getTranslog().sync();

// commit is enough, global checkpoint is bellow max *committed* which is NO_OPS_PERFORMED
below
thx.
replica.updateGlobalCheckpointOnReplica(maxSeqNo);
replica.getTranslog().sync();
// commit is enough, global checkpoint is bellow max
below
…q_no_should_be_below_global_check_point
thx @ywelsch