Fill missing sequence IDs up to max sequence ID when recovering from store #24238
Today we might promote a primary and recover it from store where, after translog recovery, the local checkpoint is still behind the maximum sequence ID seen. To fill the holes in the sequence ID history, this PR adds a utility method that fills all missing sequence IDs up to the maximum seen sequence ID with no-ops.
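To make the problem concrete: the local checkpoint is the highest sequence number at or below which every operation has been processed, so a single missing operation pins it even if later operations were recovered. The Java sketch below models this with a hypothetical in-memory tracker (CheckpointSketch and its methods are illustrative names, not Elasticsearch classes).

```java
import java.util.TreeSet;

/**
 * Hypothetical, simplified local checkpoint tracker (not the Elasticsearch
 * implementation): the checkpoint is the highest seq no such that every seq no
 * at or below it has been processed.
 */
public class CheckpointSketch {
    private final TreeSet<Long> processed = new TreeSet<>();
    private long localCheckpoint = -1; // nothing processed yet
    private long maxSeqNo = -1;

    public void markProcessed(long seqNo) {
        processed.add(seqNo);
        maxSeqNo = Math.max(maxSeqNo, seqNo);
        // advance the checkpoint over any contiguous run of processed seq nos
        while (processed.contains(localCheckpoint + 1)) {
            localCheckpoint++;
        }
    }

    public long getLocalCheckpoint() { return localCheckpoint; }

    public long getMaxSeqNo() { return maxSeqNo; }

    public static void main(String[] args) {
        CheckpointSketch tracker = new CheckpointSketch();
        // seq nos 0, 1 and 4 survive translog recovery; 2 and 3 are missing
        for (long seqNo : new long[] {0, 1, 4}) {
            tracker.markProcessed(seqNo);
        }
        System.out.println(tracker.getLocalCheckpoint() + " " + tracker.getMaxSeqNo());
    }
}
```

Running main prints 1 4: operations 2 and 3 are missing, so the checkpoint stops at 1 while the maximum seen sequence number is 4. Those two holes are what the new utility method fills with no-ops.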
Looks great. I have one minor request for the test.
final long maxSeqId = seqNoService.getMaxSeqNo();
int numNoOpsAdded = 0;
for (long i = localCheckpoint + 1; i <= maxSeqId;
// the local checkpoint might have been advanced so we are leap-frogging
The local checkpoint must have advanced by at least one. We can assert on that after the no-op is indexed.
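A sketch of the fill loop together with the suggested assertion, again against a hypothetical in-memory tracker rather than the real engine (FillHistorySketch and its methods are illustrative names). Because a no-op that fills a hole can let the checkpoint jump over operations that were already processed, the loop re-reads the checkpoint instead of incrementing, which is the "leap-frogging" the diff comment refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Minimal sketch of filling seq no holes with no-ops. The tracker is a
 * hypothetical stand-in, not the Elasticsearch sequence number service.
 */
public class FillHistorySketch {
    private final TreeSet<Long> processed = new TreeSet<>();
    private final List<Long> noOps = new ArrayList<>(); // seq nos of the no-ops "indexed"
    private long localCheckpoint = -1;
    private long maxSeqNo = -1;

    public void markProcessed(long seqNo) {
        processed.add(seqNo);
        maxSeqNo = Math.max(maxSeqNo, seqNo);
        while (processed.contains(localCheckpoint + 1)) {
            localCheckpoint++;
        }
    }

    /** Fills every missing seq no up to the max seen seq no; returns the number of no-ops added. */
    public int fillSequenceNumberHistory() {
        final long maxSeqId = maxSeqNo;
        int numNoOpsAdded = 0;
        for (long seqNo = localCheckpoint + 1; seqNo <= maxSeqId;
             // the checkpoint may have jumped past already-processed seq nos,
             // so leap-frog to the next hole instead of using seqNo++
             seqNo = localCheckpoint + 1) {
            noOps.add(seqNo);
            markProcessed(seqNo);
            numNoOpsAdded++;
            // the checkpoint must now be at least the seq no we just filled
            assert localCheckpoint >= seqNo : "checkpoint " + localCheckpoint + " < " + seqNo;
        }
        return numNoOpsAdded;
    }

    public List<Long> getNoOps() { return List.copyOf(noOps); }

    public long getLocalCheckpoint() { return localCheckpoint; }

    public static void main(String[] args) {
        FillHistorySketch engine = new FillHistorySketch();
        for (long seqNo : new long[] {0, 1, 4, 6}) {
            engine.markProcessed(seqNo);
        }
        int added = engine.fillSequenceNumberHistory();
        System.out.println(added + " " + engine.getNoOps() + " " + engine.getLocalCheckpoint());
    }
}
```

With operations 0, 1, 4 and 6 present, main prints 3 [2, 3, 5] 6: three no-ops are added and the checkpoint reaches the max sequence number. The count matches the (maxSeqId + 1) - numDocs form the test asserts on, assuming sequence numbers start at 0 and each document holds a distinct one.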
Engine.Index primaryResponse = indexForDoc(doc);
Engine.IndexResult indexResult = engine.index(primaryResponse);
if (randomBoolean()) {
    doc.updateSeqID(indexResult.getSeqNo(), 1);
Why is this needed? Doesn't the engine take care of that?
assertEquals((maxSeqIDOnReplica+1) - numDocsOnReplica, recoveringEngine.fillSequenceNumberHistory(2));
assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getMaxSeqNo());
assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getLocalCheckpoint());
if ((flushed = randomBoolean())) {
Can we snapshot the translog and assert that the no-ops have the right primary term?
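One way such a check could look, sketched with hypothetical types: Op, OpType and the List used as a "snapshot" stand in for the real translog snapshot API, which this sketch does not use. The expected term of 2 mirrors the fillSequenceNumberHistory(2) call in the test above; the data in main is made up.

```java
import java.util.List;

/**
 * Hypothetical sketch: walk a translog-like snapshot and check that every
 * no-op carries the expected primary term. Not the Elasticsearch Translog API.
 */
public class NoOpTermCheckSketch {
    enum OpType { INDEX, DELETE, NO_OP }

    record Op(OpType type, long seqNo, long primaryTerm) {}

    /** Returns the number of no-ops seen; throws AssertionError if any has the wrong term. */
    static int checkNoOpTerms(List<Op> snapshot, long expectedTerm) {
        int noOps = 0;
        for (Op op : snapshot) {
            if (op.type() == OpType.NO_OP) {
                if (op.primaryTerm() != expectedTerm) {
                    throw new AssertionError("no-op " + op.seqNo() + " has term "
                        + op.primaryTerm() + ", expected " + expectedTerm);
                }
                noOps++;
            }
        }
        return noOps;
    }

    public static void main(String[] args) {
        // documents indexed under term 1, holes filled with no-ops under term 2
        List<Op> snapshot = List.of(
            new Op(OpType.INDEX, 0, 1),
            new Op(OpType.NO_OP, 1, 2),
            new Op(OpType.INDEX, 2, 1),
            new Op(OpType.NO_OP, 3, 2));
        System.out.println(checkNoOpTerms(snapshot, 2));
    }
}
```

main prints 2; a no-op written under any other term would fail with an AssertionError.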
Ah, I had that but removed it. Good catch!
Still LGTM. Left a suggestion for the new test.
// start a replica shard and index the second doc
final IndexShard otherShard = newStartedShard(false);
test = otherShard.prepareIndexOnReplica(
SourceToParse.source(SourceToParse.Origin.PRIMARY, shard.shardId().getIndexName(), test.type(), test.id(), test.source(), |
Origin should be REPLICA.
@@ -896,6 +898,46 @@ public void testRecoverFromStore() throws IOException {
closeShards(newShard);
}

/* This test just verifies that we fill up local checkpoint up to max seen seqID on primary recovery */
public void testRecoverFromStoreWithNoOps() throws IOException {
final IndexShard shard = newStartedShard(true); |
I think we can introduce a variant of indexDoc called indexDocOnReplica that takes a seq# as a parameter. This would remove the need for the extra shard. WDYT?
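A minimal sketch of the idea, assuming a toy shard model rather than the real IndexShard test helpers: indexDoc lets the shard assign the next sequence number, as a primary would, while the hypothetical indexDocOnReplica takes the sequence number from the caller, as a replica applying a primary's operation would. Skipping a number leaves a hole in a single shard, so no second shard is needed to set up the test.

```java
import java.util.TreeMap;

/**
 * Hypothetical toy shard (not the Elasticsearch IndexShard API): indexDoc assigns
 * seq nos like a primary, indexDocOnReplica accepts an explicit seq no like a replica.
 */
public class ReplicaIndexSketch {
    private final TreeMap<Long, String> docsBySeqNo = new TreeMap<>();
    private long nextSeqNo = 0;

    /** Primary-style indexing: the shard assigns the sequence number. */
    public long indexDoc(String id) {
        long seqNo = nextSeqNo++;
        docsBySeqNo.put(seqNo, id);
        return seqNo;
    }

    /** Replica-style indexing: the caller supplies the sequence number. */
    public void indexDocOnReplica(String id, long seqNo) {
        docsBySeqNo.put(seqNo, id);
        nextSeqNo = Math.max(nextSeqNo, seqNo + 1);
    }

    public long maxSeqNo() {
        return docsBySeqNo.isEmpty() ? -1 : docsBySeqNo.lastKey();
    }

    /** Number of seq nos in [0, maxSeqNo] without a document, i.e. holes to fill with no-ops. */
    public long missingSeqNos() {
        return (maxSeqNo() + 1) - docsBySeqNo.size();
    }

    public static void main(String[] args) {
        ReplicaIndexSketch shard = new ReplicaIndexSketch();
        shard.indexDoc("doc0");             // gets seq no 0
        shard.indexDocOnReplica("doc1", 2); // skips seq no 1, leaving a hole
        System.out.println(shard.maxSeqNo() + " " + shard.missingSeqNos());
    }
}
```

main prints 2 1: the maximum sequence number is 2 and one hole (sequence number 1) remains for store recovery to fill with a no-op.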
I can do that in a separate PR.
Relates to #10708
I'm still working on a test for store recovery to ensure it's called, but I think it's ready for review.