-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Never delete translog-N.tlog file when creation fails #15788
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We today delete the translog-N.tlog file if any subsequent operation fails but we might actually be in a good state if for instance the creation of the writer failes after we sucessfully baked the new translog generation into the checkpoint. In this situation we used to delete the translog-N.tlog file and failed on the next recovery of the translog with a NoSuchFileException | FileNotFoundException just like in https://discuss.elastic.co/t/cannot-recover-index-because-of-missing-tanslog-files/38336 This commit changes the behavior and cleans up that limbo state on recovery if we already have a generation+1 file written but not baked into the checkpoint we remove that file but only if the previous ckp file has already been renamed otherwise we know we can't be in this state.
assert Files.exists(nextTranslogFile) == false || Files.size(nextTranslogFile) <= TranslogWriter.getHeaderLength(translogUUID); | ||
if (Files.exists(currentCheckpointFile) // current checkpoint is already copied | ||
&& Files.deleteIfExists(nextTranslogFile)) { // delete it and log a warning | ||
logger.warn("Deleted invalid next generation before opening writer {} this is like caused by some previously tragic exception ", nextTranslogFile.getFileName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we soften this message? maybe "deleted previously created, but not yet committed, next generation [{}]. This can happen due to a tragic exception when creating a new generation"
LGTM. Left some minor comments. Great catch. |
s1monw
added a commit
that referenced
this pull request
Jan 6, 2016
Never delete translog-N.tlog file when creation fails
s1monw
added a commit
to s1monw/elasticsearch
that referenced
this pull request
Jan 6, 2016
Never delete translog-N.tlog file when creation fails
s1monw
added a commit
that referenced
this pull request
Jan 6, 2016
Never delete translog-N.tlog file when creation fails
s1monw
added a commit
that referenced
this pull request
Jan 6, 2016
Never delete translog-N.tlog file when creation fails
s1monw
added a commit
that referenced
this pull request
Jan 6, 2016
Never delete translog-N.tlog file when creation fails
s1monw
added a commit
to s1monw/elasticsearch
that referenced
this pull request
Jan 7, 2016
This commit adds a new test that can throw an IOException at any point in time and ensures that all previously synced documents can be successfully recovered after hitting an excepiton. Relates to elastic#15788
s1monw
added a commit
that referenced
this pull request
Jan 7, 2016
This commit adds a new test that can throw an IOException at any point in time and ensures that all previously synced documents can be successfully recovered after hitting an excepiton. Relates to #15788
s1monw
added a commit
that referenced
this pull request
Jan 7, 2016
This commit adds a new test that can throw an IOException at any point in time and ensures that all previously synced documents can be successfully recovered after hitting an excepiton. Relates to #15788
s1monw
added a commit
that referenced
this pull request
Jan 7, 2016
This commit adds a new test that can throw an IOException at any point in time and ensures that all previously synced documents can be successfully recovered after hitting an excepiton. Relates to #15788
s1monw
added a commit
that referenced
this pull request
Jan 7, 2016
This commit adds a new test that can throw an IOException at any point in time and ensures that all previously synced documents can be successfully recovered after hitting an excepiton. Relates to #15788
clintongormley
added
:Distributed/Distributed
A catch all label for anything in the Distributed Area. If you aren't sure, use this one.
:Distributed/Engine
Anything around managing Lucene and the Translog in an open shard.
and removed
:Translog
:Distributed/Distributed
A catch all label for anything in the Distributed Area. If you aren't sure, use this one.
labels
Feb 13, 2018
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
blocker
>bug
:Distributed/Engine
Anything around managing Lucene and the Translog in an open shard.
v2.0.3
v2.1.2
v2.2.0
v2.3.0
v5.0.0-alpha1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We today delete the translog-N.tlog file if any subsequent operation fails
but we might actually be in a good state if for instance the creation of the writer
failes after we sucessfully baked the new translog generation into the checkpoint. In this situation
we used to delete the translog-N.tlog file and failed on the next recovery of the translog with a
NoSuchFileException | FileNotFoundException just like in https://discuss.elastic.co/t/cannot-recover-index-because-of-missing-tanslog-files/38336
This commit changes the behavior and cleans up that limbo state on recovery if we already have a generation+1 file written but not baked into
the checkpoint we remove that file but only if the previous ckp file has already been renamed otherwise we know we can't be in this state.