[7.x] Add support for peer recoveries using snapshots after primary failovers #79137

Merged: 2 commits, Oct 14, 2021

@fcofdez fcofdez commented Oct 14, 2021

This commit adds support for peer recoveries using snapshots after
a primary failover when the snapshot has the same logical contents
as the source shard but different physical files. It compares the
sequence number information stored in the snapshot against the
sequence numbers of the shard on the source node to decide whether
the snapshot can be used to recover the shard. Because the
underlying index files differ from the source index files, error
handling differs from the case where the files are shared: if an
error occurs while snapshot files are being recovered, we have to
cancel the ongoing downloads, wait until all in-flight operations
complete, remove the recovered files, and start from scratch using
a fallback recovery plan that uses the files from the source node.

Relates #73496
Backport of #77420
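
For readers skimming the description, the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SnapshotRecoverySketch`, `SeqNoInfo`, `Target`, and the exact-equality check in `canUseSnapshot` are all hypothetical names and assumptions made for the example, and the real comparison of sequence numbers may well be different.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch; none of these types exist in Elasticsearch under these names.
final class SnapshotRecoverySketch {

    // Sequence number information of a commit (snapshot or source shard).
    static final class SeqNoInfo {
        final long maxSeqNo;
        final long localCheckpoint;

        SeqNoInfo(long maxSeqNo, long localCheckpoint) {
            this.maxSeqNo = maxSeqNo;
            this.localCheckpoint = localCheckpoint;
        }
    }

    // Hypothetical view of the recovery target.
    interface Target {
        List<CompletableFuture<Void>> startSnapshotDownloads(); // one future per file
        void deleteRecoveredFiles();
        void copyFilesFromSource();
    }

    // Assumption: the snapshot is treated as logically equivalent only when
    // both sequence numbers match the source shard exactly.
    static boolean canUseSnapshot(SeqNoInfo snapshot, SeqNoInfo source) {
        return snapshot.maxSeqNo == source.maxSeqNo
            && snapshot.localCheckpoint == source.localCheckpoint;
    }

    static String recover(SeqNoInfo snapshot, SeqNoInfo source, Target target) {
        if (!canUseSnapshot(snapshot, source)) {
            target.copyFilesFromSource();
            return "source";
        }
        List<CompletableFuture<Void>> downloads = target.startSnapshotDownloads();

        // Completes exceptionally as soon as any single download fails.
        CompletableFuture<Void> firstFailure = new CompletableFuture<>();
        for (CompletableFuture<Void> d : downloads) {
            d.whenComplete((ignored, e) -> {
                if (e != null) {
                    firstFailure.completeExceptionally(e);
                }
            });
        }
        CompletableFuture<Void> all =
            CompletableFuture.allOf(downloads.toArray(new CompletableFuture[0]));
        try {
            // Returns when every download succeeded, throws on the first failure.
            CompletableFuture.anyOf(all, firstFailure).join();
            return "snapshot";
        } catch (CompletionException | CancellationException e) {
            // 1. Cancel the ongoing downloads.
            for (CompletableFuture<Void> d : downloads) {
                d.cancel(true);
            }
            // 2. Wait until every in-flight operation has settled. In this sketch
            //    cancel() settles the future immediately; real I/O would take longer.
            for (CompletableFuture<Void> d : downloads) {
                try {
                    d.join();
                } catch (CompletionException | CancellationException ignored) {
                    // Already failed or cancelled; nothing more to do.
                }
            }
            // 3. Remove the partially recovered files, then
            // 4. start from scratch with the fallback plan (files from the source node).
            target.deleteRecoveredFiles();
            target.copyFilesFromSource();
            return "source-fallback";
        }
    }
}
```

The ordering in the failure path is the point of the sketch: the recovered files are deleted only after every download has settled, so no late write can recreate a file that the fallback plan is about to copy from the source node.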

@fcofdez fcofdez merged commit 115d681 into elastic:7.x Oct 14, 2021