Metaslabs recovery tool #13995

Open
CaCTuCaTu4ECKuu opened this issue Oct 4, 2022 · 7 comments
Labels
Type: Feature (feature request or new feature)

Comments

@CaCTuCaTu4ECKuu

I suggest adding tools that can recover (recreate) the metaslab data for a pool.
When importing a pool and hitting a metaslab-related problem, it would be nice to have an import option for it.

It's burdensome to fix the exact problem in every case, so recreating the data would be the simplest way.

There are a few recent issues about problems with metaslabs: #13963 and #13483.
The suggested solution is basically to recreate the pool and restore from backup. While read-only mode works well for such problems, it's definitely a hassle.

I'm not actually sure I understand well enough what metaslabs are and how they work, but from my understanding their data can be recovered as long as the pool disks can be read; otherwise this issue is useless.

A guy here managed to get his pool up in RW mode using some tunables. In that case it would be better to have the recovery utility as a separate tool rather than as part of zpool import, so it would be an option as well.

CaCTuCaTu4ECKuu added the "Type: Feature" label on Oct 4, 2022
@GregorKopka
Contributor

Metaslabs do the 'free space' accounting for the pool.

It should be no big problem (apart from memory constraints on really big pools) to extend the scrub code (which walks all allocated space in the pool) to collect the data needed to check/recreate the on-disk metaslabs.
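
To illustrate the idea of deriving free space from a walk over allocated space, here is a minimal sketch in Python. It is a toy model, not OpenZFS code: the function name `rebuild_free_ranges` and the representation of allocations as `(offset, length)` tuples are assumptions made up for this example.

```python
def rebuild_free_ranges(allocated, metaslab_start, metaslab_size):
    """Return the sorted free (offset, length) ranges inside one metaslab.

    `allocated` is an iterable of (offset, length) extents, as a
    hypothetical scrub-like walk over all referenced blocks might
    collect them. Extents may overlap or fall outside the metaslab.
    """
    end = metaslab_start + metaslab_size
    free = []
    cursor = metaslab_start  # everything before cursor is accounted for
    for off, length in sorted(allocated):
        # Clip the extent to the boundaries of this metaslab.
        seg_start = max(off, metaslab_start)
        seg_end = min(off + length, end)
        if seg_start >= seg_end:
            continue  # extent lies entirely outside this metaslab
        if seg_start > cursor:
            # Gap between the last allocation and this one is free.
            free.append((cursor, seg_start - cursor))
        cursor = max(cursor, seg_end)
    if cursor < end:
        free.append((cursor, end - cursor))
    return free


extents = [(0, 4), (10, 2), (3, 3), (20, 10)]
free = rebuild_free_ranges(extents, metaslab_start=0, metaslab_size=16)
print(free)
```

The sketch holds every extent in memory and sorts them, which mirrors the memory concern mentioned above for really big pools.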

@CaCTuCaTu4ECKuu
Author

As long as there's a way to import a pool with such a problem and run a scrub, this sounds like a perfect solution.
Scrub is the first thing you run anyway, so there would be no need to wait an undetermined amount of time and hope it fixes itself.

@shodanshok
Contributor

@GregorKopka correct me if I am wrong, but if a metaslab is corrupted, data at rest can be altered by writes to non-free space, right? I mean, if a new write arrives and is stored in used-but-incorrectly-accounted-as-free space, the original data is lost, yet a subsequent scrub will find no issue (i.e. no checksum error). Am I missing something?

@jumbi77
Contributor

jumbi77 commented Oct 24, 2022

Not directly related, but #4186 and #8099 (in the comments) wanted to extend the code for better recovery in case of problems. I only skimmed the PRs, so they may not be applicable anymore.

@GregorKopka
Contributor

> @GregorKopka correct me if I am wrong, but if a metaslab is corrupted, data at rest can be altered by writes to non-free space, right? I mean, if a new write arrives and is stored in used-but-incorrectly-accounted-as-free space, the original data is lost, yet a subsequent scrub will find no issue (i.e. no checksum error). Am I missing something?

Yes, should metaslabs get corrupted, allocated space could be falsely reported as free and data at rest could be overwritten.
But scrub should report an error in the old (now overwritten) data, as the checksum no longer matches.
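
A toy model of that scenario, as a rough sketch only: the `ToyPool` class, its block layout, and the use of SHA-256 are assumptions for illustration and do not reflect how OpenZFS is implemented. The only property carried over is that the checksum lives with the pointer to a block rather than with the block itself.

```python
import hashlib


class ToyPool:
    """Hypothetical block store with a free set and checksummed pointers."""

    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
        self.free = set(range(nblocks))  # stands in for the metaslab data
        self.pointers = {}  # name -> (block index, checksum of the data)

    def write(self, name, data):
        idx = min(self.free)  # allocate the lowest block marked free
        self.free.remove(idx)
        self.blocks[idx] = data
        self.pointers[name] = (idx, hashlib.sha256(data).hexdigest())

    def scrub(self):
        """Return the names whose data no longer matches the checksum."""
        errors = []
        for name, (idx, cksum) in sorted(self.pointers.items()):
            if hashlib.sha256(self.blocks[idx]).hexdigest() != cksum:
                errors.append(name)
        return errors


pool = ToyPool(4)
pool.write("old", b"important data")
clean = pool.scrub()

# Simulate the corruption: an allocated block is wrongly marked free.
old_idx = pool.pointers["old"][0]
pool.free.add(old_idx)

# The next write lands on top of the still-referenced block.
pool.write("new", b"fresh data")
damaged = pool.scrub()
print(clean, damaged)
```

In this model the new data verifies fine, while the pointer to the old data still carries the old checksum, so the scrub flags it.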

The question is: how does this corruption happen? In the case of on-disk corruption, I would expect a pool import to fail because of a checksum mismatch in the on-disk representation. In the case of logical corruption (for whatever reason), I would expect either a panic or (worse) silent undefined behavior. It would be good to have a way to recover from both scenarios that doesn't involve recreating the pool from backup, especially as a mechanism for this would also add checking for that kind of corruption in the pool metadata as a side effect.
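
The checking side effect could be pictured as a comparison between what the space accounting records as allocated and what a walk of the pool actually finds referenced. The sketch below is a simplified assumption that works on plain block numbers; the function name `check_space_accounting` and the result keys are hypothetical.

```python
def check_space_accounting(recorded_allocated, referenced):
    """Compare recorded allocations with blocks found by a pool walk.

    Both arguments are iterables of block numbers (a simplification).
    "leaked" blocks are recorded as allocated but referenced by nothing.
    "at_risk" blocks are referenced but recorded as free, so a later
    write could overwrite them.
    """
    recorded = set(recorded_allocated)
    found = set(referenced)
    return {
        "leaked": sorted(recorded - found),
        "at_risk": sorted(found - recorded),
    }


report = check_space_accounting(
    recorded_allocated=[0, 1, 2, 5],
    referenced=[1, 2, 3],
)
print(report)
```

Of the two categories, only the "at_risk" blocks threaten existing data; "leaked" blocks merely waste space.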

@satmandu
Contributor

Has any thought been given to creating this tool or adding such functionality to scrub? The problem is still being seen in 2023... #15030

@CaCTuCaTu4ECKuu
Author

@satmandu I recall someone mentioning adding something in the future, but I guess the priority for this would be very low.
