
Using ZFS in combination with suspend to disk may cause filesystem corruption #106093

Closed
MarcFontaine opened this issue Dec 6, 2020 · 10 comments · Fixed by #171680

Comments

@MarcFontaine
Contributor

MarcFontaine commented Dec 6, 2020

Describe the bug
Suspend to disk / hibernation in combination with ZFS may lead to filesystem corruption.

To Reproduce
The problem only shows up occasionally, about once in every 30 startups.
I had been using suspend to disk on my laptop without problems for several years.
After switching from BTRFS to ZFS, the import of the ZFS pool occasionally stalls when resuming from
hibernation. A message says that the pool is corrupted and is being reconstructed.
Once, the pool was corrupted irreparably.

Root cause
ZFS does not support filesystem freeze and thaw under Linux at the moment.
(see openzfs/zfs#260)

When going into hibernation, the pool is not exported/unmounted.
(The system is not properly shut down at all, and this is the same situation for all filesystems.)

The problem is that when resuming from hibernation, the initrd tries to import the pool
(it also tries to fix a corrupted pool, which effectively modifies the pool/partition).
This happens before it checks for a possible resume from hibernation
and before the system returns to the state in which it was suspended.

This means that after returning from suspend to disk
the partition with the zpool is no longer in the state
it was when the system was suspended.

Suspend to disk is a kind of hack that leaves (or may leave)
the filesystem on the hard disk in an inconsistent state.
Discarding the suspended system state or mounting the filesystem from another installation may cause problems.
On the other hand, before I changed to ZFS I did not experience any problems,
and this filesystem corruption caught me without warning.

Expected behavior
It may be possible to fix the problem by reordering the sequence of actions in the initrd.
Unless there is a fix for this, NixOS should disable the combination of ZFS and hibernation.
(More precisely, the initrd should contain either the setup scripts for ZFS or the scripts for
resuming from hibernation, but not both.)
At least there should be a warning.

@8573
Contributor

8573 commented Dec 6, 2020

BTRFS is native to Linux, so its suspension mechanism and Linux's were (presumably) always set up to work with each other. On the other hand, ZFS's suspension code has not yet been hooked up to Linux's properly: openzfs/zfs#260.

I agree that—

[…] NixOS should disable the combination of ZFS and hibernation. […] At least there should be a warning.

One potential problem is that there seem to be multiple packages that can trigger suspension. In my NixOS configuration, I tell logind and GDM not to suspend, but I don't know whether that covers everything that might cause suspension.

@MarcFontaine
Contributor Author

@8573 Thanks for adding the link.

I've added a corresponding line of warning here: https://nixos.wiki/wiki/NixOS_on_ZFS

@hmenke
Member

hmenke commented Dec 13, 2020

Do you suspend to disk with a zvol swap? Because I'm running with root on ZFS on my laptop and I have never encountered problems with suspend to disk, but I'm also using a separate regular swap partition.

@8573
Contributor

8573 commented Dec 13, 2020

I for one use a regular swap partition. I recall hearing that swap-on-ZVOL is dangerous independently of hibernation.

@stale

stale bot commented Jun 16, 2021

I marked this as stale due to inactivity.

stale bot added the 2.status: stale label Jun 16, 2021
@KenMacD
Contributor

KenMacD commented Jul 2, 2021

Would it make sense to automatically add boot.kernelParams = [ "nohibernate" ]; in cases where zfs is in use?
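A minimal sketch of that suggestion as a NixOS module. It assumes the ZFS module's read-only `boot.zfs.enabled` option (true when ZFS support is enabled for the initrd or the running system) is available in your release; treat it as an illustration, not the upstream fix.

```nix
{ config, lib, ... }:

{
  # Add "nohibernate" to the kernel command line only when ZFS support is enabled.
  # Assumption: boot.zfs.enabled is the read-only flag provided by the NixOS ZFS module.
  boot.kernelParams = lib.mkIf config.boot.zfs.enabled [ "nohibernate" ];
}
```

Because `nohibernate` disables hibernation in the kernel itself, this would also cover the concern above about multiple packages (logind, GDM, and so on) being able to trigger suspension to disk.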

stale bot removed the 2.status: stale label Jul 2, 2021
@bryanasdev000
Member

bryanasdev000 commented Jul 15, 2021

Would it make sense to automatically add boot.kernelParams = [ "nohibernate" ]; in cases where zfs is in use?

This. A note in the wiki or manual would be good too.

EDIT: Done (https://nixos.wiki/wiki/NixOS_on_ZFS).

@stale

stale bot commented Apr 29, 2022

I marked this as stale due to inactivity.

stale bot added the 2.status: stale label Apr 29, 2022
@bryanasdev000
Member

Let's see if we can close this with #171680
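For reference, newer NixOS releases expose a `boot.zfs.allowHibernation` option that defaults to `false`, in which case `nohibernate` is added to the kernel parameters. Whether this is exactly what #171680 introduced is an assumption here; check the option documentation for your release. Opting back in, at your own risk, would look roughly like this:

```nix
{ ... }:

{
  # Assumption: this option exists in your NixOS release (default: false).
  # Re-enables hibernation on a ZFS system. Avoid swap on a ZVOL, and keep in mind
  # that the initrd may still import the pool before resuming, as described in this issue.
  boot.zfs.allowHibernation = true;
}
```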

stale bot removed the 2.status: stale label May 5, 2022
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/filesystem-recommendations/28486/14
