resize2fs corrupts extents on ext4 filesystem (offline shrinking) #146

viavi-ab · 2023-05-25T18:12:40Z

I have come across a situation where resize2fs corrupts an ext4 filesytem and I can reproduce it reliably.

I have captured an image file with "e2image -r" prior to the resize and have put together a script that reproduces the bug on 1.47.0 (I can't seem to upload the large file to github, so I have cloned this repo and committed my test files at https://github.com/viavi-ab/e2fsprogs/tree/master/ab_test).

The ext4 image that I'm resizing is from a working server that, prior to the resize, had no filesystem problems.

This is an offline resize (shrink) of a large (2TB) filesystem containing a single (very large) sparse file scattered across ~8000 extents (the original image was sanitized, all other files removed, and then zerofree'ed for better compression and to hide data that I'm not allowed to share).

My script goes like this:

e2fsck -f -p work.img
resize2fs work.img 1500000000k
e2fsck -f -p work.img

The filesystem passes the first e2fsck, resize2fs reports no problems, yet the second e2fsck complains (amongst other things) about:

Inode 12, end of extent exceeds allowed value
        (logical block 2507128, physical block 640097, len 1)

Inode 12 has an invalid extent
        (logical block 2507128, invalid physical block 389065080, len 1)

Inode 12, end of extent exceeds allowed value
        (logical block 2515444, physical block 648221, len 1)

Inode 12 has an invalid extent
        (logical block 2515444, invalid physical block 389073396, len 1)

Bug does not depend on the presence of other files or the amount of free space in my image, but seems to be related to the particular dispersion of extents of that particular file.

I think what is happening is that as resize2fs is relocating data blocks, it somehow trips on some of the extents resulting in the situation shown above (note how the two logical blocks are duplicated; also, the large physical block numbers were once within the limits of the filesystem, but after resize they have become invalid).

I may be wrong, but looking at the extents allocated to the file, it seems like it has two particular interior extents (level 1) that, after resize, end up with no leaf extents (level 2).

Bug was initially found on e2fsprogs 1.45.6 (stock ubuntu 22 package) but can be reproduced on fresh binaries compiled from the latest e2fsprogs source (1.47.0).

Rearranging the extents of the file inside the image seems to work around the problem; for example, any of these will allow resize2fs to do its work without corrupting the filesystem:

Mount the image, duplicate the file, overwrite the original ("cp file file.new && mv file.new file"). I believe this causes the extent tree of file to be replaced by a fresh one, probably tidier.
Run e2fsck with "-E optimize_extents" prior to resize2fs.

If the bug is in the way resize2fs handles the extents, however, neither of these completely rules out the possibility of corrupting the filesystem anyway.

The text was updated successfully, but these errors were encountered:

viavi-ab · 2023-05-25T18:20:17Z

The ext4 image is available at https://github.com/viavi-ab/e2fsprogs/blob/master/ab_test/sanitized_e2.img.zst (~90MB).

The script that reproduces the bug is sanitized_reproducer (requires unzstd to extract the raw image; uncompressed image is a 1.8TB sparse file).

Attached are the output of debugfs stat and extents commands before and after the resize:

ab_test.zip

antmat · 2023-05-30T14:49:02Z

Hi, I believe that this is connected to #145, I'm not a maintainer and don't have working solution, but maybe you could take a look at it.

viavi-ab · 2023-06-16T16:25:25Z

Hi @antmat, I think I've found the bug and am putting together a patch.
What’s the preferred way of submitting a patch for review?

antmat · 2023-06-21T13:47:11Z

Hi @antmat, I think I've found the bug and am putting together a patch. What’s the preferred way of submitting a patch for review?

Hi! Unfortunately I'm not a maintainer of this project - I'm just experiencing related issue. I believe @tytso could help. Maybe the right way is to apply patch on kernel mailing list - I don't know ¯_(ツ)_/¯

This patch modifies the logic of ext2fs_extent_set_bmap() so that: - Handling of "only block in extent" special case is removed - This affects files during a shrink with resize2fs; for example, when a file has an extent of 100 blocks and all 100 are relocated, the current (unpatched) code does this: - (ref.: inode_scan_and_fix() -> ext2fs_block_iterate3() -> process_block() which remaps each block and returns BLOCK_CHANGED to ext2fs_block_iterate3(), which then calls ext2fs_extent_set_bmap() to update the extent tree). - A new extent is inserted for the same lblk (because the pblk has moved), and the original extent is pushed back/shrunk by 1 block (start pblk is incremented by 1, e_len reduced by 1). - All other blocks in the extent (except the last one) are appended to the newly-inserted extent, and the original extent is pushed back/shrunk accordingly (one block at a time). - When it's time to process the last remaining block, the code notices an extent of e_len = 1 (the last one remaining with the old pblk) and "optimizes" this by rewriting it in place (instead of appending it to the new extent). - The end result is that *every* file that has an extent with e_len > 1 is *always* split into two extents (the second one consisting of a single block). - I think the intention here was to make the code more efficient by making the update in place, but the way resize2fs calls this function practically causes the number of extents to double up during a shrink; in my testcase a large file with ~8300 extents before the resize ends up with ~8200 after resize (compared with ~15k extents with the unpatched code). - More preference is given to growing the previous extent's end, by swapping two if blocks - This helps compacting the extents during resize2fs (swapping the blocks is necessary because of the removal above). - When an existing extent is shrunk and its e_len becomes zero, it is deleted. - I am not clear if an extent with length == 0 is valid; judging from https://www.mail-archive.com/ptxdist@pengutronix.de/msg20364.html, probably not. - Without this change, "extent.e_len += (extent.e_lblk - start)" in ext2fs_extent_fix_parents() may overflow and throw the end of the extent way outside the size of the filesystem. Check with the e2 image referenced in this bug. In summary, this patch fixes tytso#146 and optimizes the allocation of extents after a shrinking resize2fs. Signed-off-by: Andrea Biardi <andrea.biardi@viavisolutions.com>

viavi-ab · 2023-06-23T11:50:24Z

Apologies @antmat , I thought you were involved in the project. I emailed @tytso directly.
In the meantime I've pushed my patch to https://github.com/viavi-ab/e2fsprogs/tree/master
It solves my problem with #146, perhaps you can check if it solves your #145 too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

resize2fs corrupts extents on ext4 filesystem (offline shrinking) #146

resize2fs corrupts extents on ext4 filesystem (offline shrinking) #146

viavi-ab commented May 25, 2023 •

edited

Loading

viavi-ab commented May 25, 2023

antmat commented May 30, 2023

viavi-ab commented Jun 16, 2023

antmat commented Jun 21, 2023

viavi-ab commented Jun 23, 2023

resize2fs corrupts extents on ext4 filesystem (offline shrinking) #146

resize2fs corrupts extents on ext4 filesystem (offline shrinking) #146

Comments

viavi-ab commented May 25, 2023 • edited Loading

viavi-ab commented May 25, 2023

antmat commented May 30, 2023

viavi-ab commented Jun 16, 2023

antmat commented Jun 21, 2023

viavi-ab commented Jun 23, 2023

viavi-ab commented May 25, 2023 •

edited

Loading