PANIC: zfs: adding existent segment to range tree (offset=11f694000 size=7000) and pool is corrupted after reboot #15619

Closed
mtippmann opened this issue Dec 1, 2023 · 4 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@mtippmann

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Arch Linux |
| Distribution Version | rolling |
| Kernel Version | 6.6.3-arch1-1 |
| Architecture | amd64 |
| OpenZFS Version | zfs-2.2.99-241_g3e4bef52b0 / zfs-kmod-2.2.99-241_g3e4bef52b0 - git as of 01.12.23 |

Describe the problem you're observing

I get this oops when compiling OpenWrt on a pool running current git with the following module parameters set:

zfs_bclone_enabled=1
zfs_dmu_offset_next_sync=1 
Dec 01 12:50:47 futro2 kernel: PANIC: zfs: adding existent segment to range tree (offset=11f694000 size=7000)
Dec 01 12:50:47 futro2 kernel: Showing stack for process 288
Dec 01 12:50:47 futro2 kernel: CPU: 3 PID: 288 Comm: txg_sync Tainted: P     U     OE      6.6.3-arch1-1 #1 6156c717f7d423f5954ce718462aaaaa43b9110d
Dec 01 12:50:47 futro2 kernel: Hardware name: FUJITSU FUTRO S740/D3544-A1, BIOS V5.0.0.13 R1.13.0 for D3544-A1x                    09/23/2022
Dec 01 12:50:47 futro2 kernel: Call Trace:
Dec 01 12:50:47 futro2 kernel:  <TASK>
Dec 01 12:50:47 futro2 kernel:  dump_stack_lvl+0x47/0x60
Dec 01 12:50:47 futro2 kernel:  vcmn_err+0xdf/0x120 [spl 8e72ae35b64a0f5a2b6fea420c9c9e09f33fc00d]
Dec 01 12:50:47 futro2 kernel:  zfs_panic_recover+0x79/0xa0 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  range_tree_add_impl+0x28f/0xea0 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  ? __pfx_range_tree_add+0x10/0x10 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  range_tree_vacate+0x85/0x230 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  metaslab_sync_done+0x149/0x540 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  vdev_sync_done+0x3a/0x90 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  spa_sync+0x893/0x1070 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  txg_sync_thread+0x1fe/0x3a0 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs 90d504f36e61841082f23aea7ae276b260ab21d6]
Dec 01 12:50:47 futro2 kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl 8e72ae35b64a0f5a2b6fea420c9c9e09f33fc00d]
Dec 01 12:50:47 futro2 kernel:  thread_generic_wrapper+0x5b/0x70 [spl 8e72ae35b64a0f5a2b6fea420c9c9e09f33fc00d]
Dec 01 12:50:47 futro2 kernel:  kthread+0xe5/0x120
Dec 01 12:50:47 futro2 kernel:  ? __pfx_kthread+0x10/0x10
Dec 01 12:50:47 futro2 kernel:  ret_from_fork+0x31/0x50
Dec 01 12:50:47 futro2 kernel:  ? __pfx_kthread+0x10/0x10
Dec 01 12:50:47 futro2 kernel:  ret_from_fork_asm+0x1b/0x30
Dec 01 12:50:47 futro2 kernel:  </TASK>
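
For readers decoding the message: the offset and size are printed in hex, so the reported segment runs from 0x11f694000 up to (but not including) 0x11f69b000. As far as I understand it, the panic means the segment being added is already covered by a segment in the tree. The sketch below is only a toy model of that arithmetic, not OpenZFS code; `seg_t`, `seg_from`, `seg_covers` and `seg_overlaps` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy type: a half-open segment [start, end). */
typedef struct {
    uint64_t start;
    uint64_t end;
} seg_t;

/* Build a segment from an offset/size pair as printed in the panic message. */
static inline seg_t seg_from(uint64_t offset, uint64_t size)
{
    seg_t s = { offset, offset + size };
    return s;
}

/* True if `have` already covers all of `add` (the "existent segment" case). */
static inline bool seg_covers(seg_t have, seg_t add)
{
    return have.start <= add.start && have.end >= add.end;
}

/* True if the two segments share at least one byte. */
static inline bool seg_overlaps(seg_t a, seg_t b)
{
    return a.start < b.end && b.start < a.end;
}
```

With these helpers, `seg_from(0x11f694000, 0x7000)` ends at 0x11f69b000, and any already-recorded segment that spans that whole range would make `seg_covers` return true.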

I/O hangs, and after a reboot the pool can no longer be imported:

[Image: IMG_20231201_130329]

Describe how to reproduce the problem

This is unfortunately somewhat tricky. It happens during the kernel build when the vDSO library is generated, which is done via a C program. I've already detailed all the steps in #15513 (comment), but this appears to be a slightly different bug. #15485 also looks similar.

I can reproduce it reliably by building OpenWrt:

$ git clone https://github.com/openwrt/openwrt
$ cd openwrt 
$ ./scripts/feeds update -a && ./scripts/feeds install -a 
$ make defconfig
$ make -j$(nproc) 
...
machine hangs 
...

Unfortunately, I still haven't figured out how to isolate the vDSO generation, but building OpenWrt until the bug is triggered doesn't take that long. Requirements for the build are documented here: https://openwrt.org/docs/guide-developer/toolchain/install-buildsystem#linux_gnu-linux_distributions

@mtippmann mtippmann added the Type: Defect Incorrect behavior (e.g. crash, hang) label Dec 1, 2023
@AllKind
Contributor

AllKind commented Dec 1, 2023

I guess it would be interesting to know if it also happens with block cloning disabled.

@mtippmann
Author

mtippmann commented Dec 1, 2023

> I guess it would be interesting to know if it also happens with block cloning disabled.

I broke the pool (I had assumed that wouldn't happen), so I have to reinstall, and this time I won't boot from ZFS. I still need to verify this, but from testing with git before the recent changes (see the comments I've linked in the ticket), it doesn't happen with block cloning deactivated at the pool level, and it also doesn't happen with zfs_dmu_offset_next_sync=0 and block cloning activated. I need to investigate a bit more once I get the machine running again. The code in question does some mmap() things:

 addr = mmap(NULL, stat.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Maybe it's possible to come up with a small reproducer, but I'm afraid I'm unable to do so at the moment :/
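
For reference, the general pattern around an mmap() call like the one above looks roughly like the sketch below. To be clear, this is not the vDSO generation code from the kernel tree and it is not a reproducer for this bug; it only shows a shared, writable mapping of an existing file being modified and flushed. `patch_first_byte` is a hypothetical helper name invented for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map an existing file read/write with MAP_SHARED, overwrite its first
 * byte through the mapping, and flush the change with msync().
 * Returns 0 on success, -1 on any failure (including an empty file).
 */
static inline int patch_first_byte(const char *path, unsigned char value)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    /* Same flags as the call quoted above: shared and writable. */
    unsigned char *addr = mmap(NULL, (size_t)st.st_size,
        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    addr[0] = value;                          /* write via the mapping */
    int rc = msync(addr, (size_t)st.st_size, MS_SYNC);
    munmap(addr, (size_t)st.st_size);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Calling `patch_first_byte("some-file", 'X')` on a file that lives on the affected pool would exercise a write through a shared mapping, but whether that alone is enough to trigger the panic is untested.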

@KungFuJesus

KungFuJesus commented Dec 11, 2023

@mtippmann I assume by "code" here you mean that what's being executed is doing the mmapping, not the compilation unit (which would have nothing to do with this). If that's the case, it may be the existing issue filed regarding mmap:

#15656

Just speculation; so far I think the only behavior found was an assertion being tripped (and in a different place).

@mtippmann
Author

This is fixed with the latest master and works with block cloning enabled! Thanks to everyone who worked on fixing this!

zfs-2.2.99-310_ga0b2a93c41
zfs-kmod-2.2.99-310_ga0b2a93c41
