Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PANIC: zfs: adding existent segment to range tree #15030

Open
reefland opened this issue Jul 3, 2023 · 13 comments
Open

PANIC: zfs: adding existent segment to range tree #15030

reefland opened this issue Jul 3, 2023 · 13 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@reefland
Copy link

reefland commented Jul 3, 2023

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 22.04.1
Kernel Version 5.15.0-75-generic
Architecture x86_64
OpenZFS Version 2.1.5

Describe the problem you're observing

Randomly (not at boot or pool import) system shows panic messages (see below) on the console. The system seemed to be operational. Upon trying to reboot I was unable to get a clean shutdown. Had several Failed unmounting messages with references to /root and /var/log. Along with a failed to start Journal Service. Eventually it hung at a systemd-shutdown[1]: Syncing filesystems and block devices - timed out, issuing SIGKILL to PID ... Had to kill the power after waiting ~20 minutes.

Upon reboot, everything seems fine. No issues with pool import, zpool status was good. After an hour or so I would see the first panic message. Usually within 24 to 48 hours, had to reboot again.. it would not be a clean shutdown, required manual power off.

The PANIC messages I have is very close to #13483 however, that discussion appears to be on pool import issues. I'm not having any issues with pool import so I'm documenting a new issue as the panic seems to have a different cause.

Include any warning/errors/backtraces from the system logs

Jul 02 16:09:43 k3s06 kernel: PANIC: zfs: adding existent segment to range tree (offset=279691ea000 size=1000)
Jul 02 16:09:43 k3s06 kernel: Showing stack for process 1111
Jul 02 16:09:43 k3s06 kernel: CPU: 10 PID: 1111 Comm: txg_sync Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 16:09:43 k3s06 kernel: Hardware name: ASRock B660M Steel Legend/B660M Steel Legend, BIOS 10.02 02/10/2023
Jul 02 16:09:43 k3s06 kernel: Call Trace:
Jul 02 16:09:43 k3s06 kernel:  <TASK>
Jul 02 16:09:43 k3s06 kernel:  show_stack+0x52/0x5c
Jul 02 16:09:43 k3s06 kernel:  dump_stack_lvl+0x4a/0x63
Jul 02 16:09:43 k3s06 kernel:  dump_stack+0x10/0x16
Jul 02 16:09:43 k3s06 kernel:  spl_dumpstack+0x29/0x2f [spl]
Jul 02 16:09:43 k3s06 kernel:  vcmn_err.cold+0x60/0x78 [spl]
Jul 02 16:09:43 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 16:09:43 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 16:09:43 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:09:43 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 16:09:43 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 16:09:43 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 16:09:43 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 16:09:43 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 16:09:43 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 16:09:43 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 16:09:43 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 16:09:43 k3s06 kernel:  kthread+0x127/0x150
Jul 02 16:09:43 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 16:09:43 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 16:09:43 k3s06 kernel:  </TASK>


Jul 02 16:13:19 k3s06 kernel: INFO: task txg_sync:1111 blocked for more than 120 seconds.
Jul 02 16:13:19 k3s06 kernel:       Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 16:13:19 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 02 16:13:19 k3s06 kernel: task:txg_sync        state:D stack:    0 pid: 1111 ppid:     2 flags:0x00004004
Jul 02 16:13:19 k3s06 kernel: Call Trace:
Jul 02 16:13:19 k3s06 kernel:  <TASK>
Jul 02 16:13:19 k3s06 kernel:  __schedule+0x24e/0x590
Jul 02 16:13:19 k3s06 kernel:  schedule+0x69/0x110
Jul 02 16:13:19 k3s06 kernel:  vcmn_err.cold+0x76/0x78 [spl]
Jul 02 16:13:19 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 16:13:19 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 16:13:19 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:13:19 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 16:13:19 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 16:13:19 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 16:13:19 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 16:13:19 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 16:13:19 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 16:13:19 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 16:13:19 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 16:13:19 k3s06 kernel:  kthread+0x127/0x150
Jul 02 16:13:19 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 16:13:19 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 16:13:19 k3s06 kernel:  </TASK>

NOTE: System is using ZFS Boot Menu, so this is ZFS on Root, but does not require the commonly used separate BOOT and ROOT pools. This method allows for a single ZFS pool (with encryption).

After reboot, status is good:

$ zpool status
  pool: k3s06
 state: ONLINE
  scan: scrub repaired 0B in 08:41:02 with 0 errors on Sun Jun 11 09:05:03 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        k3s06                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA109YVK-part3  ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA1040NK-part3  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ata-WDC_WD80EFAX-68KNBN0_VGJJ0Y5G-part3     ONLINE       0     0     0
            ata-WDC_WD120EFBX-68B0EN0_5QJ6U62B-part3    ONLINE       0     0     0
        cache
          nvme0n1                                       ONLINE       0     0     0

errors: No known data errors

I tried to issue a sudo zpool scrub k3s06 and it hanged (did not return to prompt). I open another SSH session, zpool status does not indicate a scrub is in progress. Checked the console and it had panic message scrolling on the screen.

Jul 02 19:45:26 k3s06 kernel: PANIC: zfs: adding existent segment to range tree (offset=279691ea000 size=1000)
Jul 02 19:45:26 k3s06 kernel: Showing stack for process 1341
Jul 02 19:45:26 k3s06 kernel: CPU: 2 PID: 1341 Comm: txg_sync Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 19:45:26 k3s06 kernel: Hardware name: ASRock B660M Steel Legend/B660M Steel Legend, BIOS 10.02 02/10/2023
Jul 02 19:45:26 k3s06 kernel: Call Trace:
Jul 02 19:45:26 k3s06 kernel:  <TASK>
Jul 02 19:45:26 k3s06 kernel:  show_stack+0x52/0x5c
Jul 02 19:45:26 k3s06 kernel:  dump_stack_lvl+0x4a/0x63
Jul 02 19:45:26 k3s06 kernel:  dump_stack+0x10/0x16
Jul 02 19:45:26 k3s06 kernel:  spl_dumpstack+0x29/0x2f [spl]
Jul 02 19:45:26 k3s06 kernel:  vcmn_err.cold+0x60/0x78 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 19:45:26 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? kmem_cache_free+0x24f/0x290
Jul 02 19:45:26 k3s06 kernel:  ? percpu_counter_dec+0x10/0x20 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? spl_kmem_cache_free+0xda/0x140 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:45:26 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 19:45:26 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 19:45:26 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 19:45:26 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 19:45:26 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 19:45:26 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 19:45:26 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 19:45:26 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 19:45:26 k3s06 kernel:  kthread+0x127/0x150
Jul 02 19:45:26 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 19:45:26 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 19:45:26 k3s06 kernel:  </TASK>
Jul 02 19:49:02 k3s06 kernel: INFO: task txg_sync:1341 blocked for more than 120 seconds.
Jul 02 19:49:02 k3s06 kernel:       Tainted: P           O      5.15.0-75-generic #82-Ubuntu
Jul 02 19:49:02 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 02 19:49:02 k3s06 kernel: task:txg_sync        state:D stack:    0 pid: 1341 ppid:     2 flags:0x00004004
Jul 02 19:49:02 k3s06 kernel: Call Trace:
Jul 02 19:49:02 k3s06 kernel:  <TASK>
Jul 02 19:49:02 k3s06 kernel:  __schedule+0x24e/0x590
Jul 02 19:49:02 k3s06 kernel:  schedule+0x69/0x110
Jul 02 19:49:02 k3s06 kernel:  vcmn_err.cold+0x76/0x78 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 02 19:49:02 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? kmem_cache_free+0x24f/0x290
Jul 02 19:49:02 k3s06 kernel:  ? percpu_counter_dec+0x10/0x20 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? spl_kmem_cache_free+0xda/0x140 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:49:02 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 02 19:49:02 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 02 19:49:02 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 02 19:49:02 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 02 19:49:02 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 02 19:49:02 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 02 19:49:02 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 02 19:49:02 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 02 19:49:02 k3s06 kernel:  kthread+0x127/0x150
Jul 02 19:49:02 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 02 19:49:02 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 02 19:49:02 k3s06 kernel:  </TASK>

I did another unclean shutdown again and rebooted. Tried some of the suggestions within #13483

echo 1 > /sys/module/zfs/parameters/zil_replay_disable
echo 1 > /sys/module/zfs/parameters/zfs_recover

Then issues another scrub, which completed successfully:

 zpool status
  pool: k3s06
 state: ONLINE
  scan: scrub repaired 0B in 09:07:12 with 0 errors on Mon Jul  3 05:54:45 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        k3s06                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA109YVK-part3  ONLINE       0     0     0
            ata-WDC_WD80EFZZ-68BTXN0_WD-CA1040NK-part3  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ata-WDC_WD80EFAX-68KNBN0_VGJJ0Y5G-part3     ONLINE       0     0     0
            ata-WDC_WD120EFBX-68B0EN0_5QJ6U62B-part3    ONLINE       0     0     0
        cache
          nvme0n1                                       ONLINE       0     0     0

errors: No known data errors

As expected from the tunables, panic messages now became warning messages:

Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=279691ea000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27691830000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=276917fa000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=279691bc000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28d90833000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=276917fb000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28d907b0000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=197224c4000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=291de47e000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28a52005000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2c13ab29000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2c118da9000 size=2000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27ecdfbe000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27341d0b000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2915ddb2000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28d4ea04000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=196d5824000 size=1000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2529b350000 size=f000)
Jul 02 21:01:47 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=279350b0000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=27ac475e000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2824d6d6000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=2824d072000 size=1000)
Jul 03 01:00:59 k3s06 kernel: WARNING: zfs: adding existent segment to range tree (offset=28afcbcf000 size=1000)

This seem to run clean as well:

sudo zdb -AAA -b k3s06

Traversing all blocks to verify nothing leaked ...

loading concrete vdev 1, metaslab 464 of 465 ...
8.91T completed (10118MB/s) estimated time remaining: 0hr 00min 00sec        
        No leaks (block sum matches space maps exactly)

        bp count:              68746233
        ganged count:                 0
        bp logical:       9955783857664      avg: 144819
        bp physical:      9789308441088      avg: 142397     compression:   1.02
        bp allocated:     9798470672384      avg: 142531     compression:   1.02
        bp deduped:                   0    ref>1:      0   deduplication:   1.00
        Normal class:     9798686769152     used: 61.46%
        Embedded log class         192512     used:  0.00%

        additional, non-pointer bps of type 0:      10841
        Dittoed blocks on same vdev: 77922

This time I was able to get a clean shutdown. Rebooted without the tunables.

System is now running 3+ hours under normal load, no panic messages have been seen since yet. Will keep monitoring.

@reefland reefland added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jul 3, 2023
@reefland
Copy link
Author

reefland commented Jul 4, 2023

Got about 12 hours from it, before it got wonky and panic again.

First few entries seem to be related to ext4, the system does have one zvol with ext4.

Jul 04 00:44:50 k3s06 kernel: INFO: task python3:584178 blocked for more than 120 seconds.
Jul 04 00:44:50 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:44:50 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:44:50 k3s06 kernel: task:python3         state:D stack:    0 pid:584178 ppid:579848 flags:0x00004000
Jul 04 00:44:50 k3s06 kernel: Call Trace:
Jul 04 00:44:50 k3s06 kernel:  <TASK>
Jul 04 00:44:50 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:44:50 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:44:50 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:44:50 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:44:50 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:44:50 k3s06 kernel:  ? dentry_lru_isolate+0xcf/0x140
Jul 04 00:44:50 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:44:50 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:44:50 k3s06 kernel:  __x64_sys_fdatasync+0x48/0x90
Jul 04 00:44:50 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 04 00:44:50 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:44:50 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:44:50 k3s06 kernel:  ? __x64_sys_lseek+0x18/0x20
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:44:50 k3s06 kernel: RIP: 0033:0x7f03aa537fac
Jul 04 00:44:50 k3s06 kernel: RSP: 002b:00007f039defe4a8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 04 00:44:50 k3s06 kernel: RAX: ffffffffffffffda RBX: 000000000000004b RCX: 00007f03aa537fac
Jul 04 00:44:50 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
Jul 04 00:44:50 k3s06 kernel: RBP: 00007f039deffb38 R08: 0000000000000000 R09: 0000000000000000
Jul 04 00:44:50 k3s06 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f02ec89a008
Jul 04 00:44:50 k3s06 kernel: R13: 0000000000000000 R14: 0000000000000408 R15: 0000000000000000
Jul 04 00:44:50 k3s06 kernel:  </TASK>
Jul 04 00:44:50 k3s06 kernel: INFO: task node:615723 blocked for more than 120 seconds.
Jul 04 00:44:50 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:44:50 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:44:50 k3s06 kernel: task:node            state:D stack:    0 pid:615723 ppid:615555 flags:0x00000004
Jul 04 00:44:50 k3s06 kernel: Call Trace:
Jul 04 00:44:50 k3s06 kernel:  <TASK>
Jul 04 00:44:50 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:44:50 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:44:50 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:44:50 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:44:50 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:44:50 k3s06 kernel:  ? __filemap_fdatawait_range+0x3a/0x120
Jul 04 00:44:50 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:44:50 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:44:50 k3s06 kernel:  vfs_fsync_range+0x46/0x90
Jul 04 00:44:50 k3s06 kernel:  ? __fget_light+0x39/0x90
Jul 04 00:44:50 k3s06 kernel:  __x64_sys_fsync+0x38/0x70
Jul 04 00:44:50 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:44:50 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 04 00:44:50 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:44:50 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? __x64_sys_lseek+0x18/0x20
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:44:50 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:44:50 k3s06 kernel: RIP: 0033:0x7f8a92ba2a97
Jul 04 00:44:50 k3s06 kernel: RSP: 002b:00007f8a82b19890 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 04 00:44:50 k3s06 kernel: RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007f8a92ba2a97
Jul 04 00:44:50 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000013
Jul 04 00:44:50 k3s06 kernel: RBP: 00007f8a82b198e0 R08: 0000000000000000 R09: 0000000000000001
Jul 04 00:44:50 k3s06 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007f8a82b19960
Jul 04 00:44:50 k3s06 kernel: R13: 0000000000000002 R14: 00007f8a6413cb50 R15: 00007f8a64016238
Jul 04 00:44:50 k3s06 kernel:  </TASK>
Jul 04 00:44:50 k3s06 kernel: INFO: task WTCheck.tThread:616605 blocked for more than 120 seconds.
Jul 04 00:44:50 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:44:50 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:44:50 k3s06 kernel: task:WTCheck.tThread state:D stack:    0 pid:616605 ppid:616510 flags:0x00000000
Jul 04 00:44:50 k3s06 kernel: Call Trace:
Jul 04 00:44:50 k3s06 kernel:  <TASK>
Jul 04 00:44:50 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:44:50 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:44:50 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:44:50 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:44:50 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:44:50 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:44:50 k3s06 kernel:  ? wbc_detach_inode+0x142/0x210
Jul 04 00:44:50 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:44:50 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:44:50 k3s06 kernel:  __x64_sys_fdatasync+0x48/0x90
Jul 04 00:44:50 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:44:50 k3s06 kernel:  ? sysvec_apic_timer_interrupt+0x4e/0x90
Jul 04 00:44:50 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:44:50 k3s06 kernel: RIP: 0033:0x7fcd4954ffc7
Jul 04 00:44:50 k3s06 kernel: RSP: 002b:00007fcd4357a7b0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Jul 04 00:44:50 k3s06 kernel: RAX: ffffffffffffffda RBX: 0000000000000093 RCX: 00007fcd4954ffc7
Jul 04 00:44:50 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000093 RDI: 0000000000000093
Jul 04 00:44:50 k3s06 kernel: RBP: 00007fcd4357a810 R08: 0000000000000020 R09: 0000000000000000
Jul 04 00:44:50 k3s06 kernel: R10: 0000000000000017 R11: 0000000000000293 R12: 000000000000000a
Jul 04 00:44:50 k3s06 kernel: R13: 0000000000000001 R14: 00005630ccb039c0 R15: 00005630c5c46269
Jul 04 00:44:50 k3s06 kernel:  </TASK>
Jul 04 00:46:51 k3s06 kernel: INFO: task python3:584178 blocked for more than 241 seconds.
Jul 04 00:46:51 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:46:51 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:46:51 k3s06 kernel: task:python3         state:D stack:    0 pid:584178 ppid:579848 flags:0x00004000
Jul 04 00:46:51 k3s06 kernel: Call Trace:
Jul 04 00:46:51 k3s06 kernel:  <TASK>
Jul 04 00:46:51 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:46:51 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:46:51 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:46:51 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:46:51 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:46:51 k3s06 kernel:  ? dentry_lru_isolate+0xcf/0x140
Jul 04 00:46:51 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:46:51 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:46:51 k3s06 kernel:  __x64_sys_fdatasync+0x48/0x90
Jul 04 00:46:51 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 04 00:46:51 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:46:51 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:46:51 k3s06 kernel:  ? __x64_sys_lseek+0x18/0x20
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:46:51 k3s06 kernel: RIP: 0033:0x7f03aa537fac
Jul 04 00:46:51 k3s06 kernel: RSP: 002b:00007f039defe4a8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 04 00:46:51 k3s06 kernel: RAX: ffffffffffffffda RBX: 000000000000004b RCX: 00007f03aa537fac
Jul 04 00:46:51 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
Jul 04 00:46:51 k3s06 kernel: RBP: 00007f039deffb38 R08: 0000000000000000 R09: 0000000000000000
Jul 04 00:46:51 k3s06 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f02ec89a008
Jul 04 00:46:51 k3s06 kernel: R13: 0000000000000000 R14: 0000000000000408 R15: 0000000000000000
Jul 04 00:46:51 k3s06 kernel:  </TASK>
Jul 04 00:46:51 k3s06 kernel: INFO: task node:615723 blocked for more than 241 seconds.
Jul 04 00:46:51 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:46:51 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:46:51 k3s06 kernel: task:node            state:D stack:    0 pid:615723 ppid:615555 flags:0x00000004
Jul 04 00:46:51 k3s06 kernel: Call Trace:
Jul 04 00:46:51 k3s06 kernel:  <TASK>
Jul 04 00:46:51 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:46:51 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:46:51 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:46:51 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:46:51 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:46:51 k3s06 kernel:  ? __filemap_fdatawait_range+0x3a/0x120
Jul 04 00:46:51 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:46:51 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:46:51 k3s06 kernel:  vfs_fsync_range+0x46/0x90
Jul 04 00:46:51 k3s06 kernel:  ? __fget_light+0x39/0x90
Jul 04 00:46:51 k3s06 kernel:  __x64_sys_fsync+0x38/0x70
Jul 04 00:46:51 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:46:51 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 04 00:46:51 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:46:51 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? __x64_sys_lseek+0x18/0x20
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:46:51 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:46:51 k3s06 kernel: RIP: 0033:0x7f8a92ba2a97
Jul 04 00:46:51 k3s06 kernel: RSP: 002b:00007f8a82b19890 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 04 00:46:51 k3s06 kernel: RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007f8a92ba2a97
Jul 04 00:46:51 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000013
Jul 04 00:46:51 k3s06 kernel: RBP: 00007f8a82b198e0 R08: 0000000000000000 R09: 0000000000000001
Jul 04 00:46:51 k3s06 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007f8a82b19960
Jul 04 00:46:51 k3s06 kernel: R13: 0000000000000002 R14: 00007f8a6413cb50 R15: 00007f8a64016238
Jul 04 00:46:51 k3s06 kernel:  </TASK>
Jul 04 00:46:51 k3s06 kernel: INFO: task WTCheck.tThread:616605 blocked for more than 241 seconds.
Jul 04 00:46:51 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:46:51 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:46:51 k3s06 kernel: task:WTCheck.tThread state:D stack:    0 pid:616605 ppid:616510 flags:0x00000000
Jul 04 00:46:51 k3s06 kernel: Call Trace:
Jul 04 00:46:51 k3s06 kernel:  <TASK>
Jul 04 00:46:51 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:46:51 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:46:51 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:46:51 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:46:51 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:46:51 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:46:51 k3s06 kernel:  ? wbc_detach_inode+0x142/0x210
Jul 04 00:46:51 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:46:51 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:46:51 k3s06 kernel:  __x64_sys_fdatasync+0x48/0x90
Jul 04 00:46:51 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:46:51 k3s06 kernel:  ? sysvec_apic_timer_interrupt+0x4e/0x90
Jul 04 00:46:51 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:46:51 k3s06 kernel: RIP: 0033:0x7fcd4954ffc7
Jul 04 00:46:51 k3s06 kernel: RSP: 002b:00007fcd4357a7b0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Jul 04 00:46:51 k3s06 kernel: RAX: ffffffffffffffda RBX: 0000000000000093 RCX: 00007fcd4954ffc7
Jul 04 00:46:51 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000093 RDI: 0000000000000093
Jul 04 00:46:51 k3s06 kernel: RBP: 00007fcd4357a810 R08: 0000000000000020 R09: 0000000000000000
Jul 04 00:46:51 k3s06 kernel: R10: 0000000000000017 R11: 0000000000000293 R12: 000000000000000a
Jul 04 00:46:51 k3s06 kernel: R13: 0000000000000001 R14: 00005630ccb039c0 R15: 00005630c5c46269
Jul 04 00:46:51 k3s06 kernel:  </TASK>
Jul 04 00:48:52 k3s06 kernel: INFO: task jbd2/rbd7-8:571525 blocked for more than 120 seconds.
Jul 04 00:48:52 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:48:52 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:48:52 k3s06 kernel: task:jbd2/rbd7-8     state:D stack:    0 pid:571525 ppid:     2 flags:0x00004000
Jul 04 00:48:52 k3s06 kernel: Call Trace:
Jul 04 00:48:52 k3s06 kernel:  <TASK>
Jul 04 00:48:52 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:48:52 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:48:52 k3s06 kernel:  jbd2_journal_commit_transaction+0x307/0x17a0
Jul 04 00:48:52 k3s06 kernel:  ? __update_idle_core+0x93/0x120
Jul 04 00:48:52 k3s06 kernel:  ? psi_task_switch+0xc6/0x220
Jul 04 00:48:52 k3s06 kernel:  ? wait_woken+0x70/0x70
Jul 04 00:48:52 k3s06 kernel:  kjournald2+0xa9/0x280
Jul 04 00:48:52 k3s06 kernel:  ? wait_woken+0x70/0x70
Jul 04 00:48:52 k3s06 kernel:  ? load_superblock.part.0+0xc0/0xc0
Jul 04 00:48:52 k3s06 kernel:  kthread+0x127/0x150
Jul 04 00:48:52 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 04 00:48:52 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 04 00:48:52 k3s06 kernel:  </TASK>
Jul 04 00:48:52 k3s06 kernel: INFO: task python3:584178 blocked for more than 362 seconds.
Jul 04 00:48:52 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:48:52 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:48:52 k3s06 kernel: task:python3         state:D stack:    0 pid:584178 ppid:579848 flags:0x00004000
Jul 04 00:48:52 k3s06 kernel: Call Trace:
Jul 04 00:48:52 k3s06 kernel:  <TASK>
Jul 04 00:48:52 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:48:52 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:48:52 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:48:52 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:48:52 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:48:52 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:48:52 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:48:52 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:48:52 k3s06 kernel:  ? dentry_lru_isolate+0xcf/0x140
Jul 04 00:48:52 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:48:52 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:48:52 k3s06 kernel:  __x64_sys_fdatasync+0x48/0x90
Jul 04 00:48:52 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? __x64_sys_lseek+0x18/0x20
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:48:52 k3s06 kernel: RIP: 0033:0x7f03aa537fac
Jul 04 00:48:52 k3s06 kernel: RSP: 002b:00007f039defe4a8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
Jul 04 00:48:52 k3s06 kernel: RAX: ffffffffffffffda RBX: 000000000000004b RCX: 00007f03aa537fac
Jul 04 00:48:52 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
Jul 04 00:48:52 k3s06 kernel: RBP: 00007f039deffb38 R08: 0000000000000000 R09: 0000000000000000
Jul 04 00:48:52 k3s06 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f02ec89a008
Jul 04 00:48:52 k3s06 kernel: R13: 0000000000000000 R14: 0000000000000408 R15: 0000000000000000
Jul 04 00:48:52 k3s06 kernel:  </TASK>
Jul 04 00:48:52 k3s06 kernel: INFO: task node:615723 blocked for more than 362 seconds.
Jul 04 00:48:52 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:48:52 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:48:52 k3s06 kernel: task:node            state:D stack:    0 pid:615723 ppid:615555 flags:0x00000004
Jul 04 00:48:52 k3s06 kernel: Call Trace:
Jul 04 00:48:52 k3s06 kernel:  <TASK>
Jul 04 00:48:52 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:48:52 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:48:52 k3s06 kernel:  io_schedule+0x46/0x80
Jul 04 00:48:52 k3s06 kernel:  wait_on_page_bit_common+0x10c/0x3d0
Jul 04 00:48:52 k3s06 kernel:  ? filemap_invalidate_unlock_two+0x50/0x50
Jul 04 00:48:52 k3s06 kernel:  wait_on_page_bit+0x3f/0x50
Jul 04 00:48:52 k3s06 kernel:  wait_on_page_writeback+0x26/0x80
Jul 04 00:48:52 k3s06 kernel:  __filemap_fdatawait_range+0x97/0x120
Jul 04 00:48:52 k3s06 kernel:  ? __filemap_fdatawait_range+0x3a/0x120
Jul 04 00:48:52 k3s06 kernel:  file_write_and_wait_range+0xd3/0xf0
Jul 04 00:48:52 k3s06 kernel:  ext4_sync_file+0x7b/0x330
Jul 04 00:48:52 k3s06 kernel:  vfs_fsync_range+0x46/0x90
Jul 04 00:48:52 k3s06 kernel:  ? __fget_light+0x39/0x90
Jul 04 00:48:52 k3s06 kernel:  __x64_sys_fsync+0x38/0x70
Jul 04 00:48:52 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? exit_to_user_mode_prepare+0x37/0xb0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? __x64_sys_lseek+0x18/0x20
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? __x64_sys_write+0x19/0x20
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:48:52 k3s06 kernel: RIP: 0033:0x7f8a92ba2a97
Jul 04 00:48:52 k3s06 kernel: RSP: 002b:00007f8a82b19890 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jul 04 00:48:52 k3s06 kernel: RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007f8a92ba2a97
Jul 04 00:48:52 k3s06 kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000013
Jul 04 00:48:52 k3s06 kernel: RBP: 00007f8a82b198e0 R08: 0000000000000000 R09: 0000000000000001
Jul 04 00:48:52 k3s06 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007f8a82b19960
Jul 04 00:48:52 k3s06 kernel: R13: 0000000000000002 R14: 00007f8a6413cb50 R15: 00007f8a64016238
Jul 04 00:48:52 k3s06 kernel:  </TASK>
Jul 04 00:48:52 k3s06 kernel: INFO: task WTJourn.Flusher:616604 blocked for more than 120 seconds.
Jul 04 00:48:52 k3s06 kernel:       Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 00:48:52 k3s06 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 04 00:48:52 k3s06 kernel: task:WTJourn.Flusher state:D stack:    0 pid:616604 ppid:616510 flags:0x00000000
Jul 04 00:48:52 k3s06 kernel: Call Trace:
Jul 04 00:48:52 k3s06 kernel:  <TASK>
Jul 04 00:48:52 k3s06 kernel:  __schedule+0x24e/0x590
Jul 04 00:48:52 k3s06 kernel:  ? update_sd_lb_stats.constprop.0+0xf2/0x390
Jul 04 00:48:52 k3s06 kernel:  schedule+0x69/0x110
Jul 04 00:48:52 k3s06 kernel:  wait_transaction_locked+0x8e/0xd0
Jul 04 00:48:52 k3s06 kernel:  ? wait_woken+0x70/0x70
Jul 04 00:48:52 k3s06 kernel:  add_transaction_credits+0xdb/0x2d0
Jul 04 00:48:52 k3s06 kernel:  start_this_handle+0xfd/0x510
Jul 04 00:48:52 k3s06 kernel:  ? kmem_cache_alloc+0x1ab/0x2f0
Jul 04 00:48:52 k3s06 kernel:  ? jbd2__journal_start+0x95/0x1f0
Jul 04 00:48:52 k3s06 kernel:  jbd2__journal_start+0x103/0x1f0
Jul 04 00:48:52 k3s06 kernel:  __ext4_journal_start_sb+0x10b/0x130
Jul 04 00:48:52 k3s06 kernel:  ext4_dirty_inode+0x3a/0x80
Jul 04 00:48:52 k3s06 kernel:  __mark_inode_dirty+0x5b/0x330
Jul 04 00:48:52 k3s06 kernel:  generic_update_time+0x6c/0xd0
Jul 04 00:48:52 k3s06 kernel:  file_update_time+0x127/0x140
Jul 04 00:48:52 k3s06 kernel:  file_modified+0x27/0x40
Jul 04 00:48:52 k3s06 kernel:  ext4_buffered_write_iter+0x5f/0x180
Jul 04 00:48:52 k3s06 kernel:  ext4_file_write_iter+0x43/0x60
Jul 04 00:48:52 k3s06 kernel:  new_sync_write+0x111/0x1a0
Jul 04 00:48:52 k3s06 kernel:  vfs_write+0x1d5/0x270
Jul 04 00:48:52 k3s06 kernel:  __x64_sys_pwrite64+0x96/0xc0
Jul 04 00:48:52 k3s06 kernel:  do_syscall_64+0x59/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? restore_fpregs_from_fpstate+0x4e/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? switch_fpu_return+0x4e/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? exit_to_user_mode_prepare+0x96/0xb0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  ? exit_to_user_mode_prepare+0x96/0xb0
Jul 04 00:48:52 k3s06 kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 00:48:52 k3s06 kernel:  ? do_syscall_64+0x69/0xc0
Jul 04 00:48:52 k3s06 kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Jul 04 00:48:52 k3s06 kernel: RIP: 0033:0x7fcd4983c13f
Jul 04 00:48:52 k3s06 kernel: RSP: 002b:00007fcd43d7b570 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Jul 04 00:48:52 k3s06 kernel: RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fcd4983c13f
Jul 04 00:48:52 k3s06 kernel: RDX: 0000000000000500 RSI: 00005630cbdec000 RDI: 0000000000000010
Jul 04 00:48:52 k3s06 kernel: RBP: 00005630cbdec000 R08: 0000000000000000 R09: 0000000000000500
Jul 04 00:48:52 k3s06 kernel: R10: 0000000000b6d300 R11: 0000000000000293 R12: 0000000000000500
Jul 04 00:48:52 k3s06 kernel: R13: 0000000000b6d300 R14: 00005630c8968b40 R15: 0000000000b6d300
Jul 04 00:48:52 k3s06 kernel:  </TASK>

The ZFS panic does not show up until a bit later:

Jul 04 01:20:35 k3s06 kernel: PANIC: zfs: adding existent segment to range tree (offset=2941f81f000 size=1000)
Jul 04 01:20:35 k3s06 kernel: Showing stack for process 1111
Jul 04 01:20:35 k3s06 kernel: CPU: 18 PID: 1111 Comm: txg_sync Tainted: P           O      5.15.0-76-generic #83-Ubuntu
Jul 04 01:20:35 k3s06 kernel: Hardware name: ASRock B660M Steel Legend/B660M Steel Legend, BIOS 10.02 02/10/2023
Jul 04 01:20:35 k3s06 kernel: Call Trace:
Jul 04 01:20:35 k3s06 kernel:  <TASK>
Jul 04 01:20:35 k3s06 kernel:  show_stack+0x52/0x5c
Jul 04 01:20:35 k3s06 kernel:  dump_stack_lvl+0x4a/0x63
Jul 04 01:20:35 k3s06 kernel:  dump_stack+0x10/0x16
Jul 04 01:20:35 k3s06 kernel:  spl_dumpstack+0x29/0x2f [spl]
Jul 04 01:20:35 k3s06 kernel:  vcmn_err.cold+0x60/0x78 [spl]
Jul 04 01:20:35 k3s06 kernel:  ? bcpy+0x17/0x20 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? zfs_btree_insert_leaf_impl+0x3f/0x50 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? zfs_btree_insert_into_leaf+0x21f/0x2b0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? percpu_counter_add+0xf/0x20 [spl]
Jul 04 01:20:35 k3s06 kernel:  ? zfs_btree_find_in_buf+0x5a/0xa0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 04 01:20:35 k3s06 kernel:  range_tree_add_impl+0x183/0x610 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul 04 01:20:35 k3s06 kernel:  range_tree_add+0x11/0x20 [zfs]
Jul 04 01:20:35 k3s06 kernel:  metaslab_free_concrete+0x146/0x270 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul 04 01:20:35 k3s06 kernel:  metaslab_free_impl+0xb3/0xf0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  metaslab_free_dva+0x61/0x80 [zfs]
Jul 04 01:20:35 k3s06 kernel:  metaslab_free+0x114/0x1d0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  zio_free_sync+0xf1/0x110 [zfs]
Jul 04 01:20:35 k3s06 kernel:  dsl_scan_free_block_cb+0x6e/0x1d0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  bpobj_dsl_scan_free_block_cb+0x11/0x20 [zfs]
Jul 04 01:20:35 k3s06 kernel:  bpobj_iterate_blkptrs+0xf6/0x380 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? list_head+0xd/0x30 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  bpobj_iterate_impl+0x23b/0x390 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? dsl_scan_free_block_cb+0x1d0/0x1d0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  bpobj_iterate+0x17/0x20 [zfs]
Jul 04 01:20:35 k3s06 kernel:  dsl_process_async_destroys+0x2d5/0x580 [zfs]
Jul 04 01:20:35 k3s06 kernel:  dsl_scan_sync+0x1ec/0x910 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? ddt_sync+0xa8/0xd0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  spa_sync_iterate_to_convergence+0x124/0x1f0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  spa_sync+0x2dc/0x5b0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  txg_sync_thread+0x266/0x2f0 [zfs]
Jul 04 01:20:35 k3s06 kernel:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
Jul 04 01:20:35 k3s06 kernel:  thread_generic_wrapper+0x61/0x80 [spl]
Jul 04 01:20:35 k3s06 kernel:  ? __thread_exit+0x20/0x20 [spl]
Jul 04 01:20:35 k3s06 kernel:  kthread+0x127/0x150
Jul 04 01:20:35 k3s06 kernel:  ? set_kthread_struct+0x50/0x50
Jul 04 01:20:35 k3s06 kernel:  ret_from_fork+0x1f/0x30
Jul 04 01:20:35 k3s06 kernel:  </TASK>

@AdamLantos
Copy link

AdamLantos commented Jul 17, 2023

I was having similar symptoms on a TrueNAS Scale system. My backup pool of 2 disks got corrupted somehow. Scrub is clean, but any time I'm pushing a particular dataset to it, it panics. I was able to reproduce the panic by manually deleting one snapshot from the dataset. Export and reimport works, but deleting snapshots immediately panics the system and needs hard reset since normal shutdown never completes after zfs panic.

Since the backup pool is somewhat important for me, I enabled zfs_recover parameter and deleted the offending dataset and replicated from scratch. For now the system appears to behave, but I'm not sure if I have confidence in the pool any more. I will keep the system running with zfs_recover enabled for a few days, and will wipe/recreate the pool when this reoccurs.

@reefland
Copy link
Author

I left it running for about a week with zil_replay_disable and zfs_recover set. All that time, did not have a panic or warning message. Rebooted and completed a full scrub without a panic message.

I've had no more zfs_panic_recover or WARNING: zfs: adding existent segment to range tree since then... close to 2 weeks now.

I assume this is from zfs constantly rewriting spacemaps in condensing operation. Running the pool in recovery mode for a while eventually removed the duplicated range descriptions during the rewrite operations.... or something like that.

@satmandu
Copy link
Contributor

satmandu commented Aug 14, 2023

I'm also seeing this on a pool using kernel 6.5-rc6. Here is my zdb output: https://dpaste.com/DL9KS6SQ5

I've set boot variables for vfs.zfs.zil.replay_disable=1 and vfs.zfs.zfs_recover=1 for now, so hopefully that helps?

@satmandu
Copy link
Contributor

@reefland Is there a way to force spacemap condensing to happen faster to try to resolve this?

@reefland
Copy link
Author

@satmandu - sorry, no idea. No clue how to monitor the spacemap condensing operation - trial and error. Hopefully someone more knowledgeable can answer.

  • Six weeks now without a PANIC
  • Completed an 11 hour scrub without issue scan: scrub repaired 0B in 11:13:59 with 0 errors on Sun Aug 13 11:38:00 2023

@satmandu
Copy link
Contributor

Thanks. My pool has clean scrubs, but my system still panics if I mount the pool without the special zfs options... but I also haven't had the pool mounted with the special options for more than 16 hours...

@rincebrain
Copy link
Contributor

rincebrain commented Aug 23, 2023

If rewriting the spacemaps removes the assertion, that might suggest you still have lurking excitement eventually, since in theory that should be equally upset by duplicate range insertion, no?

My wild guess would be when it threw out the log after you imported the pool and unlocked the datasets with replaying disabled, it no longer had a pending insertion that was invalid. But that's just a wild guess.

@satmandu
Copy link
Contributor

I hit this again... and had to set zfs.zil.replay_disable=1 and zfs.zfs_recover=1 as kernel cmdline arguments to get boot to succeed.

@reefland
Copy link
Author

I've had this multiple times on different systems (it has not duplicated on the same system). It's just a process I follow now to recover the system. At least its a workaround. I'm no closer to finding a cause / pattern.

@satmandu
Copy link
Contributor

I destroyed a zfs subvolume (that may have had this issue because that's where intense writes from docker were happening) and still have the issue when importing the parent volume.

I will copy all the data to a fresh volume and recreate the volume.

If I delete all the data and the volume still has these issues, could someone look at an image of the disk to help figure out how to repair this issue, even if figuring out how this damage is being created isn't proximally resolved?

Would that be useful at all?

@reefland
Copy link
Author

@satmandu - I can't comment on your disk image question.

But I've had to destroy a number of ZVOLs (extract data, delete ZVOL, recreate, restore, etc) related to this as well. I've since switched all ZVOLs to forced sync writes. Yeah, performance impact, but I haven't had any ZVOLs get corrupted since.

I was frequently getting this after a ZFS panic:

errors: Permanent errors have been detected in the following files:
        rpool/rancher:<0x1> 

Multiple scrubs were not able to fix it. I'm using CRI / K3s instead of Docker, but we're probably using ZVOLs for similar reasons.

@satmandu
Copy link
Contributor

I wonder if we can create a test case of amplifying read/writes to replicate the issue.

Or maybe if there is an existing test case we should be trying on our hardware?

This sounds like it should be replicable in a test environment where one is reaching the I/O limits of a disk subsystem or a disk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

4 participants