Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All I/O hangs in ZFS #8469

Closed
Anteru opened this issue Mar 1, 2019 · 11 comments
Closed

All I/O hangs in ZFS #8469

Anteru opened this issue Mar 1, 2019 · 11 comments
Labels
Status: Stale No recent activity for issue

Comments

@Anteru
Copy link

Anteru commented Mar 1, 2019

System information

Distribution Name | Ubuntu
Distribution Version | 18.04.2 LTS
Linux Kernel | 4.15.0
Architecture | x64
ZFS Version | 0.7.5-1ubuntu16.4
SPL Version | 0.7.5-1ubuntu1

Describe the problem you're observing

I/O freezes after running the server for quite some time. The server is running a bunch of virtual machines, and becomes unresponsive once this condition appears.

Describe how to reproduce the problem

No good repro case, it just happens after 90..120 days.

FWIW, the machine has two pools configured like this

 state: ONLINE
  scan: scrub repaired 0B in 2h17m with 0 errors on Sun Feb 10 02:41:53 2019
config:

        NAME                                  STATE     READ WRITE CKSUM
        ssd                                   ONLINE       0     0     0
          raidz1-0                            ONLINE       0     0     0
            ata-CT1000MX500SSD1_1816E13707A1  ONLINE       0     0     0
            ata-CT1000MX500SSD1_1816E13707AE  ONLINE       0     0     0
            ata-CT1000MX500SSD1_1816E13707DC  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 2h5m with 0 errors on Sun Feb 10 02:29:39 2019
config:

        NAME                                   STATE     READ WRITE CKSUM
        tank                                   ONLINE       0     0     0
          raidz2-0                             ONLINE       0     0     0
            ata-ST10000VN0004-2GS11L_ZCH0DZ0D  ONLINE       0     0     0
            ata-ST10000VN0004-2GS11L_ZJV04M5M  ONLINE       0     0     0
            ata-ST10000VN0004-2GS11L_ZJV0884P  ONLINE       0     0     0
            ata-ST10000VN0004-2GS11L_ZJV0B0XK  ONLINE       0     0     0
        cache
          ata-CT1000MX500SSD1_1816E13706F6     ONLINE       0     0     0

errors: No known data errors

Include any warning/errors/backtraces from the system logs

dmesg reports:

[10987422.198846]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.198895] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.198953] zvol            D    0  2526      2 0x80000000
[10987422.198960] Call Trace:
[10987422.198979]  __schedule+0x291/0x8a0
[10987422.198983]  schedule+0x2c/0x80
[10987422.198987]  rwsem_down_read_failed+0xee/0x150
[10987422.199004]  ? taskq_init_ent+0x3c/0x80 [spl]
[10987422.199010]  call_rwsem_down_read_failed+0x18/0x30
[10987422.199014]  ? call_rwsem_down_read_failed+0x18/0x30
[10987422.199017]  down_read+0x20/0x40
[10987422.199080]  dmu_tx_check_ioerr+0x24/0xc0 [zfs]
[10987422.199130]  dmu_tx_count_write+0x16c/0x1d0 [zfs]
[10987422.199180]  dmu_tx_hold_write+0x41/0x60 [zfs]
[10987422.199243]  zvol_write+0x13c/0x620 [zfs]
[10987422.199252]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.199259]  ? wake_up_q+0x80/0x80
[10987422.199268]  kthread+0x121/0x140
[10987422.199275]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.199279]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.199283]  ret_from_fork+0x22/0x40
[10987422.199298] INFO: task z_wr_iss:3358 blocked for more than 120 seconds.
[10987422.199348]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.199397] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.199453] z_wr_iss        D    0  3358      2 0x80000000
[10987422.199456] Call Trace:
[10987422.199461]  __schedule+0x291/0x8a0
[10987422.199464]  schedule+0x2c/0x80
[10987422.199468]  rwsem_down_write_failed+0x169/0x360
[10987422.199472]  call_rwsem_down_write_failed+0x17/0x30
[10987422.199476]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.199482]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.199485]  down_write+0x2d/0x40
[10987422.199527]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.199567]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.199570]  ? mutex_lock+0x12/0x40
[10987422.199633]  zio_ready+0x68/0x430 [zfs]
[10987422.199640]  ? taskq_member+0x18/0x30 [spl]
[10987422.199703]  zio_execute+0x90/0xf0 [zfs]
[10987422.199711]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.199715]  ? wake_up_q+0x80/0x80
[10987422.199778]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.199782]  kthread+0x121/0x140
[10987422.199790]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.199794]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.199797]  ret_from_fork+0x22/0x40
[10987422.199807] INFO: task z_wr_iss:3369 blocked for more than 120 seconds.
[10987422.199856]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.199905] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.199962] z_wr_iss        D    0  3369      2 0x80000000
[10987422.199965] Call Trace:
[10987422.199969]  __schedule+0x291/0x8a0
[10987422.199972]  schedule+0x2c/0x80
[10987422.199975]  rwsem_down_write_failed+0x169/0x360
[10987422.199979]  call_rwsem_down_write_failed+0x17/0x30
[10987422.199982]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.199989]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.199992]  down_write+0x2d/0x40
[10987422.200034]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.200074]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.200077]  ? mutex_lock+0x12/0x40
[10987422.200139]  zio_ready+0x68/0x430 [zfs]
[10987422.200147]  ? taskq_member+0x18/0x30 [spl]
[10987422.200209]  zio_execute+0x90/0xf0 [zfs]
[10987422.200217]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.200220]  ? wake_up_q+0x80/0x80
[10987422.200283]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.200287]  kthread+0x121/0x140
[10987422.200295]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.200298]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.200302]  ret_from_fork+0x22/0x40
[10987422.200307] INFO: task z_wr_iss:3372 blocked for more than 120 seconds.
[10987422.200357]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.200405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.200461] z_wr_iss        D    0  3372      2 0x80000000
[10987422.200464] Call Trace:
[10987422.200468]  __schedule+0x291/0x8a0
[10987422.200471]  schedule+0x2c/0x80
[10987422.200474]  rwsem_down_write_failed+0x169/0x360
[10987422.200478]  call_rwsem_down_write_failed+0x17/0x30
[10987422.200482]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.200488]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.200491]  down_write+0x2d/0x40
[10987422.200533]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.200573]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.200576]  ? mutex_lock+0x12/0x40
[10987422.200637]  zio_ready+0x68/0x430 [zfs]
[10987422.200645]  ? taskq_member+0x18/0x30 [spl]
[10987422.200707]  zio_execute+0x90/0xf0 [zfs]
[10987422.200715]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.200718]  ? wake_up_q+0x80/0x80
[10987422.200781]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.200785]  kthread+0x121/0x140
[10987422.200793]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.200796]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.200800]  ret_from_fork+0x22/0x40
[10987422.200810] INFO: task z_wr_iss:3385 blocked for more than 120 seconds.
[10987422.200860]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.200908] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.202632] z_wr_iss        D    0  3385      2 0x80000000
[10987422.202635] Call Trace:
[10987422.202640]  __schedule+0x291/0x8a0
[10987422.202644]  schedule+0x2c/0x80
[10987422.202655]  rwsem_down_write_failed+0x169/0x360
[10987422.202659]  call_rwsem_down_write_failed+0x17/0x30
[10987422.202662]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.202669]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.202673]  down_write+0x2d/0x40
[10987422.202715]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.202767]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.202776]  ? mutex_lock+0x12/0x40
[10987422.202847]  zio_ready+0x68/0x430 [zfs]
[10987422.202870]  ? taskq_member+0x18/0x30 [spl]
[10987422.202936]  zio_execute+0x90/0xf0 [zfs]
[10987422.202953]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.202961]  ? wake_up_q+0x80/0x80
[10987422.203033]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.203048]  kthread+0x121/0x140
[10987422.203067]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.203081]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.203091]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.203100]  ret_from_fork+0x22/0x40
[10987422.203114] INFO: task z_wr_iss:3386 blocked for more than 120 seconds.
[10987422.204746]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.206376] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.207992] z_wr_iss        D    0  3386      2 0x80000000
[10987422.207995] Call Trace:
[10987422.208000]  __schedule+0x291/0x8a0
[10987422.208007]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.208010]  schedule+0x2c/0x80
[10987422.208013]  rwsem_down_write_failed+0x169/0x360
[10987422.208019]  call_rwsem_down_write_failed+0x17/0x30
[10987422.208028]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.208045]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.208058]  down_write+0x2d/0x40
[10987422.208111]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.208162]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.208171]  ? mutex_lock+0x12/0x40
[10987422.208234]  zio_ready+0x68/0x430 [zfs]
[10987422.208253]  ? taskq_member+0x18/0x30 [spl]
[10987422.208316]  zio_execute+0x90/0xf0 [zfs]
[10987422.208324]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.208328]  ? wake_up_q+0x80/0x80
[10987422.208391]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.208396]  kthread+0x121/0x140
[10987422.208404]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.208408]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.208412]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.208415]  ret_from_fork+0x22/0x40
[10987422.208420] INFO: task z_wr_iss:3387 blocked for more than 120 seconds.
[10987422.210010]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.211617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.213228] z_wr_iss        D    0  3387      2 0x80000000
[10987422.213232] Call Trace:
[10987422.213237]  __schedule+0x291/0x8a0
[10987422.213240]  schedule+0x2c/0x80
[10987422.213243]  rwsem_down_write_failed+0x169/0x360
[10987422.213248]  call_rwsem_down_write_failed+0x17/0x30
[10987422.213252]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.213258]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.213262]  down_write+0x2d/0x40
[10987422.213304]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.213345]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.213351]  ? mutex_lock+0x12/0x40
[10987422.213414]  zio_ready+0x68/0x430 [zfs]
[10987422.213421]  ? taskq_member+0x18/0x30 [spl]
[10987422.213484]  zio_execute+0x90/0xf0 [zfs]
[10987422.213493]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.213499]  ? wake_up_q+0x80/0x80
[10987422.213562]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.213567]  kthread+0x121/0x140
[10987422.213575]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.213579]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.213583]  ret_from_fork+0x22/0x40
[10987422.213608] INFO: task z_wr_iss:3437 blocked for more than 120 seconds.
[10987422.215237]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.216860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.218512] z_wr_iss        D    0  3437      2 0x80000000
[10987422.218516] Call Trace:
[10987422.218521]  __schedule+0x291/0x8a0
[10987422.218524]  schedule+0x2c/0x80
[10987422.218527]  rwsem_down_write_failed+0x169/0x360
[10987422.218532]  call_rwsem_down_write_failed+0x17/0x30
[10987422.218536]  ? call_rwsem_down_write_failed+0x17/0x30
[10987422.218543]  ? spl_kmem_free+0x33/0x40 [spl]
[10987422.218547]  down_write+0x2d/0x40
[10987422.218602]  dbuf_write_ready+0x1de/0x2e0 [zfs]
[10987422.218657]  arc_write_ready+0x9e/0x320 [zfs]
[10987422.218679]  ? mutex_lock+0x12/0x40
[10987422.218751]  zio_ready+0x68/0x430 [zfs]
[10987422.218773]  ? taskq_member+0x18/0x30 [spl]
[10987422.218836]  zio_execute+0x90/0xf0 [zfs]
[10987422.218845]  taskq_thread+0x2ab/0x4e0 [spl]
[10987422.218850]  ? wake_up_q+0x80/0x80
[10987422.218913]  ? zio_reexecute+0x390/0x390 [zfs]
[10987422.218932]  kthread+0x121/0x140
[10987422.218952]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[10987422.218964]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.218970]  ret_from_fork+0x22/0x40
[10987422.219117] INFO: task txg_sync:5630 blocked for more than 120 seconds.
[10987422.220793]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.222574] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.224332] txg_sync        D    0  5630      2 0x80000000
[10987422.224335] Call Trace:
[10987422.224340]  __schedule+0x291/0x8a0
[10987422.224408]  ? zio_taskq_dispatch+0x73/0x90 [zfs]
[10987422.224411]  schedule+0x2c/0x80
[10987422.224415]  io_schedule+0x16/0x40
[10987422.224425]  cv_wait_common+0xb2/0x140 [spl]
[10987422.224431]  ? wait_woken+0x80/0x80
[10987422.224441]  __cv_wait_io+0x18/0x20 [spl]
[10987422.224506]  zio_wait+0xf8/0x1b0 [zfs]
[10987422.224565]  dsl_pool_sync+0xb8/0x430 [zfs]
[10987422.224633]  spa_sync+0x43e/0xd80 [zfs]
[10987422.224703]  txg_sync_thread+0x2cd/0x4a0 [zfs]
[10987422.224707]  ? __switch_to_asm+0x40/0x70
[10987422.224775]  ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[10987422.224783]  thread_generic_wrapper+0x74/0x90 [spl]
[10987422.224787]  kthread+0x121/0x140
[10987422.224795]  ? __thread_exit+0x20/0x20 [spl]
[10987422.224799]  ? kthread_create_worker_on_cpu+0x70/0x70
[10987422.224802]  ret_from_fork+0x22/0x40
[10987422.224992] INFO: task worker:126277 blocked for more than 120 seconds.
[10987422.226773]       Tainted: P           O     4.15.0-38-generic #41-Ubuntu
[10987422.228546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10987422.230330] worker          D    0 126277      1 0x00000000
[10987422.230334] Call Trace:
[10987422.230339]  __schedule+0x291/0x8a0
[10987422.230343]  schedule+0x2c/0x80
[10987422.230346]  io_schedule+0x16/0x40
[10987422.230353]  wait_on_page_bit_common+0xd8/0x160
[10987422.230357]  ? page_cache_tree_insert+0xe0/0xe0
[10987422.230362]  __filemap_fdatawait_range+0xfa/0x160
[10987422.230368]  file_write_and_wait_range+0x86/0xb0
[10987422.230374]  blkdev_fsync+0x1b/0x50
[10987422.230379]  vfs_fsync_range+0x51/0xb0
[10987422.230383]  do_fsync+0x3d/0x70
[10987422.230386]  SyS_fdatasync+0x13/0x20
[10987422.230392]  do_syscall_64+0x73/0x130
[10987422.230396]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[10987422.230401] RIP: 0033:0x7f20e85c12e7
[10987422.230402] RSP: 002b:00007f1e5f7fd8c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[10987422.230406] RAX: ffffffffffffffda RBX: 0000000000000012 RCX: 00007f20e85c12e7
[10987422.230408] RDX: 0000000000000000 RSI: 0000561b53d00d40 RDI: 0000000000000012
[10987422.230410] RBP: 0000561b57301750 R08: 0000000000000000 R09: 00000000ffffffff
[10987422.230413] R10: 00007f1e5f7fd900 R11: 0000000000000293 R12: 0000561b56de4eb0
[10987422.230414] R13: 0000561b567bf6e8 R14: 0000561b567bf670 R15: 00007ffca1982dd0```
@bglod
Copy link

bglod commented Mar 5, 2019

I too have been seeing this for a while, also with txg_sync. This might be similar to #7038.

@Anteru
Copy link
Author

Anteru commented May 10, 2019

Did you have any luck getting this resolved? It makes me feel rather uneasy about our server.

@bglod
Copy link

bglod commented May 10, 2019

Nope. It still happens from time-to-time on my system. According to similar issues (#7038, #7425, others?) it's still a mystery.

@Anteru
Copy link
Author

Anteru commented May 17, 2019

Bummer, because it does seem to be related to amount of data written, I had this happen after a week now :(

@seanmikhaels
Copy link

Also having this issue. System runs fine for maybe an hour then hangs. Nothing crazy going on in the system, just an rsync transfer thats limited to 12 MB/s, and a docker container with zoneminder running 1 stream @ 1mbps. Tgx_sync freezes, as well as every other process running on zfs and only solution is to reboot. CPU utilization remains low as well as memory utilization.

Both compression and dedupe are enabled, ashift=12. 128KB record size.

OS - Debian Stretch
Kernel - 4.19.0-0.bpo.6-amd64
ZFS Version - 0.8.2-1
Pool: 3 Drive RaidZ1
L2ARC : 80GB SSD Drive.
System Memory: 16GB ARC. 32GB System Ram.

NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zfspool 10.9T 8.62T 2.25T - 28% 79% 1.47x ONLINE -


pool: zfspool
state: ONLINE
scan: resilvered 2.86T in 12h53m with 0 errors on Wed Nov 13 07:22:09 2019
config:

NAME                                 STATE     READ WRITE CKSUM
zfspool                              ONLINE       0     0     0
  raidz1-0                           ONLINE       0     0     0
    sde                              ONLINE       0     0     0
    ata-ST4000VN000-1H4168_Z301425Y  ONLINE       0     0     0
    ata-ST4000DM005-2DP166_ZDH164PY  ONLINE       0     0     0
logs
  sdf3                               ONLINE       0     0     0

errors: No known data errors

ZFS Subsystem Report Wed Nov 13 11:30:46 2019
ARC Summary: (HEALTHY)
Memory Throttle Count: 0

ARC Misc:
Deleted: 649.80k
Mutex Misses: 21
Evict Skips: 12

ARC Size: 98.85% 15.43 GiB
Target Size: (Adaptive) 100.00% 15.61 GiB
Min Size (Hard Limit): 6.25% 998.92 MiB
Max Size (High Water): 16:1 15.61 GiB

ARC Size Breakdown:
Recently Used Cache Size: 43.61% 4.38 GiB
Frequently Used Cache Size: 56.39% 5.66 GiB

ARC Hash Breakdown:
Elements Max: 1.20M
Elements Current: 89.82% 1.08M
Collisions: 1.07M
Chain Max: 5
Chains: 117.18k

ARC Total accesses: 497.41M
Cache Hit Ratio: 99.73% 496.07M
Cache Miss Ratio: 0.27% 1.34M
Actual Hit Ratio: 99.73% 496.06M

Data Demand Efficiency:		99.17%	433.01k
Data Prefetch Efficiency:	8.47%	2.87k

CACHE HITS BY CACHE LIST:
  Most Recently Used:		2.84%	14.08M
  Most Frequently Used:		97.16%	481.98M
  Most Recently Used Ghost:	0.05%	262.03k
  Most Frequently Used Ghost:	0.00%	11.54k

CACHE HITS BY DATA TYPE:
  Demand Data:			0.09%	429.42k
  Prefetch Data:		0.00%	243
  Demand Metadata:		99.91%	495.63M
  Prefetch Metadata:		0.00%	15.26k

CACHE MISSES BY DATA TYPE:
  Demand Data:			0.27%	3.59k
  Prefetch Data:		0.20%	2.63k
  Demand Metadata:		82.28%	1.10M
  Prefetch Metadata:		17.26%	231.71k

DMU Prefetch Efficiency: 105.46M
Hit Ratio: 0.08% 81.57k
Miss Ratio: 99.92% 105.38M


cat /proc/spl/kstat/zfs/dmu_tx
11 1 0x01 12 3264 36614795947 60113570602791
name type data
dmu_tx_assigned 4 4137795
dmu_tx_delay 4 0
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_dirty_delay 4 132825
dmu_tx_dirty_over_max 4 28
dmu_tx_dirty_frees_delay 4 1
dmu_tx_quota 4 0


Nov 13 11:23:32 nas kernel: [59329.141166] INFO: task txg_sync:1228 blocked for more than 120 seconds.
Nov 13 11:23:32 nas kernel: [59329.141368] Tainted: P OE 4.19.0-0.bpo.6-amd64 #1 Debian 4.19.67-2+deb10u1~bpo9+1
Nov 13 11:23:32 nas kernel: [59329.141598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 13 11:23:32 nas kernel: [59329.141827] txg_sync D 0 1228 2 0x80000000
Nov 13 11:23:32 nas kernel: [59329.141833] Call Trace:
Nov 13 11:23:32 nas kernel: [59329.141846] ? __schedule+0x3f5/0x880
Nov 13 11:23:32 nas kernel: [59329.141920] ? spa_taskq_dispatch_ent+0x85/0xc0 [zfs]
Nov 13 11:23:32 nas kernel: [59329.141924] schedule+0x32/0x80
Nov 13 11:23:32 nas kernel: [59329.141927] schedule_timeout+0x16f/0x350
Nov 13 11:23:32 nas kernel: [59329.141931] ? __next_timer_interrupt+0xc0/0xc0
Nov 13 11:23:32 nas kernel: [59329.141934] io_schedule_timeout+0x19/0x40
Nov 13 11:23:32 nas kernel: [59329.141942] __cv_timedwait_common+0x122/0x160 [spl]
Nov 13 11:23:32 nas kernel: [59329.141947] ? remove_wait_queue+0x60/0x60
Nov 13 11:23:32 nas kernel: [59329.142006] zio_wait+0x124/0x260 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142060] dsl_pool_sync+0x106/0x520 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142118] spa_sync+0x550/0xf90 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142177] ? spa_txg_history_init_io+0xfe/0x110 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142235] txg_sync_thread+0x291/0x450 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142331] ? txg_wait_open+0xd0/0xd0 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142350] thread_generic_wrapper+0x6f/0x80 [spl]
Nov 13 11:23:32 nas kernel: [59329.142353] kthread+0xf8/0x130
Nov 13 11:23:32 nas kernel: [59329.142360] ? __thread_exit+0x20/0x20 [spl]
Nov 13 11:23:32 nas kernel: [59329.142361] ? kthread_create_worker_on_cpu+0x70/0x70
Nov 13 11:23:32 nas kernel: [59329.142364] ret_from_fork+0x22/0x40


@lightrush
Copy link

Happens consistently in my environment. Happened after transferring about 1TB of data via Syncthing. Twice, one after another. Then it happened during rm -rf of the copied data.

[Tue Jun 30 17:10:03 2020] INFO: task z_wr_iss:1597 blocked for more than 120 seconds.
[Tue Jun 30 17:10:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:10:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:10:04 2020] z_wr_iss        D    0  1597      2 0x00000028
[Tue Jun 30 17:10:04 2020] Call trace:
[Tue Jun 30 17:10:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:10:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:10:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:10:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:10:04 2020]  rq_qos_wait+0x100/0x178
[Tue Jun 30 17:10:04 2020]  wbt_wait+0xb4/0xf0
[Tue Jun 30 17:10:04 2020]  __rq_qos_throttle+0x38/0x50
[Tue Jun 30 17:10:04 2020]  blk_mq_make_request+0x128/0x610
[Tue Jun 30 17:10:04 2020]  generic_make_request+0xb4/0x2d8
[Tue Jun 30 17:10:04 2020]  submit_bio+0x48/0x218
[Tue Jun 30 17:10:04 2020]  vdev_disk_io_start+0x670/0x9f0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:10:04 2020]  vdev_queue_io+0x23c/0x258 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0x94/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:10:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:10:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_execute+0xac/0x110 [zfs]
[Tue Jun 30 17:10:04 2020]  taskq_thread+0x2f8/0x570 [spl]
[Tue Jun 30 17:10:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:10:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:10:04 2020] INFO: task z_wr_iss:1598 blocked for more than 120 seconds.
[Tue Jun 30 17:10:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:10:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:10:04 2020] z_wr_iss        D    0  1598      2 0x00000028
[Tue Jun 30 17:10:04 2020] Call trace:
[Tue Jun 30 17:10:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:10:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:10:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:10:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:10:04 2020]  rq_qos_wait+0x100/0x178
[Tue Jun 30 17:10:04 2020]  wbt_wait+0xb4/0xf0
[Tue Jun 30 17:10:04 2020]  __rq_qos_throttle+0x38/0x50
[Tue Jun 30 17:10:04 2020]  blk_mq_make_request+0x128/0x610
[Tue Jun 30 17:10:04 2020]  generic_make_request+0xb4/0x2d8
[Tue Jun 30 17:10:04 2020]  submit_bio+0x48/0x218
[Tue Jun 30 17:10:04 2020]  vdev_disk_io_start+0x670/0x9f0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:10:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:10:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_execute+0xac/0x110 [zfs]
[Tue Jun 30 17:10:04 2020]  taskq_thread+0x2f8/0x570 [spl]
[Tue Jun 30 17:10:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:10:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:10:04 2020] INFO: task z_wr_int:1606 blocked for more than 120 seconds.
[Tue Jun 30 17:10:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:10:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:10:04 2020] z_wr_int        D    0  1606      2 0x00000028
[Tue Jun 30 17:10:04 2020] Call trace:
[Tue Jun 30 17:10:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:10:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:10:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:10:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:10:04 2020]  rq_qos_wait+0x100/0x178
[Tue Jun 30 17:10:04 2020]  wbt_wait+0xb4/0xf0
[Tue Jun 30 17:10:04 2020]  __rq_qos_throttle+0x38/0x50
[Tue Jun 30 17:10:04 2020]  blk_mq_make_request+0x128/0x610
[Tue Jun 30 17:10:04 2020]  generic_make_request+0xb4/0x2d8
[Tue Jun 30 17:10:04 2020]  submit_bio+0x48/0x218
[Tue Jun 30 17:10:04 2020]  vdev_disk_io_start+0x670/0x9f0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:10:04 2020]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_vdev_io_done+0xec/0x220 [zfs]
[Tue Jun 30 17:10:04 2020]  zio_execute+0xac/0x110 [zfs]
[Tue Jun 30 17:10:04 2020]  taskq_thread+0x2f8/0x570 [spl]
[Tue Jun 30 17:10:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:10:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:10:04 2020] INFO: task txg_sync:2423 blocked for more than 120 seconds.
[Tue Jun 30 17:10:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:10:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:10:04 2020] txg_sync        D    0  2423      2 0x00000028
[Tue Jun 30 17:10:04 2020] Call trace:
[Tue Jun 30 17:10:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:10:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:10:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:10:04 2020]  schedule_timeout+0x9c/0x190
[Tue Jun 30 17:10:04 2020]  io_schedule_timeout+0x28/0x48
[Tue Jun 30 17:10:04 2020]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[Tue Jun 30 17:10:04 2020]  __cv_timedwait_io+0x3c/0x50 [spl]
[Tue Jun 30 17:10:04 2020]  zio_wait+0x130/0x2a0 [zfs]
[Tue Jun 30 17:10:04 2020]  dsl_pool_sync+0x3fc/0x498 [zfs]
[Tue Jun 30 17:10:04 2020]  spa_sync+0x538/0xe68 [zfs]
[Tue Jun 30 17:10:04 2020]  txg_sync_thread+0x2c0/0x468 [zfs]
[Tue Jun 30 17:10:04 2020]  thread_generic_wrapper+0x74/0xa0 [spl]
[Tue Jun 30 17:10:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:10:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:10:04 2020] INFO: task rm:16225 blocked for more than 120 seconds.
[Tue Jun 30 17:10:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:10:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:10:04 2020] rm              D    0 16225  16224 0x00000000
[Tue Jun 30 17:10:04 2020] Call trace:
[Tue Jun 30 17:10:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:10:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:10:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:10:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:10:04 2020]  cv_wait_common+0x104/0x1b0 [spl]
[Tue Jun 30 17:10:04 2020]  __cv_wait_io+0x30/0x40 [spl]
[Tue Jun 30 17:10:04 2020]  txg_wait_open+0xa0/0x118 [zfs]
[Tue Jun 30 17:10:04 2020]  dmu_free_long_range+0x400/0x4e0 [zfs]
[Tue Jun 30 17:10:04 2020]  zfs_rmnode+0x244/0x338 [zfs]
[Tue Jun 30 17:10:04 2020]  zfs_zinactive+0x12c/0x148 [zfs]
[Tue Jun 30 17:10:04 2020]  zfs_inactive+0x78/0x1f8 [zfs]
[Tue Jun 30 17:10:04 2020]  zpl_evict_inode+0x4c/0x70 [zfs]
[Tue Jun 30 17:10:04 2020]  evict+0xcc/0x1c8
[Tue Jun 30 17:10:04 2020]  iput+0x158/0x250
[Tue Jun 30 17:10:04 2020]  do_unlinkat+0x1bc/0x2a0
[Tue Jun 30 17:10:04 2020]  __arm64_sys_unlinkat+0x44/0x70
[Tue Jun 30 17:10:04 2020]  el0_svc_common.constprop.0+0xe0/0x1e8
[Tue Jun 30 17:10:04 2020]  el0_svc_handler+0x34/0xa0
[Tue Jun 30 17:10:04 2020]  el0_svc+0x10/0x14
[Tue Jun 30 17:12:04 2020] INFO: task z_wr_iss:1597 blocked for more than 241 seconds.
[Tue Jun 30 17:12:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:12:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:12:04 2020] z_wr_iss        D    0  1597      2 0x00000028
[Tue Jun 30 17:12:04 2020] Call trace:
[Tue Jun 30 17:12:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:12:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:12:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:12:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:12:04 2020]  rq_qos_wait+0x100/0x178
[Tue Jun 30 17:12:04 2020]  wbt_wait+0xb4/0xf0
[Tue Jun 30 17:12:04 2020]  __rq_qos_throttle+0x38/0x50
[Tue Jun 30 17:12:04 2020]  blk_mq_make_request+0x128/0x610
[Tue Jun 30 17:12:04 2020]  generic_make_request+0xb4/0x2d8
[Tue Jun 30 17:12:04 2020]  submit_bio+0x48/0x218
[Tue Jun 30 17:12:04 2020]  vdev_disk_io_start+0x670/0x9f0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:12:04 2020]  vdev_queue_io+0x23c/0x258 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0x94/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:12:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:12:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_execute+0xac/0x110 [zfs]
[Tue Jun 30 17:12:04 2020]  taskq_thread+0x2f8/0x570 [spl]
[Tue Jun 30 17:12:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:12:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:12:04 2020] INFO: task z_wr_iss:1598 blocked for more than 241 seconds.
[Tue Jun 30 17:12:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:12:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:12:04 2020] z_wr_iss        D    0  1598      2 0x00000028
[Tue Jun 30 17:12:04 2020] Call trace:
[Tue Jun 30 17:12:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:12:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:12:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:12:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:12:04 2020]  rq_qos_wait+0x100/0x178
[Tue Jun 30 17:12:04 2020]  wbt_wait+0xb4/0xf0
[Tue Jun 30 17:12:04 2020]  __rq_qos_throttle+0x38/0x50
[Tue Jun 30 17:12:04 2020]  blk_mq_make_request+0x128/0x610
[Tue Jun 30 17:12:04 2020]  generic_make_request+0xb4/0x2d8
[Tue Jun 30 17:12:04 2020]  submit_bio+0x48/0x218
[Tue Jun 30 17:12:04 2020]  vdev_disk_io_start+0x670/0x9f0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:12:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:12:04 2020]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_execute+0xac/0x110 [zfs]
[Tue Jun 30 17:12:04 2020]  taskq_thread+0x2f8/0x570 [spl]
[Tue Jun 30 17:12:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:12:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:12:04 2020] INFO: task z_wr_int:1606 blocked for more than 241 seconds.
[Tue Jun 30 17:12:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:12:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:12:04 2020] z_wr_int        D    0  1606      2 0x00000028
[Tue Jun 30 17:12:04 2020] Call trace:
[Tue Jun 30 17:12:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:12:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:12:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:12:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:12:04 2020]  rq_qos_wait+0x100/0x178
[Tue Jun 30 17:12:04 2020]  wbt_wait+0xb4/0xf0
[Tue Jun 30 17:12:04 2020]  __rq_qos_throttle+0x38/0x50
[Tue Jun 30 17:12:04 2020]  blk_mq_make_request+0x128/0x610
[Tue Jun 30 17:12:04 2020]  generic_make_request+0xb4/0x2d8
[Tue Jun 30 17:12:04 2020]  submit_bio+0x48/0x218
[Tue Jun 30 17:12:04 2020]  vdev_disk_io_start+0x670/0x9f0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_nowait+0xd4/0x170 [zfs]
[Tue Jun 30 17:12:04 2020]  vdev_queue_io_done+0x1ec/0x2a0 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_vdev_io_done+0xec/0x220 [zfs]
[Tue Jun 30 17:12:04 2020]  zio_execute+0xac/0x110 [zfs]
[Tue Jun 30 17:12:04 2020]  taskq_thread+0x2f8/0x570 [spl]
[Tue Jun 30 17:12:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:12:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:12:04 2020] INFO: task txg_sync:2423 blocked for more than 241 seconds.
[Tue Jun 30 17:12:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:12:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:12:04 2020] txg_sync        D    0  2423      2 0x00000028
[Tue Jun 30 17:12:04 2020] Call trace:
[Tue Jun 30 17:12:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:12:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:12:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:12:04 2020]  schedule_timeout+0x9c/0x190
[Tue Jun 30 17:12:04 2020]  io_schedule_timeout+0x28/0x48
[Tue Jun 30 17:12:04 2020]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[Tue Jun 30 17:12:04 2020]  __cv_timedwait_io+0x3c/0x50 [spl]
[Tue Jun 30 17:12:04 2020]  zio_wait+0x130/0x2a0 [zfs]
[Tue Jun 30 17:12:04 2020]  dsl_pool_sync+0x3fc/0x498 [zfs]
[Tue Jun 30 17:12:04 2020]  spa_sync+0x538/0xe68 [zfs]
[Tue Jun 30 17:12:04 2020]  txg_sync_thread+0x2c0/0x468 [zfs]
[Tue Jun 30 17:12:04 2020]  thread_generic_wrapper+0x74/0xa0 [spl]
[Tue Jun 30 17:12:04 2020]  kthread+0xfc/0x128
[Tue Jun 30 17:12:04 2020]  ret_from_fork+0x10/0x1c
[Tue Jun 30 17:12:04 2020] INFO: task rm:16225 blocked for more than 241 seconds.
[Tue Jun 30 17:12:04 2020]       Tainted: P         C OE     5.4.0-1012-raspi #12-Ubuntu
[Tue Jun 30 17:12:04 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 30 17:12:04 2020] rm              D    0 16225  16224 0x00000000
[Tue Jun 30 17:12:04 2020] Call trace:
[Tue Jun 30 17:12:04 2020]  __switch_to+0x104/0x170
[Tue Jun 30 17:12:04 2020]  __schedule+0x30c/0x7c0
[Tue Jun 30 17:12:04 2020]  schedule+0x3c/0xb8
[Tue Jun 30 17:12:04 2020]  io_schedule+0x20/0x40
[Tue Jun 30 17:12:04 2020]  cv_wait_common+0x104/0x1b0 [spl]
[Tue Jun 30 17:12:04 2020]  __cv_wait_io+0x30/0x40 [spl]
[Tue Jun 30 17:12:04 2020]  txg_wait_open+0xa0/0x118 [zfs]
[Tue Jun 30 17:12:04 2020]  dmu_free_long_range+0x400/0x4e0 [zfs]
[Tue Jun 30 17:12:04 2020]  zfs_rmnode+0x244/0x338 [zfs]
[Tue Jun 30 17:12:04 2020]  zfs_zinactive+0x12c/0x148 [zfs]
[Tue Jun 30 17:12:04 2020]  zfs_inactive+0x78/0x1f8 [zfs]
[Tue Jun 30 17:12:04 2020]  zpl_evict_inode+0x4c/0x70 [zfs]
[Tue Jun 30 17:12:04 2020]  evict+0xcc/0x1c8
[Tue Jun 30 17:12:04 2020]  iput+0x158/0x250
[Tue Jun 30 17:12:04 2020]  do_unlinkat+0x1bc/0x2a0
[Tue Jun 30 17:12:04 2020]  __arm64_sys_unlinkat+0x44/0x70
[Tue Jun 30 17:12:04 2020]  el0_svc_common.constprop.0+0xe0/0x1e8
[Tue Jun 30 17:12:04 2020]  el0_svc_handler+0x34/0xa0
[Tue Jun 30 17:12:04 2020]  el0_svc+0x10/0x14

@ironMann
Copy link
Contributor

@Anteru Any luck resolving or mitigating this? We see the same problem from time to time in production, Ubuntu zfs v0.7.12-1ubuntu, kernel 5.0.0.
All zio threads appear to be blocked on rw_enter(&dn->dn_struct_rwlock, RW_WRITER);

@Anteru
Copy link
Author

Anteru commented Sep 18, 2020

For mitigation I try to reboot the server every two months, but since 20.04 I haven't seen it, so maybe that fixed it? I think that upgrades to ZFS 0.8.

@stale
Copy link

stale bot commented Sep 19, 2021

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Sep 19, 2021
@purpledrazi
Copy link

I saw what I think is the same issue last night.

Running Debian 11, zfs version 2.0.3-9.

I seemed to be able to browse the filesystem still, but the NFS server was not working, and KVM VMs that were running were not accepting connections. Needed to reboot the server to get back to normal.

In syslog, multiple hung tasks reported, with the same stack trace. Example:

INFO: task z_wr_int:1495 blocked for more than 120 seconds.
      Tainted: P           OE     5.10.0-8-amd64 #1 Debian 5.10.46-4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:z_wr_int        state:D stack:    0 pid: 1495 ppid:     2 flags:0x00004000
Call Trace:
 __schedule+0x282/0x870
 ? wbt_rqw_done+0xe0/0xe0
 schedule+0x46/0xb0
 io_schedule+0x42/0x70
 rq_qos_wait+0x106/0x170
 ? karma_partition+0x1e0/0x1e0
 ? wbt_cleanup_cb+0x20/0x20
 wbt_wait+0x9f/0x130
 __rq_qos_throttle+0x23/0x30
 blk_mq_submit_bio+0x124/0x520
 submit_bio_noacct+0x3a9/0x420
 vdev_disk_io_start+0x5a7/0xa50 [zfs]
 zio_vdev_io_start+0xec/0x320 [zfs]
 zio_nowait+0xb4/0x1a0 [zfs]
 vdev_queue_io_done+0x215/0x2a0 [zfs]
 zio_vdev_io_done+0xc7/0x1f0 [zfs]
 zio_execute+0x81/0x120 [zfs]
 taskq_thread+0x2da/0x520 [spl]
 ? wake_up_q+0xa0/0xa0
 ? zio_destroy+0xf0/0xf0 [zfs]
 ? taskq_thread_spawn+0x50/0x50 [spl]
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

@stale stale bot removed the Status: Stale No recent activity for issue label Sep 23, 2021
@stale
Copy link

stale bot commented Sep 24, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Sep 24, 2022
@stale stale bot closed this as completed Dec 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale No recent activity for issue
Projects
None yet
Development

No branches or pull requests

6 participants