ZFS hang on tgx_sync #542

Closed
lkishalmi opened this issue Jan 26, 2012 · 22 comments

@lkishalmi

We are trying to use ZFS on our build machine. The machine is a VMware virtual machine running on an ESX server.

earlier: ZFS: Loaded module v0.6.0.47, ZFS pool version 28, ZFS filesystem version 5

[86021.778001] INFO: task txg_sync:1111 blocked for more than 120 seconds.
[86021.778103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[86021.778252] txg_sync        D ffffffff81805120     0  1111      2 0x00000000
[86021.778256]  ffff88020db0fc00 0000000000000046 ffff88020db0fba0 ffffffff810329a9
[86021.778258]  ffff88020db0ffd8 ffff88020db0ffd8 ffff88020db0ffd8 0000000000012a40
[86021.778260]  ffff880232d65c80 ffff880225be2e40 0000000000000282 ffff880086967aa0
[86021.778262] Call Trace:
[86021.778268]  [<ffffffff810329a9>] ? default_spin_lock_flags+0x9/0x10
[86021.778278]  [<ffffffffa00d1c57>] cv_wait_common+0x77/0xd0 [spl]
[86021.778282]  [<ffffffff81081700>] ? add_wait_queue+0x60/0x60
[86021.778286]  [<ffffffffa00d1ce3>] __cv_wait+0x13/0x20 [spl]
[86021.778310]  [<ffffffffa0202d9b>] zio_wait+0xfb/0x170 [zfs]
[86021.778324]  [<ffffffffa019f95c>] dsl_pool_sync+0x17c/0x450 [zfs]
[86021.778340]  [<ffffffffa01b2b5e>] spa_sync+0x38e/0xa00 [zfs]
[86021.778342]  [<ffffffff81057352>] ? default_wake_function+0x12/0x20
[86021.778344]  [<ffffffff81081716>] ? autoremove_wake_function+0x16/0x40
[86021.778346]  [<ffffffff810465a3>] ? __wake_up+0x53/0x70
[86021.778362]  [<ffffffffa01c2716>] txg_sync_thread+0x216/0x390 [zfs]
[86021.778378]  [<ffffffffa01c2500>] ? txg_init+0x260/0x260 [zfs]
[86021.778395]  [<ffffffffa01c2500>] ? txg_init+0x260/0x260 [zfs]
[86021.778401]  [<ffffffffa00cdf08>] thread_generic_wrapper+0x78/0x90 [spl]
[86021.778405]  [<ffffffffa00cde90>] ? __thread_create+0x160/0x160 [spl]
[86021.778408]  [<ffffffff81080c5c>] kthread+0x8c/0xa0
[86021.778413]  [<ffffffff81607d24>] kernel_thread_helper+0x4/0x10
[86021.778415]  [<ffffffff81080bd0>] ? flush_kthread_worker+0xa0/0xa0
[86021.778418]  [<ffffffff81607d20>] ? gs_change+0x13/0x13
@lkishalmi
Author

It happens on Ubuntu Oneiric x64.

Linux localhost.loaclhost 3.0.0-12-server #20-Ubuntu SMP Fri Oct 7 16:36:30 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

@behlendorf
Contributor

The stack trace is just advisory; are you observing any other issues?
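
For context, this warning comes from the kernel's hung-task watchdog, which flags any task that has been in uninterruptible sleep for longer than the configured timeout (120 seconds by default). A minimal sketch of how to inspect or silence it, as the log message itself suggests:

# show the current timeout in seconds (0 means the check is disabled)
cat /proc/sys/kernel/hung_task_timeout_secs
# silence the warnings
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
# equivalent sysctl form
sysctl -w kernel.hung_task_timeout_secs=0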

@RLovelett

I'm seeing the same thing.

Another issue that arises from this is that whatever processes are currently running (mv, cp, dd, etc.) all go into an 'uninterruptible' state (as does txg_sync) and hang. What other info can I provide to help track this down?

Linux sterling-cooper 3.0.0-15-server #26-Ubuntu SMP Fri Jan 20 19:07:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

(yes I name my computers on my home network after Mad Men characters)
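
A minimal sketch of commands that can capture more detail about tasks stuck in the uninterruptible ('D') state, assuming magic SysRq is available; the pid used below is just the txg_sync pid from the first trace in this issue:

# kernel stack of one blocked task (the pid is illustrative)
cat /proc/1111/stack
# list every process currently in uninterruptible sleep
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
# dump the stacks of all blocked tasks into the kernel log
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 200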

@lkishalmi
Author

Maybe it is related to this:
08d08eb

I've just upgraded the system to a ZFS version containing this commit, and since then the problem has not occurred. Nothing is certain yet; I'll keep watching it.
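
A quick sketch of how to confirm which SPL/ZFS builds are actually loaded after such an upgrade (the /sys/module paths are only present when the modules export a version string):

dmesg | grep -E 'SPL: Loaded|ZFS: Loaded'
cat /sys/module/spl/version /sys/module/zfs/version
dpkg -l | grep -E 'spl-|zfs-'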

@RLovelett

@lkishalmi If I use the daily build PPA, should this fix be in there?

@lkishalmi
Author

I think it is. I have v0.6.0.48 on that server. It now has more than 3 days of uptime (which is promising compared to one failure per day). I just ran a large delete of 7 GB of data spread over 130,000 files and directories. There has been no problem so far.


@RLovelett

OK, thanks for the info. I will give it a try.
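
For reference, switching an Ubuntu system over to the daily builds looks roughly like the sketch below; the zfs-native PPA and ubuntu-zfs metapackage names are assumptions based on the packaging of that era, so adjust them to whatever the current instructions say:

sudo add-apt-repository ppa:zfs-native/daily
sudo apt-get update
sudo apt-get install ubuntu-zfs    # or: sudo apt-get dist-upgrade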

@RLovelett

Just to give background, the hardware is: http://reviews.cnet.com/soho-servers/acer-aspire-easystore-h340/4507-3125_7-33707300.html?tag=mncol;subnav

The OS was given above.

Here are some sample logs from /var/log/syslog; the hang-up happens when I mount my ZFS pool.

Jan 29 21:10:57 sterling-cooper kernel: [ 4240.973572] SPL: Loaded module v0.6.0.48, using hostid 0x007f0101
Jan 29 21:10:57 sterling-cooper kernel: [ 4240.974306] zunicode: module license 'CDDL' taints kernel.
Jan 29 21:10:57 sterling-cooper kernel: [ 4240.974315] Disabling lock debugging due to kernel taint
Jan 29 21:10:57 sterling-cooper kernel: [ 4241.102458] ZFS: Loaded module v0.6.0.48, ZFS pool version 28, ZFS filesystem version 5

Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376077] INFO: task txg_sync:809 blocked for more than 120 seconds.
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376235] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376411] txg_sync        D 0000000000000001     0   809      2 0x00000000
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376428]  ffff8800738e9c70 0000000000000046 ffff8800738e9c10 ffffffff810573b2
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376444]  ffff8800738e9fd8 ffff8800738e9fd8 ffff8800738e9fd8 0000000000012a40
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376459]  ffff880075cbdc80 ffff88007aaa9720 ffff8800738e9c80 ffff880079339730
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376474] Call Trace:
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376495]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376512]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376543]  [<ffffffffa018cc27>] cv_wait_common+0x77/0xd0 [spl]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376558]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376583]  [<ffffffffa018ccb3>] __cv_wait+0x13/0x20 [spl]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376673]  [<ffffffffa04c0d9b>] zio_wait+0xfb/0x170 [zfs]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376747]  [<ffffffffa0470a22>] spa_sync+0x252/0xa00 [zfs]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376762]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376775]  [<ffffffff81081676>] ? autoremove_wake_function+0x16/0x40
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376790]  [<ffffffff81044b08>] ? __wake_up_common+0x58/0x90
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376803]  [<ffffffff810465e3>] ? __wake_up+0x53/0x70
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376876]  [<ffffffffa0480716>] txg_sync_thread+0x216/0x390 [zfs]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.376950]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377023]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377049]  [<ffffffffa0188f08>] thread_generic_wrapper+0x78/0x90 [spl]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377073]  [<ffffffffa0188e90>] ? __thread_create+0x160/0x160 [spl]
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377087]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377101]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377115]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 29 22:46:34 sterling-cooper kernel: [ 4800.377127]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13
Jan 29 22:47:33 sterling-cooper kernel: [ 4859.236006] INFO: rcu_sched_state detected stall on CPU 0 (t=60031 jiffies)
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376074] INFO: task txg_sync:809 blocked for more than 120 seconds.
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376408] txg_sync        D 0000000000000001     0   809      2 0x00000000
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376426]  ffff8800738e9c70 0000000000000046 ffff8800738e9c10 ffffffff810573b2
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376442]  ffff8800738e9fd8 ffff8800738e9fd8 ffff8800738e9fd8 0000000000012a40
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376457]  ffff880075cbdc80 ffff88007aaa9720 ffff8800738e9c80 ffff880079339730
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376472] Call Trace:
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376492]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376509]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376541]  [<ffffffffa018cc27>] cv_wait_common+0x77/0xd0 [spl]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376556]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376581]  [<ffffffffa018ccb3>] __cv_wait+0x13/0x20 [spl]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376670]  [<ffffffffa04c0d9b>] zio_wait+0xfb/0x170 [zfs]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376743]  [<ffffffffa0470a22>] spa_sync+0x252/0xa00 [zfs]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376758]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376772]  [<ffffffff81081676>] ? autoremove_wake_function+0x16/0x40
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376786]  [<ffffffff81044b08>] ? __wake_up_common+0x58/0x90
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376799]  [<ffffffff810465e3>] ? __wake_up+0x53/0x70
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376873]  [<ffffffffa0480716>] txg_sync_thread+0x216/0x390 [zfs]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.376948]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377020]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377046]  [<ffffffffa0188f08>] thread_generic_wrapper+0x78/0x90 [spl]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377071]  [<ffffffffa0188e90>] ? __thread_create+0x160/0x160 [spl]
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377085]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377099]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377113]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 29 22:48:34 sterling-cooper kernel: [ 4920.377125]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13
Jan 29 22:50:33 sterling-cooper kernel: [ 5039.360006] INFO: rcu_sched_state detected stall on CPU 0 (t=105062 jiffies)
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376073] INFO: task txg_sync:809 blocked for more than 120 seconds.
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376407] txg_sync        D 0000000000000001     0   809      2 0x00000000
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376424]  ffff8800738e9c70 0000000000000046 ffff8800738e9c10 ffffffff810573b2
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376440]  ffff8800738e9fd8 ffff8800738e9fd8 ffff8800738e9fd8 0000000000012a40
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376455]  ffff880075cbdc80 ffff88007aaa9720 ffff8800738e9c80 ffff880079339730
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376470] Call Trace:
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376490]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376507]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376539]  [<ffffffffa018cc27>] cv_wait_common+0x77/0xd0 [spl]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376554]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376579]  [<ffffffffa018ccb3>] __cv_wait+0x13/0x20 [spl]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376669]  [<ffffffffa04c0d9b>] zio_wait+0xfb/0x170 [zfs]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376741]  [<ffffffffa0470a22>] spa_sync+0x252/0xa00 [zfs]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376756]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376770]  [<ffffffff81081676>] ? autoremove_wake_function+0x16/0x40
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376785]  [<ffffffff81044b08>] ? __wake_up_common+0x58/0x90
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376798]  [<ffffffff810465e3>] ? __wake_up+0x53/0x70
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376873]  [<ffffffffa0480716>] txg_sync_thread+0x216/0x390 [zfs]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.376946]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377018]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377045]  [<ffffffffa0188f08>] thread_generic_wrapper+0x78/0x90 [spl]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377069]  [<ffffffffa0188e90>] ? __thread_create+0x160/0x160 [spl]
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377082]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377096]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377110]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 29 22:50:34 sterling-cooper kernel: [ 5040.377123]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376075] INFO: task txg_sync:809 blocked for more than 120 seconds.
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376409] txg_sync        D 0000000000000001     0   809      2 0x00000000
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376427]  ffff8800738e9c70 0000000000000046 ffff8800738e9c10 ffffffff810573b2
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376443]  ffff8800738e9fd8 ffff8800738e9fd8 ffff8800738e9fd8 0000000000012a40
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376458]  ffff880075cbdc80 ffff88007aaa9720 ffff8800738e9c80 ffff880079339730
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376473] Call Trace:
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376494]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376510]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376542]  [<ffffffffa018cc27>] cv_wait_common+0x77/0xd0 [spl]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376557]  [<ffffffff81081660>] ? add_wait_queue+0x60/0x60
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376583]  [<ffffffffa018ccb3>] __cv_wait+0x13/0x20 [spl]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376673]  [<ffffffffa04c0d9b>] zio_wait+0xfb/0x170 [zfs]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376746]  [<ffffffffa0470a22>] spa_sync+0x252/0xa00 [zfs]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376761]  [<ffffffff810573b2>] ? default_wake_function+0x12/0x20
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376775]  [<ffffffff81081676>] ? autoremove_wake_function+0x16/0x40
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376791]  [<ffffffff81044b08>] ? __wake_up_common+0x58/0x90
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376804]  [<ffffffff810465e3>] ? __wake_up+0x53/0x70
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376878]  [<ffffffffa0480716>] txg_sync_thread+0x216/0x390 [zfs]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.376952]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377025]  [<ffffffffa0480500>] ? txg_init+0x260/0x260 [zfs]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377051]  [<ffffffffa0188f08>] thread_generic_wrapper+0x78/0x90 [spl]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377076]  [<ffffffffa0188e90>] ? __thread_create+0x160/0x160 [spl]
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377090]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377104]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377118]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 29 22:52:34 sterling-cooper kernel: [ 5160.377130]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13
Jan 29 22:53:34 sterling-cooper kernel: [ 5219.484006] INFO: rcu_sched_state detected stall on CPU 0 (t=150093 jiffies)

@behlendorf
Contributor

Is this issue still being observed with the latest PPA? There have been several fixes which may address this.

@lkishalmi
Author

I can't say for sure. We have experienced this problem only once since upgrading to 0.6.0.48.

@behlendorf
Contributor

Alright then, if you don't mind, I'd like to close this issue. Please feel free to open a new issue if you ever observe this or any other failure. We absolutely want to get them all resolved.

@ryanolf

ryanolf commented Mar 26, 2012

I believe I am experiencing this issue. I am transferring a large amount of data off a ZFS raidz2 (~1.5 TB), and after several hours I see these errors and then the system locks up. I am running the latest stable PPA.

Here is the output from my kernel.log

Mar 26 00:24:34 oxford kernel: [24241.244476] INFO: task zfs:21947 blocked for more than 120 seconds.
Mar 26 00:24:34 oxford kernel: [24241.244500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 00:24:34 oxford kernel: [24241.244528] zfs             D ffffffff81805120     0 21947  21946 0x00000000
Mar 26 00:24:34 oxford kernel: [24241.244532]  ffff8800a534fc38 0000000000000086 0000000000000000 0000000000000000
Mar 26 00:24:34 oxford kernel: [24241.244535]  ffff8800a534ffd8 ffff8800a534ffd8 ffff8800a534ffd8 0000000000012a40
Mar 26 00:24:34 oxford kernel: [24241.244539]  ffff880232d28000 ffff88022897c560 ffff8800a534fc48 ffff880229d59268
Mar 26 00:24:34 oxford kernel: [24241.244542] Call Trace:
Mar 26 00:24:34 oxford kernel: [24241.244544]  [<ffffffff816058cf>] schedule+0x3f/0x60
Mar 26 00:24:34 oxford kernel: [24241.244551]  [<ffffffffa01aa6b8>] cv_wait_common+0x98/0x190 [spl]
Mar 26 00:24:34 oxford kernel: [24241.244554]  [<ffffffff810466f3>] ? __wake_up+0x53/0x70
Mar 26 00:24:34 oxford kernel: [24241.244557]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 00:24:34 oxford kernel: [24241.244563]  [<ffffffffa01aa7e3>] __cv_wait+0x13/0x20 [spl]
Mar 26 00:24:34 oxford kernel: [24241.244585]  [<ffffffffa029b303>] txg_wait_synced+0xb3/0x190 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244606]  [<ffffffffa027fe6e>] dsl_sync_task_group_wait+0x14e/0x270 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244624]  [<ffffffffa026bc60>] ? snaplist_destroy+0x100/0x100 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244642]  [<ffffffffa026f4a0>] ? dsl_dataset_create_sync+0x280/0x280 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244662]  [<ffffffffa0280167>] dsl_sync_task_do+0x57/0x80 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244681]  [<ffffffffa0271302>] dsl_dataset_destroy+0xb2/0x490 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244698]  [<ffffffffa025980b>] dmu_objset_destroy+0x3b/0x50 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244719]  [<ffffffffa02bca23>] zfs_ioc_destroy+0x23/0x60 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244741]  [<ffffffffa02c087c>] zfsdev_ioctl+0xdc/0x1b0 [zfs]
Mar 26 00:24:34 oxford kernel: [24241.244744]  [<ffffffff8117a1a9>] do_vfs_ioctl+0x89/0x310
Mar 26 00:24:34 oxford kernel: [24241.244748]  [<ffffffff81134c73>] ? do_munmap+0x1f3/0x2f0
Mar 26 00:24:34 oxford kernel: [24241.244750]  [<ffffffff8117a4c1>] sys_ioctl+0x91/0xa0
Mar 26 00:24:34 oxford kernel: [24241.244754]  [<ffffffff8160fb82>] system_call_fastpath+0x16/0x1b

This is followed by:

Mar 26 00:26:34 oxford kernel: [24361.244105] INFO: task txg_sync:3273 blocked for more than 120 seconds.
Mar 26 00:26:34 oxford kernel: [24361.244165] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 00:26:34 oxford kernel: [24361.244225] txg_sync        D 0000000000000000     0  3273      0x0000000000
Mar 26 00:26:34 oxford kernel: [24361.244233]  ffff88020e579b90 0000000000000046 ffff88020e579b30 ffffffff810574c2
Mar 26 00:26:34 oxford kernel: [24361.244241]  ffff88020e579fd8 ffff88020e579fd8 ffff88020e579fd8 0000000000012a40
Mar 26 00:26:34 oxford kernel: [24361.244248]  ffff88022bbe9720 ffff88020fdc2e40 ffff88020e579ba0 ffff8800594922e8
Mar 26 00:26:34 oxford kernel: [24361.244255] Call Trace:
Mar 26 00:26:34 oxford kernel: [24361.244266]  [<ffffffff810574c2>] ? default_wake_function+0x12/0x20
Mar 26 00:26:34 oxford kernel: [24361.244274]  [<ffffffff816058cf>] schedule+0x3f/0x60
Mar 26 00:26:34 oxford kernel: [24361.244302]  [<ffffffffa01aa6b8>] cv_wait_common+0x98/0x190 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244309]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 00:26:34 oxford kernel: [24361.244323]  [<ffffffffa01aa7e3>] __cv_wait+0x13/0x20 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244383]  [<ffffffffa02dcc9b>] zio_wait+0xfb/0x170 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244427]  [<ffffffffa027f5e1>] dsl_scan_sync+0x481/0x8e0 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244433]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 00:26:34 oxford kernel: [24361.244479]  [<ffffffffa028bfa1>] spa_sync+0x3f1/0xa00 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244526]  [<ffffffffa029c006>] txg_sync_thread+0x286/0x450 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244573]  [<ffffffffa029bd80>] ? txg_init+0x260/0x260 [zfs]
Mar 26 00:26:34 oxford kernel: [24361.244586]  [<ffffffffa01a2f88>] thread_generic_wrapper+0x78/0x90 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244599]  [<ffffffffa01a2f10>] ? __thread_create+0x310/0x310 [spl]
Mar 26 00:26:34 oxford kernel: [24361.244604]  [<ffffffff81080dac>] kthread+0x8c/0xa0
Mar 26 00:26:34 oxford kernel: [24361.244610]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
Mar 26 00:26:34 oxford kernel: [24361.244615]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
Mar 26 00:26:34 oxford kernel: [24361.244620]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13

This was followed by the machine locking up (kernel panic?). These errors actually occurred several times in the kernel log before the lockup.

Here is more information about my system and version:

ryan@oxford:$ modinfo zfs
filename:       /lib/modules/3.0.0-16-server/updates/dkms/zfs.ko
license:        CDDL
author:         Sun Microsystems/Oracle, Lawrence Livermore National Laboratory
description:    ZFS
srcversion:     4F965FF03CFCE90153F3318
depends:        spl,znvpair,zcommon,zunicode,zavl
vermagic:       3.0.0-16-server SMP mod_unload modversions 

and

ryan@oxford:$ sudo dpkg -l | grep -E "zfs-|spl-"
ii  spl-dkms                               0.6.0.54-0ubuntu1~oneiric1                 Solaris Porting Layer kernel modules for Linux
ii  zfs-dkms                               0.6.0.54-1ubuntu1~oneiric1                 Native ZFS filesystem kernel modules for Linux

@stephane-chazelas

Maybe related to issue #610

Got some of those too (54 and 55)

Mar 26 21:32:27 cetautomatix kernel: [15602.368115] INFO: task txg_sync:784 blocked for more than 120 seconds.
Mar 26 21:32:27 cetautomatix kernel: [15602.374717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 26 21:32:27 cetautomatix kernel: [15602.382622] txg_sync        D 0000000000000002     0   784      2 0x00000000
Mar 26 21:32:27 cetautomatix kernel: [15602.382627]  ffff880203afbbd0 0000000000000046 ffff880203afbb70 ffffffff810574c2
Mar 26 21:32:27 cetautomatix kernel: [15602.382631]  ffff880203afbfd8 ffff880203afbfd8 ffff880203afbfd8 0000000000012a40
Mar 26 21:32:27 cetautomatix kernel: [15602.382641]  ffff88020744ae40 ffff880207550000 ffff880203afbbe0 ffff8801debdcb48
Mar 26 21:32:27 cetautomatix kernel: [15602.382653] Call Trace:
Mar 26 21:32:27 cetautomatix kernel: [15602.382666]  [<ffffffff810574c2>] ? default_wake_function+0x12/0x20
Mar 26 21:32:27 cetautomatix kernel: [15602.382677]  [<ffffffff816058cf>] schedule+0x3f/0x60
Mar 26 21:32:27 cetautomatix kernel: [15602.382704]  [<ffffffffa00c26c8>] cv_wait_common+0x98/0x190 [spl]
Mar 26 21:32:27 cetautomatix kernel: [15602.382712]  [<ffffffff81081850>] ? add_wait_queue+0x60/0x60
Mar 26 21:32:27 cetautomatix kernel: [15602.382722]  [<ffffffffa00c27f3>] __cv_wait+0x13/0x20 [spl]
Mar 26 21:32:27 cetautomatix kernel: [15602.382785]  [<ffffffffa01d7a6b>] zio_wait+0xfb/0x170 [zfs]
Mar 26 21:32:27 cetautomatix kernel: [15602.382811]  [<ffffffffa01711da>] dsl_pool_sync+0xca/0x450 [zfs]
Mar 26 21:32:27 cetautomatix kernel: [15602.382837]  [<ffffffffa0184ece>] spa_sync+0x38e/0xa00 [zfs]
Mar 26 21:32:27 cetautomatix kernel: [15602.382864]  [<ffffffffa0194f96>] txg_sync_thread+0x286/0x450 [zfs]
Mar 26 21:32:27 cetautomatix kernel: [15602.382892]  [<ffffffffa0194d10>] ? txg_init+0x260/0x260 [zfs]
Mar 26 21:32:27 cetautomatix kernel: [15602.382902]  [<ffffffffa00baf88>] thread_generic_wrapper+0x78/0x90 [spl]
Mar 26 21:32:27 cetautomatix kernel: [15602.382912]  [<ffffffffa00baf10>] ? __thread_create+0x310/0x310 [spl]
Mar 26 21:32:27 cetautomatix kernel: [15602.382919]  [<ffffffff81080dac>] kthread+0x8c/0xa0
Mar 26 21:32:27 cetautomatix kernel: [15602.382926]  [<ffffffff81610ca4>] kernel_thread_helper+0x4/0x10
Mar 26 21:32:27 cetautomatix kernel: [15602.382933]  [<ffffffff81080d20>] ? flush_kthread_worker+0xa0/0xa0
Mar 26 21:32:27 cetautomatix kernel: [15602.382939]  [<ffffffff81610ca0>] ? gs_change+0x13/0x13

Was fine in ppa-daily 51

@stephane-chazelas

Got it with 51 as well now. Freeing some space on the zpool seems to have unstuck the situation though.
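
A minimal sketch of how to check how close to full the pool and its datasets are, since freeing space is what unstuck things here ("tank" is a placeholder pool name):

zpool list
zfs list -o space -r tank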

@chrisrd
Contributor

chrisrd commented Apr 2, 2012

Getting similar traces here as well, with spl@8920c69, behlendorf/zfs@525c13f, linux-3.3.0

The system seems to have come back to normal after these.

[189076.785047] INFO: task txg_sync:13636 blocked for more than 120 seconds.
[189076.785106] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189076.785189] txg_sync        D ffff8806d0061e50     0 13636      2 0x00000000
[189076.785248]  ffff8805fff21b60 0000000000000046 ffff880c231721e0 00000000001d38c0
[189076.785335]  ffff8805fff21fd8 ffff8805fff20010 00000000001d38c0 00000000001d38c0
[189076.785421]  ffff8805fff21fd8 00000000001d38c0 ffff880601ba21e0 ffff880c231721e0
[189076.785508] Call Trace:
[189076.785551]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189076.785610]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189076.785661]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189076.785713]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189076.785801]  [<ffffffffa053bd3b>] zio_wait+0x10b/0x190 [zfs]
[189076.785871]  [<ffffffffa04cf99a>] dsl_pool_sync+0xca/0x4a0 [zfs]
[189076.785943]  [<ffffffffa04e2e77>] spa_sync+0x397/0x9d0 [zfs]
[189076.785992]  [<ffffffff81097e65>] ? lock_release_holdtime+0x35/0x170
[189076.786067]  [<ffffffffa04f5371>] txg_sync_thread+0x2f1/0x4d0 [zfs]
[189076.786142]  [<ffffffffa04f5080>] ? txg_do_callbacks+0x60/0x60 [zfs]
[189076.786196]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189076.786251]  [<ffffffffa03e4508>] thread_generic_wrapper+0x78/0x90 [spl]
[189076.786307]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189076.786357]  [<ffffffff8106c9be>] kthread+0xae/0xc0
[189076.786404]  [<ffffffff813f2da4>] kernel_thread_helper+0x4/0x10
[189076.786454]  [<ffffffff8106c910>] ? __init_kthread_worker+0x70/0x70
[189076.786504]  [<ffffffff813f2da0>] ? gs_change+0x13/0x13
[189076.786549] no locks held by txg_sync/13636.
[189196.604047] INFO: task txg_sync:13636 blocked for more than 120 seconds.
[189196.604106] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189196.604190] txg_sync        D ffff8806d0061e50     0 13636      2 0x00000000
[189196.604248]  ffff8805fff21b60 0000000000000046 ffff880c231721e0 00000000001d38c0
[189196.604335]  ffff8805fff21fd8 ffff8805fff20010 00000000001d38c0 00000000001d38c0
[189196.604422]  ffff8805fff21fd8 00000000001d38c0 ffff880601ba21e0 ffff880c231721e0
[189196.604509] Call Trace:
[189196.604552]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189196.604614]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189196.604665]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189196.604716]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189196.604800]  [<ffffffffa053bd3b>] zio_wait+0x10b/0x190 [zfs]
[189196.604870]  [<ffffffffa04cf99a>] dsl_pool_sync+0xca/0x4a0 [zfs]
[189196.604942]  [<ffffffffa04e2e77>] spa_sync+0x397/0x9d0 [zfs]
[189196.604991]  [<ffffffff81097e65>] ? lock_release_holdtime+0x35/0x170
[189196.605066]  [<ffffffffa04f5371>] txg_sync_thread+0x2f1/0x4d0 [zfs]
[189196.605141]  [<ffffffffa04f5080>] ? txg_do_callbacks+0x60/0x60 [zfs]
[189196.605196]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189196.605250]  [<ffffffffa03e4508>] thread_generic_wrapper+0x78/0x90 [spl]
[189196.605306]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189196.605357]  [<ffffffff8106c9be>] kthread+0xae/0xc0
[189196.605403]  [<ffffffff813f2da4>] kernel_thread_helper+0x4/0x10
[189196.605452]  [<ffffffff8106c910>] ? __init_kthread_worker+0x70/0x70
[189196.605502]  [<ffffffff813f2da0>] ? gs_change+0x13/0x13
[189196.605548] no locks held by txg_sync/13636.
[189196.605598] INFO: task rsync:16774 blocked for more than 120 seconds.
[189196.605647] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189196.605724] rsync           D ffff880c1bfba660     0 16774  16771 0x00000000
[189196.605782]  ffff880489695778 0000000000000046 ffff88062356a1e0 00000000001d38c0
[189196.605868]  ffff880489695fd8 ffff880489694010 00000000001d38c0 00000000001d38c0
[189196.605955]  ffff880489695fd8 00000000001d38c0 ffff8806235d43c0 ffff88062356a1e0
[189196.606041] Call Trace:
[189196.606078]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189196.606128]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189196.606177]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189196.606229]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189196.606300]  [<ffffffffa04f4983>] txg_wait_open+0xa3/0x130 [zfs]
[189196.606367]  [<ffffffffa04b778d>] dmu_tx_wait+0x11d/0x120 [zfs]
[189196.606434]  [<ffffffffa04b786e>] dmu_tx_assign+0x8e/0x4a0 [zfs]
[189196.606501]  [<ffffffffa04aabe9>] dmu_free_long_range_impl+0x1b9/0x270 [zfs]
[189196.606571]  [<ffffffffa04aaceb>] dmu_free_long_range+0x4b/0x70 [zfs]
[189196.606622]  [<ffffffff813e5ece>] ? mutex_unlock+0xe/0x10
[189196.606693]  [<ffffffffa052f55b>] zfs_trunc+0x6b/0x1f0 [zfs]
[189196.606741]  [<ffffffff813e5ece>] ? mutex_unlock+0xe/0x10
[189196.606811]  [<ffffffffa04e08dc>] ? sa_lookup+0x6c/0x80 [zfs]
[189196.606884]  [<ffffffffa053150b>] zfs_freesp+0x34b/0x3c0 [zfs]
[189196.606933]  [<ffffffff813e99eb>] ? _raw_spin_unlock+0x2b/0x40
[189196.606986]  [<ffffffffa03edd7c>] ? tsd_exit+0x19c/0x1d0 [spl]
[189196.607060]  [<ffffffffa052c035>] ? zfs_lookup+0x265/0x340 [zfs]
[189196.607133]  [<ffffffffa0541869>] zpl_xattr_set_dir+0x109/0x220 [zfs]
[189196.607207]  [<ffffffffa05415f4>] ? __zpl_xattr_get+0xb4/0x220 [zfs]
[189196.607281]  [<ffffffffa0541ee1>] zpl_xattr_set+0x1b1/0x2a0 [zfs]
[189196.607354]  [<ffffffffa0542152>] zpl_xattr_user_set+0x92/0xb0 [zfs]
[189196.607406]  [<ffffffff8116c53e>] generic_setxattr+0x9e/0xb0
[189196.607455]  [<ffffffff8116d234>] __vfs_setxattr_noperm+0x64/0xf0
[189196.607504]  [<ffffffff8116d384>] vfs_setxattr+0xc4/0xd0
[189196.607551]  [<ffffffff8116d440>] setxattr+0xb0/0x120
[189196.607597]  [<ffffffff81155bf3>] ? putname+0x33/0x50
[189196.607642]  [<ffffffff8115a273>] ? user_path_at_empty+0x63/0xa0
[189196.607692]  [<ffffffff81168230>] ? vfsmount_lock_local_unlock_cpu+0x50/0x50
[189196.607744]  [<ffffffff81168266>] ? vfsmount_lock_local_unlock+0x36/0x40
[189196.607796]  [<ffffffff811685ed>] ? mntput_no_expire+0x3d/0x130
[189196.607844]  [<ffffffff8116d635>] sys_lsetxattr+0xa5/0xc0
[189196.607892]  [<ffffffff81206c99>] ? lockdep_sys_exit_thunk+0x35/0x67
[189196.607942]  [<ffffffff813f1aa9>] system_call_fastpath+0x16/0x1b
[189196.607991] 2 locks held by rsync/16774:
[189196.608031]  #0:  (&type->i_mutex_dir_key#4){......}, at: [<ffffffff8116d33d>] vfs_setxattr+0x7d/0xd0
[189196.608131]  #1:  (&zp->z_xattr_lock){......}, at: [<ffffffffa0541db2>] zpl_xattr_set+0x82/0x2a0 [zfs]
[189316.423013] INFO: task txg_sync:13636 blocked for more than 120 seconds.
[189316.423072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189316.423154] txg_sync        D ffff8806d0061e50     0 13636      2 0x00000000
[189316.423213]  ffff8805fff21b60 0000000000000046 ffff880c231721e0 00000000001d38c0
[189316.423300]  ffff8805fff21fd8 ffff8805fff20010 00000000001d38c0 00000000001d38c0
[189316.423387]  ffff8805fff21fd8 00000000001d38c0 ffff880601ba21e0 ffff880c231721e0
[189316.423474] Call Trace:
[189316.423517]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189316.423575]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189316.423626]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189316.423678]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189316.423763]  [<ffffffffa053bd3b>] zio_wait+0x10b/0x190 [zfs]
[189316.423833]  [<ffffffffa04cf99a>] dsl_pool_sync+0xca/0x4a0 [zfs]
[189316.423904]  [<ffffffffa04e2e77>] spa_sync+0x397/0x9d0 [zfs]
[189316.423953]  [<ffffffff81097e65>] ? lock_release_holdtime+0x35/0x170
[189316.424029]  [<ffffffffa04f5371>] txg_sync_thread+0x2f1/0x4d0 [zfs]
[189316.424104]  [<ffffffffa04f5080>] ? txg_do_callbacks+0x60/0x60 [zfs]
[189316.424158]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189316.424213]  [<ffffffffa03e4508>] thread_generic_wrapper+0x78/0x90 [spl]
[189316.424268]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189316.424319]  [<ffffffff8106c9be>] kthread+0xae/0xc0
[189316.424365]  [<ffffffff813f2da4>] kernel_thread_helper+0x4/0x10
[189316.424414]  [<ffffffff8106c910>] ? __init_kthread_worker+0x70/0x70
[189316.424464]  [<ffffffff813f2da0>] ? gs_change+0x13/0x13
[189316.424509] no locks held by txg_sync/13636.
[189316.424559] INFO: task rsync:16774 blocked for more than 120 seconds.
[189316.424607] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189316.424685] rsync           D ffff880c1bfba660     0 16774  16771 0x00000000
[189316.425954]  ffff880489695778 0000000000000046 ffff88062356a1e0 00000000001d38c0
[189316.426040]  ffff880489695fd8 ffff880489694010 00000000001d38c0 00000000001d38c0
[189316.426127]  ffff880489695fd8 00000000001d38c0 ffff8806235d43c0 ffff88062356a1e0
[189316.426214] Call Trace:
[189316.426250]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189316.426301]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189316.426350]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189316.426402]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189316.426474]  [<ffffffffa04f4983>] txg_wait_open+0xa3/0x130 [zfs]
[189316.426541]  [<ffffffffa04b778d>] dmu_tx_wait+0x11d/0x120 [zfs]
[189316.426608]  [<ffffffffa04b786e>] dmu_tx_assign+0x8e/0x4a0 [zfs]
[189316.426675]  [<ffffffffa04aabe9>] dmu_free_long_range_impl+0x1b9/0x270 [zfs]
[189316.426745]  [<ffffffffa04aaceb>] dmu_free_long_range+0x4b/0x70 [zfs]
[189316.426796]  [<ffffffff813e5ece>] ? mutex_unlock+0xe/0x10
[189316.426867]  [<ffffffffa052f55b>] zfs_trunc+0x6b/0x1f0 [zfs]
[189316.426915]  [<ffffffff813e5ece>] ? mutex_unlock+0xe/0x10
[189316.426986]  [<ffffffffa04e08dc>] ? sa_lookup+0x6c/0x80 [zfs]
[189316.427058]  [<ffffffffa053150b>] zfs_freesp+0x34b/0x3c0 [zfs]
[189316.427107]  [<ffffffff813e99eb>] ? _raw_spin_unlock+0x2b/0x40
[189316.427161]  [<ffffffffa03edd7c>] ? tsd_exit+0x19c/0x1d0 [spl]
[189316.427233]  [<ffffffffa052c035>] ? zfs_lookup+0x265/0x340 [zfs]
[189316.427306]  [<ffffffffa0541869>] zpl_xattr_set_dir+0x109/0x220 [zfs]
[189316.427380]  [<ffffffffa05415f4>] ? __zpl_xattr_get+0xb4/0x220 [zfs]
[189316.427454]  [<ffffffffa0541ee1>] zpl_xattr_set+0x1b1/0x2a0 [zfs]
[189316.427527]  [<ffffffffa0542152>] zpl_xattr_user_set+0x92/0xb0 [zfs]
[189316.427580]  [<ffffffff8116c53e>] generic_setxattr+0x9e/0xb0
[189316.427629]  [<ffffffff8116d234>] __vfs_setxattr_noperm+0x64/0xf0
[189316.427678]  [<ffffffff8116d384>] vfs_setxattr+0xc4/0xd0
[189316.427725]  [<ffffffff8116d440>] setxattr+0xb0/0x120
[189316.427771]  [<ffffffff81155bf3>] ? putname+0x33/0x50
[189316.427816]  [<ffffffff8115a273>] ? user_path_at_empty+0x63/0xa0
[189316.427865]  [<ffffffff81168230>] ? vfsmount_lock_local_unlock_cpu+0x50/0x50
[189316.427918]  [<ffffffff81168266>] ? vfsmount_lock_local_unlock+0x36/0x40
[189316.427969]  [<ffffffff811685ed>] ? mntput_no_expire+0x3d/0x130
[189316.428018]  [<ffffffff8116d635>] sys_lsetxattr+0xa5/0xc0
[189316.428066]  [<ffffffff81206c99>] ? lockdep_sys_exit_thunk+0x35/0x67
[189316.428116]  [<ffffffff813f1aa9>] system_call_fastpath+0x16/0x1b
[189316.428165] 2 locks held by rsync/16774:
[189316.428205]  #0:  (&type->i_mutex_dir_key#4){......}, at: [<ffffffff8116d33d>] vfs_setxattr+0x7d/0xd0
[189316.428305]  #1:  (&zp->z_xattr_lock){......}, at: [<ffffffffa0541db2>] zpl_xattr_set+0x82/0x2a0 [zfs]
[189316.428428] INFO: task rmdir:18095 blocked for more than 120 seconds.
[189316.428477] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189316.428555] rmdir           D ffff880c1bfba660     0 18095  15355 0x00000000
[189316.428612]  ffff8803bbca1c18 0000000000000046 ffff8806236021e0 00000000001d38c0
[189316.428699]  ffff8803bbca1fd8 ffff8803bbca0010 00000000001d38c0 00000000001d38c0
[189316.428785]  ffff8803bbca1fd8 00000000001d38c0 ffffffff8160d020 ffff8806236021e0
[189316.428871] Call Trace:
[189316.428908]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189316.428959]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189316.429008]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189316.429060]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189316.429132]  [<ffffffffa04f4983>] txg_wait_open+0xa3/0x130 [zfs]
[189316.429201]  [<ffffffffa04b778d>] dmu_tx_wait+0x11d/0x120 [zfs]
[189316.429274]  [<ffffffffa052e5ac>] zfs_rmdir+0x15c/0x6c0 [zfs]
[189316.429323]  [<ffffffff81161c23>] ? shrink_dcache_parent+0x263/0x300
[189316.429396]  [<ffffffffa0540c1b>] zpl_rmdir+0x4b/0x70 [zfs]
[189316.429444]  [<ffffffff811564a8>] vfs_rmdir+0xa8/0xf0
[189316.429489]  [<ffffffff8115a19b>] do_rmdir+0x12b/0x140
[189316.429536]  [<ffffffff81147d8c>] ? sys_close+0xac/0x120
[189316.429583]  [<ffffffff81206c99>] ? lockdep_sys_exit_thunk+0x35/0x67
[189316.429633]  [<ffffffff8115a206>] sys_rmdir+0x16/0x20
[189316.429679]  [<ffffffff813f1aa9>] system_call_fastpath+0x16/0x1b
[189316.429727] 2 locks held by rmdir/18095:
[189316.429767]  #0:  (&type->i_mutex_dir_key#4/1){......}, at: [<ffffffff8115a0fc>] do_rmdir+0x8c/0x140
[189316.429869]  #1:  (&type->i_mutex_dir_key#4){......}, at: [<ffffffff81156463>] vfs_rmdir+0x63/0xf0
[189436.241968] INFO: task txg_sync:13636 blocked for more than 120 seconds.
[189436.242027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189436.242110] txg_sync        D ffff8806d0061e50     0 13636      2 0x00000000
[189436.242169]  ffff8805fff21b60 0000000000000046 ffff880c231721e0 00000000001d38c0
[189436.242256]  ffff8805fff21fd8 ffff8805fff20010 00000000001d38c0 00000000001d38c0
[189436.242342]  ffff8805fff21fd8 00000000001d38c0 ffff880601ba21e0 ffff880c231721e0
[189436.242429] Call Trace:
[189436.242473]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189436.242531]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189436.242582]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189436.242634]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189436.242722]  [<ffffffffa053bd3b>] zio_wait+0x10b/0x190 [zfs]
[189436.242792]  [<ffffffffa04cf99a>] dsl_pool_sync+0xca/0x4a0 [zfs]
[189436.242863]  [<ffffffffa04e2e77>] spa_sync+0x397/0x9d0 [zfs]
[189436.242912]  [<ffffffff81097e65>] ? lock_release_holdtime+0x35/0x170
[189436.242987]  [<ffffffffa04f5371>] txg_sync_thread+0x2f1/0x4d0 [zfs]
[189436.243062]  [<ffffffffa04f5080>] ? txg_do_callbacks+0x60/0x60 [zfs]
[189436.243117]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189436.243172]  [<ffffffffa03e4508>] thread_generic_wrapper+0x78/0x90 [spl]
[189436.243227]  [<ffffffffa03e4490>] ? __thread_create+0x360/0x360 [spl]
[189436.243278]  [<ffffffff8106c9be>] kthread+0xae/0xc0
[189436.243325]  [<ffffffff813f2da4>] kernel_thread_helper+0x4/0x10
[189436.243373]  [<ffffffff8106c910>] ? __init_kthread_worker+0x70/0x70
[189436.243423]  [<ffffffff813f2da0>] ? gs_change+0x13/0x13
[189436.243469] no locks held by txg_sync/13636.
[189436.243518] INFO: task rsync:16774 blocked for more than 120 seconds.
[189436.243567] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[189436.243645] rsync           D ffff880c1bfba660     0 16774  16771 0x00000000
[189436.243702]  ffff880489695778 0000000000000046 ffff88062356a1e0 00000000001d38c0
[189436.243789]  ffff880489695fd8 ffff880489694010 00000000001d38c0 00000000001d38c0
[189436.243875]  ffff880489695fd8 00000000001d38c0 ffff8806235d43c0 ffff88062356a1e0
[189436.243962] Call Trace:
[189436.243999]  [<ffffffff813e813f>] schedule+0x3f/0x60
[189436.244049]  [<ffffffffa03ec44d>] cv_wait_common+0xad/0x1d0 [spl]
[189436.244099]  [<ffffffff8106cef0>] ? wake_up_bit+0x40/0x40
[189436.244150]  [<ffffffffa03ec5a3>] __cv_wait+0x13/0x20 [spl]
[189436.244222]  [<ffffffffa04f4983>] txg_wait_open+0xa3/0x130 [zfs]
[189436.244288]  [<ffffffffa04b778d>] dmu_tx_wait+0x11d/0x120 [zfs]
[189436.244356]  [<ffffffffa04b786e>] dmu_tx_assign+0x8e/0x4a0 [zfs]
[189436.244423]  [<ffffffffa04aabe9>] dmu_free_long_range_impl+0x1b9/0x270 [zfs]
[189436.244493]  [<ffffffffa04aaceb>] dmu_free_long_range+0x4b/0x70 [zfs]
[189436.244544]  [<ffffffff813e5ece>] ? mutex_unlock+0xe/0x10
[189436.244615]  [<ffffffffa052f55b>] zfs_trunc+0x6b/0x1f0 [zfs]
[189436.244663]  [<ffffffff813e5ece>] ? mutex_unlock+0xe/0x10
[189436.244733]  [<ffffffffa04e08dc>] ? sa_lookup+0x6c/0x80 [zfs]
[189436.244806]  [<ffffffffa053150b>] zfs_freesp+0x34b/0x3c0 [zfs]
[189436.244855]  [<ffffffff813e99eb>] ? _raw_spin_unlock+0x2b/0x40
[189436.244909]  [<ffffffffa03edd7c>] ? tsd_exit+0x19c/0x1d0 [spl]
[189436.244981]  [<ffffffffa052c035>] ? zfs_lookup+0x265/0x340 [zfs]
[189436.245054]  [<ffffffffa0541869>] zpl_xattr_set_dir+0x109/0x220 [zfs]
[189436.245128]  [<ffffffffa05415f4>] ? __zpl_xattr_get+0xb4/0x220 [zfs]
[189436.245202]  [<ffffffffa0541ee1>] zpl_xattr_set+0x1b1/0x2a0 [zfs]
[189436.245275]  [<ffffffffa0542152>] zpl_xattr_user_set+0x92/0xb0 [zfs]
[189436.245328]  [<ffffffff8116c53e>] generic_setxattr+0x9e/0xb0
[189436.245376]  [<ffffffff8116d234>] __vfs_setxattr_noperm+0x64/0xf0
[189436.245425]  [<ffffffff8116d384>] vfs_setxattr+0xc4/0xd0
[189436.245472]  [<ffffffff8116d440>] setxattr+0xb0/0x120
[189436.245518]  [<ffffffff81155bf3>] ? putname+0x33/0x50
[189436.245564]  [<ffffffff8115a273>] ? user_path_at_empty+0x63/0xa0
[189436.245613]  [<ffffffff81168230>] ? vfsmount_lock_local_unlock_cpu+0x50/0x50
[189436.245666]  [<ffffffff81168266>] ? vfsmount_lock_local_unlock+0x36/0x40
[189436.245717]  [<ffffffff811685ed>] ? mntput_no_expire+0x3d/0x130
[189436.245765]  [<ffffffff8116d635>] sys_lsetxattr+0xa5/0xc0
[189436.245813]  [<ffffffff81206c99>] ? lockdep_sys_exit_thunk+0x35/0x67
[189436.245864]  [<ffffffff813f1aa9>] system_call_fastpath+0x16/0x1b
[189436.245912] 2 locks held by rsync/16774:
[189436.245952]  #0:  (&type->i_mutex_dir_key#4){......}, at: [<ffffffff8116d33d>] vfs_setxattr+0x7d/0xd0
[189436.246052]  #1:  (&zp->z_xattr_lock){......}, at: [<ffffffffa0541db2>] zpl_xattr_set+0x82/0x2a0 [zfs]

@ryanolf

ryanolf commented Apr 2, 2012

I'm no longer seeing this every night with the 0.6.0.56 PPA release, but I am also no longer pushing the ZFS system as hard (i.e. no longer snapshotting while copying large amounts of data). Also, I had one event where I got a bunch of these messages and the system did not lock up, so I cannot be certain that these messages lead to a crash, or whether something else caused my system to crash.

@ryanolf

ryanolf commented Apr 19, 2012

I just found out that my RAM had some bad areas, which was likely leading to corruption (especially when the RAM was being used heavily... as during large zfs operations). So far so good after fixing the RAM and doing a clean install. The crashes I was experiencing were likely not the (sole) fault of zfs.

@behlendorf
Contributor

Thanks for the update, I'm glad to hear you found at least one of the causes.

@sneak

sneak commented Nov 28, 2012

I'm having this problem on a freshly installed Ubuntu 12.x (Precise) on a system with ECC RAM. The disks came out of the same box when it was previously running FreeNAS. I tried the stable branch; now I'm trying the dailies to see if that resolves the issue.

It may be worth noting that reads were fine; it was only when I tried to write to the filesystem that I encountered the problem.
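
A minimal write-side reproduction sketch along those lines (pool and dataset names are placeholders), with zpool iostat running in a second terminal to see whether the writes ever reach the disks:

# terminal 1: a small synchronous write to the affected filesystem
dd if=/dev/zero of=/tank/data/testfile bs=1M count=64 oflag=sync
# terminal 2: watch pool I/O while the write runs
zpool iostat -v tank 5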

@marcinkuk

I have a strange issue similar to the above.

System:
Ubuntu 13.04 with 6GB RAM
[ 10.288418] ZFS: Loaded module v0.6.2-1~raring, ZFS pool version 5000, ZFS filesystem version 5
Linux brian 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I have a mirrored zpool (2x2TB).

I tried to destroy a dataset with 2 snapshots using the command:

zfs destroy -r pool1/dataset

The load on the server climbed above 50 and the operation took a long time. The console was frozen most of the time.
After 1 hour I rebooted the machine. While the ZFS module was loading, the hard disks were doing something (HDD LEDs blinking all the time) and the system didn't finish booting. I powered off the system and rebooted without the disks. After booting finished, I plugged the SATA cables back in and the disks came online.
The command "zpool status" froze the console, and the disks keep doing something all the time.

dmesg
[...]
[ 600.400720] txg_sync D ffff8801adc53ec0 0 3074 2 0x00000000
[ 600.400728] ffff880183aa9b88 0000000000000046 ffff88019004c5c0 ffff880183aa9fd8
[ 600.400735] ffff880183aa9fd8 ffff880183aa9fd8 ffff8801a4931740 ffff88019004c5c0
[ 600.400742] ffff88019004c5c0 ffff8801adc54778 ffff8801867b9dc0 0000000000000001
[ 600.400748] Call Trace:
[ 600.400758] [] schedule+0x29/0x70
[ 600.400766] [] io_schedule+0x8f/0xd0
[ 600.400789] [] cv_wait_common+0xa8/0x1b0 [spl]
[ 600.400795] [] ? finish_wait+0x80/0x80
[ 600.400816] [] __cv_wait_io+0x18/0x20 [spl]
[ 600.400887] [] zio_wait+0x103/0x1c0 [zfs]
[ 600.400954] [] dsl_scan_sync+0x486/0xa90 [zfs]
[ 600.401027] [] spa_sync+0x4a5/0xaf0 [zfs]
[ 600.401036] [] ? try_to_wake_up+0x1e2/0x2a0
[ 600.401044] [] ? ktime_get_ts+0x48/0xe0
[ 600.401118] [] txg_sync_thread+0x323/0x590 [zfs]
[ 600.401193] [] ? txg_init+0x250/0x250 [zfs]
[ 600.401213] [] thread_generic_wrapper+0x78/0x90 [spl]
[ 600.401232] [] ? __thread_create+0x340/0x340 [spl]
[ 600.401239] [] kthread+0xc0/0xd0
[ 600.401246] [] ? kthread_create_on_node+0x120/0x120
[ 600.401253] [] ret_from_fork+0x7c/0xb0
[ 600.401259] [] ? kthread_create_on_node+0x120/0x120
[...]
[ 600.401377] Call Trace:
[ 600.401385] [] schedule+0x29/0x70
[ 600.401394] [] schedule_preempt_disabled+0xe/0x10
[ 600.401402] [] __mutex_lock_slowpath+0xd7/0x150
[ 600.401411] [] mutex_lock+0x2a/0x50
[ 600.401419] [] __blkdev_get+0x6b/0x4b0
[ 600.401426] [] blkdev_get+0x1ad/0x300
[ 600.401433] [] ? bdget+0x3e/0x150
[ 600.401441] [] ? _raw_spin_lock+0xe/0x20
[ 600.401448] [] blkdev_open+0x5b/0x80
[ 600.401455] [] do_dentry_open+0x203/0x290
[ 600.401462] [] ? blkdev_get+0x300/0x300
[ 600.401468] [] vfs_open+0x49/0x50
[ 600.401475] [] do_last+0x296/0xea0
[ 600.401482] [] ? inode_permission+0x18/0x50
[ 600.401488] [] ? link_path_walk+0x6b/0x880
[ 600.401496] [] path_openat+0xb3/0x4b0
[ 600.401503] [] ? bdput+0x12/0x20
[ 600.401510] [] ? __blkdev_put+0x9f/0x1d0
[ 600.401516] [] ? iput+0x48/0x190
[ 600.401523] [] ? __blkdev_put+0x151/0x1d0
[ 600.401530] [] do_filp_open+0x41/0xa0
[ 600.401537] [] ? __alloc_fd+0xd9/0x130
[ 600.401544] [] do_sys_open+0xf3/0x230
[ 600.401551] [] sys_open+0x21/0x30
[ 600.401557] [] system_call_fastpath+0x1a/0x1f
[...]

top
3074 root 0 -20 0 0 0 D 2.9 0.0 0:37.34 txg_sync
2730 root 39 19 0 0 0 S 0.6 0.0 0:25.97 z_rd_int/0
2731 root 39 19 0 0 0 S 0.6 0.0 0:23.57 z_rd_int/1
2720 root 39 19 0 0 0 S 0.2 0.0 0:02.05 z_null_iss/0
2723 root 39 19 0 0 0 S 0.2 0.0 0:01.24 z_rd_iss/1
2724 root 39 19 0 0 0 S 0.2 0.0 0:01.12 z_rd_iss/2
2726 root 39 19 0 0 0 S 0.2 0.0 0:01.26 z_rd_iss/4
2728 root 39 19 0 0 0 S 0.2 0.0 0:01.24 z_rd_iss/6
2729 root 39 19 0 0 0 S 0.2 0.0 0:01.30 z_rd_iss/7

uptime
18:58:35 up 51 min, 4 users, load average: 11.46, 19.05, 18.73

iostat
Linux 3.8.0-31-generic (brian) 10/18/13 x86_64 (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
0.19 0.00 3.61 71.12 0.00 25.09

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 2.31 68.46 2.12 215729 6668
dm-0 3.59 65.08 2.11 205077 6664
dm-1 0.07 0.28 0.00 896 0
dm-2 0.11 0.43 0.00 1341 4
dm-3 0.07 0.27 0.00 836 0
dm-4 0.07 0.27 0.00 836 0
dm-5 0.07 0.27 0.00 836 0
dm-6 0.07 0.27 0.00 836 0
dm-7 0.07 0.27 0.00 836 0
dm-8 0.07 0.27 0.00 836 0
dm-9 0.07 0.27 0.00 836 0
dm-10 0.07 0.27 0.00 836 0
dm-11 0.07 0.27 0.00 836 0
sdb 152.64 354.86 1174.00 1118214 3699490
zd0 0.00 0.00 0.00 12 0
zd16 0.00 0.00 0.00 12 0
zd32 0.00 0.00 0.00 12 0

Here you can see only one disk because I removed the other for safety; the behaviour is the same with two disks.
Any ideas what I can do in this case?

@ryao
Contributor

ryao commented Oct 18, 2013

@marcinkuk The sync thread being stuck is a symptom of a deadlock somewhere else; that should merit a new issue.

@behlendorf
Contributor

@marcinkuk If you're still able to recreate this issue please open a new issue for it. It will get lost if it's just appended to the end of this closed bug.

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue on Sep 26, 2023:

Merge remote-tracking branch '6.0/stage' into 'master'