Deadlock from racing between zpool export and zvol create #5195

Closed
tuxoko opened this issue Sep 29, 2016 · 6 comments
Labels: Component: ZVOL (ZFS Volumes), Type: Regression (indicates a functional regression)

tuxoko commented Sep 29, 2016

Discovered from buildbot
http://build.zfsonlinux.org/builders/Amazon%202015.09%20x86_64%20%28TEST%29/builds/2839/steps/shell_12/logs/console

This can be reproduced easily by running these two loops concurrently:
while :; do zpool import pp; zpool export pp; done
while :; do zfs create -sV 8G pp/vol0; zfs destroy pp/vol0; done

[12720.468998] INFO: task z_zvol:31391 blocked for more than 120 seconds.
[12720.469328]       Tainted: P           OE   4.4.0-38-generic #57-Ubuntu
[12720.469639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12720.470006] z_zvol          D ffff88006b3b3c68     0 31391      2 0x00000000
[12720.470351]  ffff88006b3b3c68 ffffffffc055adf0 ffff88013a408dc0 ffff88006b38e040
[12720.470842]  ffff88006b3b4000 ffffffffc096ac44 ffff88006b38e040 00000000ffffffff
[12720.471188]  ffffffffc096ac48 ffff88006b3b3c80 ffffffff8182c5f5 ffffffffc096ac40
[12720.471534] Call Trace:
[12720.471652]  [<ffffffffc055adf0>] ? tsd_set+0x1a0/0x4e0 [spl]
[12720.471906]  [<ffffffff8182c5f5>] schedule+0x35/0x80
[12720.472125]  [<ffffffff8182c89e>] schedule_preempt_disabled+0xe/0x10
[12720.472401]  [<ffffffff8182e4d9>] __mutex_lock_slowpath+0xb9/0x130
[12720.472682]  [<ffffffff8182e56f>] mutex_lock+0x1f/0x30
[12720.472969]  [<ffffffffc07eb530>] spa_open_common+0x60/0x450 [zfs]
[12720.473259]  [<ffffffff810ba52c>] ? set_next_entity+0x9c/0xb0
[12720.473520]  [<ffffffff811ee011>] ? __kmalloc_node+0x231/0x300
[12720.473811]  [<ffffffffc084dee0>] ? zvol_release+0x90/0x90 [zfs]
[12720.474090]  [<ffffffffc07eb943>] spa_open+0x13/0x20 [zfs]
[12720.474342]  [<ffffffffc07ad6ef>] dmu_objset_find+0x3f/0x90 [zfs]
[12720.474625]  [<ffffffffc084dc44>] zvol_task_cb+0x354/0x370 [zfs]
[12720.474890]  [<ffffffffc05560d8>] taskq_thread+0x238/0x420 [spl]
[12720.475151]  [<ffffffff810ac190>] ? wake_up_q+0x70/0x70
[12720.475380]  [<ffffffffc0555ea0>] ? taskq_cancel_id+0x140/0x140 [spl]
[12720.475659]  [<ffffffff810a08d8>] kthread+0xd8/0xf0
[12720.475872]  [<ffffffff810a0800>] ? kthread_create_on_node+0x1e0/0x1e0
[12720.476155]  [<ffffffff81830a8f>] ret_from_fork+0x3f/0x70
[12720.476390]  [<ffffffff810a0800>] ? kthread_create_on_node+0x1e0/0x1e0
[12720.476673] INFO: task zpool:31503 blocked for more than 120 seconds.
[12720.476968]       Tainted: P           OE   4.4.0-38-generic #57-Ubuntu
[12720.477260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12720.477601] zpool           D ffff88006af3fcb8     0 31503   5834 0x00000000
[12720.477922]  ffff88006af3fcb8 7fffffffffffffff ffff88013a409b80 ffff88006b660dc0
[12720.478268]  ffff88006af40000 0000000000000008 0000000000000007 ffff88013a0f24c8
[12720.478715]  0000000000000000 ffff88006af3fcd0 ffffffff8182c5f5 ffff88013a0f2400
[12720.479062] Call Trace:
[12720.479174]  [<ffffffff8182c5f5>] schedule+0x35/0x80
[12720.479395]  [<ffffffffc0556374>] taskq_wait+0x74/0xe0 [spl]
[12720.479646]  [<ffffffff810c3d80>] ? wake_atomic_t_function+0x60/0x60
[12720.479926]  [<ffffffffc0556427>] taskq_destroy+0x47/0x120 [spl]
[12720.480204]  [<ffffffffc07e56bb>] spa_deactivate+0x2b/0x2b0 [zfs]
[12720.480486]  [<ffffffffc07e8349>] spa_export_common.part.17+0x219/0x3c0 [zfs]
[12720.480811]  [<ffffffffc07e8556>] spa_export+0x36/0x40 [zfs]
[12720.481118]  [<ffffffffc0820e34>] zfs_ioc_pool_export+0x34/0x40 [zfs]
[12720.481423]  [<ffffffffc0820c6b>] zfsdev_ioctl+0x44b/0x4e0 [zfs]
[12720.481687]  [<ffffffff812211ff>] do_vfs_ioctl+0x29f/0x490
[12720.481929]  [<ffffffff8121cf34>] ? putname+0x54/0x60
[12720.482152]  [<ffffffff8120c88f>] ? do_sys_open+0x1bf/0x2a0
[12720.482396]  [<ffffffff81221469>] SyS_ioctl+0x79/0x90
[12720.482618]  [<ffffffff818306f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[12720.482899] INFO: task zfs:31517 blocked for more than 120 seconds.
[12720.483172]       Tainted: P           OE   4.4.0-38-generic #57-Ubuntu
[12720.483460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12720.483799] zfs             D ffff88006d8bbd08     0 31517   5858 0x00000000
[12720.484119]  ffff88006d8bbd08 ffff88011c29c000 ffff88006b660dc0 ffff880138de5280
[12720.484465]  ffff88006d8bc000 ffffffffc096ac44 ffff880138de5280 00000000ffffffff
[12720.484810]  ffffffffc096ac48 ffff88006d8bbd20 ffffffff8182c5f5 ffffffffc096ac40
[12720.485204] Call Trace:
[12720.485312]  [<ffffffff8182c5f5>] schedule+0x35/0x80
[12720.485525]  [<ffffffff8182c89e>] schedule_preempt_disabled+0xe/0x10
[12720.485796]  [<ffffffff8182e4d9>] __mutex_lock_slowpath+0xb9/0x130
[12720.486059]  [<ffffffff8182e56f>] mutex_lock+0x1f/0x30
[12720.486293]  [<ffffffffc07eb530>] spa_open_common+0x60/0x450 [zfs]
[12720.486687]  [<ffffffffc07eb943>] spa_open+0x13/0x20 [zfs]
[12720.486944]  [<ffffffffc0820a9b>] zfsdev_ioctl+0x27b/0x4e0 [zfs]
[12720.487203]  [<ffffffff812211ff>] do_vfs_ioctl+0x29f/0x490
[12720.487443]  [<ffffffff8106b544>] ? __do_page_fault+0x1b4/0x400
[12720.487696]  [<ffffffff81221469>] SyS_ioctl+0x79/0x90
[12720.487913]  [<ffffffff818306f2>] entry_SYSCALL_64_fastpath+0x16/0x71
@bprotopopov

It looks to me like the two threads above that are trying to take the namespace lock are blocked on the third thread, which holds the namespace lock and is waiting for the taskq to complete its outstanding tasks.
Why the taskq is not completing remains to be determined; in other words, the stack traces shown in the bug report do not, by themselves, show a deadlock.

bprotopopov commented Jun 12, 2017

@tuxoko
I have tried the commands you mentioned on my system, but I did not see any blocked tasks, and all zpool commands returned without hanging. Could you perhaps add the backtrace of the taskq thread to show what it is waiting for?

tuxoko commented Jun 23, 2017

@bprotopopov
The zpool export in the second trace holds spa_namespace_lock while doing taskq_destroy(), which waits on the z_zvol thread in the first trace, which in turn is trying to acquire spa_namespace_lock. The third (zfs) trace is irrelevant to the deadlock.

I'm not sure whether this race is still possible; I just tried to reproduce it with the commands above but haven't succeeded yet.
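
For reference, the lock inversion can be modeled outside of ZFS in a few lines of pthreads. This is only a minimal sketch, not the actual code path: the mutex stands in for spa_namespace_lock, the main thread plays spa_export_common()/taskq_destroy(), and the worker plays zvol_task_cb() calling spa_open(). It hangs by design when run.

/* cc -pthread deadlock_model.c -o deadlock_model */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for spa_namespace_lock */

static void *zvol_task(void *arg)       /* stands in for zvol_task_cb() */
{
        (void)arg;
        sleep(1);                               /* the task is already queued when the export starts */
        pthread_mutex_lock(&namespace_lock);    /* spa_open_common() tries to take the namespace lock */
        puts("zvol task ran");                  /* never reached */
        pthread_mutex_unlock(&namespace_lock);
        return (NULL);
}

int main(void)
{
        pthread_t tq;

        pthread_create(&tq, NULL, zvol_task, NULL);     /* the z_zvol taskq thread */

        pthread_mutex_lock(&namespace_lock);    /* export path takes spa_namespace_lock ... */
        pthread_join(tq, NULL);                 /* ... then taskq_destroy() waits for the worker: deadlock */
        pthread_mutex_unlock(&namespace_lock);
        return (0);
}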

bprotopopov commented Jun 24, 2017

Hi, @tuxoko, thanks for pointing this out.

This might not be easy to reproduce; the issue has been discussed before (see the discussion about spa_deactivate() in #3830) and was supposed to be addressed by doing taskq_wait() earlier in spa_export_common(), at the point where the lock is dropped to stop asynchronous pool tasks. I suppose it is still possible for a zvol task to be dispatched after the taskq is drained but before the namespace lock is re-acquired, leaving a non-empty taskq at the time of deactivation, so a race still seems possible.

The destruction of the taskq was deferred to spa_deactivate() because taskq creation is handled by spa_activate(). I guess I'll have to think more about ways to close the race window you pointed out.
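
To make the window concrete, here is an annotated timeline of the interleaving as I understand it. The placement of the lock drop and the taskq_wait() is taken from the description above rather than from the current source, so treat it as a sketch:

/*
 * exporting thread (spa_export_common)     z_zvol taskq
 * -------------------------------------    ---------------------------------
 * mutex_exit(&spa_namespace_lock)
 * taskq_wait(zvol taskq)                   (queue drained, idle)
 *                                          new zvol task dispatched here,
 *                                          e.g. from a racing zfs create -V
 * mutex_enter(&spa_namespace_lock)
 * spa_deactivate()
 *   taskq_destroy() waits for the task     task runs zvol_task_cb() ->
 *                                          spa_open() -> blocks on
 *                                          spa_namespace_lock  => deadlock
 */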

@bprotopopov

I have looked into this a bit and tried to address it by adding suspend/resume functionality to the taskq (wrapping taskq_t in an staskq_t, a suspendable taskq), effectively preventing further dispatches to the queue after the taskq_wait() in spa_export_common(). This prevented the deadlock in question, but it exposed more deadlocks down the line:

The taskq thread is adding a disk:

PID: 20938  TASK: ffff8800370e0040  CPU: 3   COMMAND: "z_zvol"
 #0 [ffff880033157ab0] schedule at ffffffff815489d0
 #1 [ffff880033157b88] __mutex_lock_slowpath at ffffffff8154a3c6
 #2 [ffff880033157bf8] mutex_lock at ffffffff81549eeb
 #3 [ffff880033157c18] __blkdev_get at ffffffff811d7bb3
 #4 [ffff880033157c78] blkdev_get at ffffffff811d7f30
 #5 [ffff880033157c88] register_disk at ffffffff81212b15
 #6 [ffff880033157cd8] add_disk at ffffffff81287f26
 #7 [ffff880033157d08] zvol_create_minor_impl at ffffffffa0a2636e [zfs]
 #8 [ffff880033157d78] zvol_task_cb at ffffffffa0a29491 [zfs]
 #9 [ffff880033157e28] taskq_thread at ffffffffa083e934 [spl]
#10 [ffff880033157ee8] kthread at ffffffff810a640e
#11 [ffff880033157f48] kernel_thread at ffffffff8100c28a

while blkid is opening the device (presumably reacting to an earlier add_disk()):

PID: 21800  TASK: ffff88003708cab0  CPU: 1   COMMAND: "blkid"
 #0 [ffff880032ee3ac8] schedule at ffffffff815489d0
 #1 [ffff880032ee3ba0] __mutex_lock_slowpath at ffffffff8154a3c6
 #2 [ffff880032ee3c10] mutex_lock at ffffffff81549eeb
 #3 [ffff880032ee3c30] __blkdev_get at ffffffff811d7bb3
 #4 [ffff880032ee3c90] __blkdev_get at ffffffff811d7e42
 #5 [ffff880032ee3cf0] blkdev_get at ffffffff811d7f30
 #6 [ffff880032ee3d00] blkdev_open at ffffffff811d7fb1
 #7 [ffff880032ee3d30] __dentry_open at ffffffff81196c52
 #8 [ffff880032ee3d90] nameidata_to_filp at ffffffff81196fc4
 #9 [ffff880032ee3db0] do_filp_open at ffffffff811aceb0
#10 [ffff880032ee3f20] do_sys_open at ffffffff811969f7
#11 [ffff880032ee3f70] sys_open at ffffffff81196b00
#12 [ffff880032ee3f80] system_call_fastpath at ffffffff8100b0d2

and zpool is waiting for the taskq thread to finish its work:

PID: 21394  TASK: ffff880079417520  CPU: 3   COMMAND: "zpool"
 #0 [ffff880033b5bbf8] schedule at ffffffff815489d0
 #1 [ffff880033b5bcd0] taskq_wait at ffffffffa083d93d [spl]
 #2 [ffff880033b5bd30] staskq_wait at ffffffffa09a6712 [zfs]
 #3 [ffff880033b5bd40] spa_export_common at ffffffffa09956eb [zfs]
 #4 [ffff880033b5bda0] spa_export at ffffffffa0995b0b [zfs]
 #5 [ffff880033b5bdb0] zfs_ioc_pool_export at ffffffffa09e2c5f [zfs]
 #6 [ffff880033b5bde0] zfsdev_ioctl at ffffffffa09e54c4 [zfs]
 #7 [ffff880033b5be60] vfs_ioctl at ffffffff811af562
 #8 [ffff880033b5bea0] do_vfs_ioctl at ffffffff811af704
 #9 [ffff880033b5bf30] sys_ioctl at ffffffff811afc81
#10 [ffff880033b5bf80] system_call_fastpath at ffffffff8100b0d2

It is as if bdev->bd_mutex were acquired and never released, and the result is a hang waiting for that mutex.
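
To make the suspend/resume idea above concrete, here is a hypothetical sketch of what such a wrapper could look like. This is only my reading of it, not the actual staskq_t code; the staskq_dispatch()/staskq_suspend() names and the use of plain pthreads instead of SPL primitives are assumptions for illustration. The point is simply that dispatch is gated behind a flag, so nothing new can be queued once the export path has drained the taskq.

#include <pthread.h>
#include <stdbool.h>

typedef struct staskq {
        pthread_mutex_t sq_lock;        /* protects sq_suspended */
        bool            sq_suspended;   /* true once the export path drains the queue */
        /* taskq_t *sq_tq;                 the wrapped SPL taskq would live here */
} staskq_t;

/* Returns true if the task was (or would be) handed to the underlying taskq. */
static bool
staskq_dispatch(staskq_t *sq, void (*func)(void *), void *arg)
{
        bool dispatched = false;

        pthread_mutex_lock(&sq->sq_lock);
        if (!sq->sq_suspended) {
                /* real code would call taskq_dispatch(sq->sq_tq, func, arg, TQ_SLEEP) here */
                (void)func;
                (void)arg;
                dispatched = true;
        }
        pthread_mutex_unlock(&sq->sq_lock);
        return (dispatched);
}

/* Called by the export path right after taskq_wait(): refuse any new work. */
static void
staskq_suspend(staskq_t *sq)
{
        pthread_mutex_lock(&sq->sq_lock);
        sq->sq_suspended = true;
        pthread_mutex_unlock(&sq->sq_lock);
}

Even with dispatch gated this way, the traces above show the remaining hang is on bdev->bd_mutex between the zvol add_disk() path and blkid, so suspending dispatch alone does not resolve everything.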

@behlendorf

Closing; I believe this issue was resolved by @bprotopopov's zvol refactoring.
