multiple kernel errors on 3.16.0-24 #2859

Closed
krichter722 opened this issue Oct 31, 2014 · 5 comments

@krichter722

I'm getting the following kernel errors logged in syslog while a pool with vdevs on a CIFS mount is imported. I'm not sure whether the errors are related to that pool, because other pools on internal and external HDDs are imported as well.

Nov  1 00:06:10 localhost kernel: [ 4805.201697] INFO: task txg_sync:3886 blocked for more than 120 seconds.
Nov  1 00:06:10 localhost kernel: [ 4805.201706]       Tainted: P        W  OE 3.16.0-24-generic #32-Ubuntu
Nov  1 00:06:10 localhost kernel: [ 4805.201708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  1 00:06:10 localhost kernel: [ 4805.201711] txg_sync        D ffff88043f294800     0  3886      2 0x00000000
Nov  1 00:06:10 localhost kernel: [ 4805.201717]  ffff880271873be0 0000000000000046 ffff880059449460 0000000000014800
Nov  1 00:06:10 localhost kernel: [ 4805.201722]  ffff880271873fd8 0000000000014800 ffff880059449460 ffff88043f295100
Nov  1 00:06:10 localhost kernel: [ 4805.201726]  ffffc90165c0d930 ffffc90165c0d960 0000000000000001 0000000000000002
Nov  1 00:06:10 localhost kernel: [ 4805.201731] Call Trace:
Nov  1 00:06:10 localhost kernel: [ 4805.201742]  [<ffffffff827831cf>] io_schedule+0xaf/0x150
Nov  1 00:06:10 localhost kernel: [ 4805.201782]  [<ffffffffc063eb2d>] cv_wait_common+0x9d/0x1a0 [spl]
Nov  1 00:06:10 localhost kernel: [ 4805.201790]  [<ffffffff820b9590>] ? prepare_to_wait_event+0x100/0x100
Nov  1 00:06:10 localhost kernel: [ 4805.201807]  [<ffffffffc063ec88>] __cv_wait_io+0x18/0x20 [spl]
Nov  1 00:06:10 localhost kernel: [ 4805.201874]  [<ffffffffc077c5a3>] zio_wait+0x113/0x1d0 [zfs]
Nov  1 00:06:10 localhost kernel: [ 4805.201916]  [<ffffffffc070d1c1>] dsl_pool_sync+0xb1/0x450 [zfs]
Nov  1 00:06:10 localhost kernel: [ 4805.201962]  [<ffffffffc0724fcd>] spa_sync+0x41d/0xb00 [zfs]
Nov  1 00:06:10 localhost kernel: [ 4805.201970]  [<ffffffff820de4f8>] ? ktime_get_ts+0x48/0xf0
Nov  1 00:06:10 localhost kernel: [ 4805.202020]  [<ffffffffc07352a2>] txg_sync_thread+0x382/0x5f0 [zfs]
Nov  1 00:06:10 localhost kernel: [ 4805.202070]  [<ffffffffc0734f20>] ? txg_delay+0xf0/0xf0 [zfs]
Nov  1 00:06:10 localhost kernel: [ 4805.202084]  [<ffffffffc0636f4a>] thread_generic_wrapper+0x7a/0x90 [spl]
Nov  1 00:06:10 localhost kernel: [ 4805.202098]  [<ffffffffc0636ed0>] ? __thread_exit+0xa0/0xa0 [spl]
Nov  1 00:06:10 localhost kernel: [ 4805.202104]  [<ffffffff82094aeb>] kthread+0xdb/0x100
Nov  1 00:06:10 localhost kernel: [ 4805.202110]  [<ffffffff82094a10>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  1 00:06:10 localhost kernel: [ 4805.202116]  [<ffffffff82787c3c>] ret_from_fork+0x7c/0xb0
Nov  1 00:06:10 localhost kernel: [ 4805.202121]  [<ffffffff82094a10>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  1 00:08:10 localhost kernel: [ 4925.328879] INFO: task txg_sync:3886 blocked for more than 120 seconds.
Nov  1 00:08:10 localhost kernel: [ 4925.328886]       Tainted: P        W  OE 3.16.0-24-generic #32-Ubuntu
Nov  1 00:08:10 localhost kernel: [ 4925.328888] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  1 00:08:10 localhost kernel: [ 4925.328891] txg_sync        D ffff88043f294800     0  3886      2 0x00000000
Nov  1 00:08:10 localhost kernel: [ 4925.328897]  ffff880271873be0 0000000000000046 ffff880059449460 0000000000014800
Nov  1 00:08:10 localhost kernel: [ 4925.328901]  ffff880271873fd8 0000000000014800 ffff880059449460 ffff88043f295100
Nov  1 00:08:10 localhost kernel: [ 4925.328904]  ffffc90165c0d930 ffffc90165c0d960 0000000000000001 0000000000000002
Nov  1 00:08:10 localhost kernel: [ 4925.328908] Call Trace:
Nov  1 00:08:10 localhost kernel: [ 4925.328918]  [<ffffffff827831cf>] io_schedule+0xaf/0x150
Nov  1 00:08:10 localhost kernel: [ 4925.328952]  [<ffffffffc063eb2d>] cv_wait_common+0x9d/0x1a0 [spl]
Nov  1 00:08:10 localhost kernel: [ 4925.328959]  [<ffffffff820b9590>] ? prepare_to_wait_event+0x100/0x100
Nov  1 00:08:10 localhost kernel: [ 4925.328972]  [<ffffffffc063ec88>] __cv_wait_io+0x18/0x20 [spl]
Nov  1 00:08:10 localhost kernel: [ 4925.329038]  [<ffffffffc077c5a3>] zio_wait+0x113/0x1d0 [zfs]
Nov  1 00:08:10 localhost kernel: [ 4925.329088]  [<ffffffffc070d1c1>] dsl_pool_sync+0xb1/0x450 [zfs]
Nov  1 00:08:10 localhost kernel: [ 4925.329141]  [<ffffffffc0724fcd>] spa_sync+0x41d/0xb00 [zfs]
Nov  1 00:08:10 localhost kernel: [ 4925.329153]  [<ffffffff820de4f8>] ? ktime_get_ts+0x48/0xf0
Nov  1 00:08:10 localhost kernel: [ 4925.329205]  [<ffffffffc07352a2>] txg_sync_thread+0x382/0x5f0 [zfs]
Nov  1 00:08:10 localhost kernel: [ 4925.329259]  [<ffffffffc0734f20>] ? txg_delay+0xf0/0xf0 [zfs]
Nov  1 00:08:10 localhost kernel: [ 4925.329278]  [<ffffffffc0636f4a>] thread_generic_wrapper+0x7a/0x90 [spl]
Nov  1 00:08:10 localhost kernel: [ 4925.329294]  [<ffffffffc0636ed0>] ? __thread_exit+0xa0/0xa0 [spl]
Nov  1 00:08:10 localhost kernel: [ 4925.329302]  [<ffffffff82094aeb>] kthread+0xdb/0x100
Nov  1 00:08:10 localhost kernel: [ 4925.329308]  [<ffffffff82094a10>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  1 00:08:10 localhost kernel: [ 4925.329314]  [<ffffffff82787c3c>] ret_from_fork+0x7c/0xb0
Nov  1 00:08:10 localhost kernel: [ 4925.329318]  [<ffffffff82094a10>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  1 00:09:01 localhost CRON[14267]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Nov  1 00:10:10 localhost kernel: [ 5045.455265] INFO: task txg_sync:3886 blocked for more than 120 seconds.
Nov  1 00:10:10 localhost kernel: [ 5045.455274]       Tainted: P        W  OE 3.16.0-24-generic #32-Ubuntu
Nov  1 00:10:10 localhost kernel: [ 5045.455276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov  1 00:10:10 localhost kernel: [ 5045.455279] txg_sync        D ffff88043f294800     0  3886      2 0x00000000
Nov  1 00:10:10 localhost kernel: [ 5045.455286]  ffff880271873be0 0000000000000046 ffff880059449460 0000000000014800
Nov  1 00:10:10 localhost kernel: [ 5045.455291]  ffff880271873fd8 0000000000014800 ffff880059449460 ffff88043f295100
Nov  1 00:10:10 localhost kernel: [ 5045.455295]  ffffc90165c0d930 ffffc90165c0d960 0000000000000001 0000000000000002
Nov  1 00:10:10 localhost kernel: [ 5045.455299] Call Trace:
Nov  1 00:10:10 localhost kernel: [ 5045.455310]  [<ffffffff827831cf>] io_schedule+0xaf/0x150
Nov  1 00:10:10 localhost kernel: [ 5045.455346]  [<ffffffffc063eb2d>] cv_wait_common+0x9d/0x1a0 [spl]
Nov  1 00:10:10 localhost kernel: [ 5045.455353]  [<ffffffff820b9590>] ? prepare_to_wait_event+0x100/0x100
Nov  1 00:10:10 localhost kernel: [ 5045.455369]  [<ffffffffc063ec88>] __cv_wait_io+0x18/0x20 [spl]
Nov  1 00:10:10 localhost kernel: [ 5045.455434]  [<ffffffffc077c5a3>] zio_wait+0x113/0x1d0 [zfs]
Nov  1 00:10:10 localhost kernel: [ 5045.455476]  [<ffffffffc070d1c1>] dsl_pool_sync+0xb1/0x450 [zfs]
Nov  1 00:10:10 localhost kernel: [ 5045.455521]  [<ffffffffc0724fcd>] spa_sync+0x41d/0xb00 [zfs]
Nov  1 00:10:10 localhost kernel: [ 5045.455529]  [<ffffffff820de4f8>] ? ktime_get_ts+0x48/0xf0
Nov  1 00:10:10 localhost kernel: [ 5045.455577]  [<ffffffffc07352a2>] txg_sync_thread+0x382/0x5f0 [zfs]
Nov  1 00:10:10 localhost kernel: [ 5045.455625]  [<ffffffffc0734f20>] ? txg_delay+0xf0/0xf0 [zfs]
Nov  1 00:10:10 localhost kernel: [ 5045.455640]  [<ffffffffc0636f4a>] thread_generic_wrapper+0x7a/0x90 [spl]
Nov  1 00:10:10 localhost kernel: [ 5045.455653]  [<ffffffffc0636ed0>] ? __thread_exit+0xa0/0xa0 [spl]
Nov  1 00:10:10 localhost kernel: [ 5045.455660]  [<ffffffff82094aeb>] kthread+0xdb/0x100
Nov  1 00:10:10 localhost kernel: [ 5045.455665]  [<ffffffff82094a10>] ? kthread_create_on_node+0x1c0/0x1c0
Nov  1 00:10:10 localhost kernel: [ 5045.455671]  [<ffffffff82787c3c>] ret_from_fork+0x7c/0xb0
Nov  1 00:10:10 localhost kernel: [ 5045.455676]  [<ffffffff82094a10>] ? kthread_create_on_node+0x1c0/0x1c0

I'm using ZoL 0.6.3 on Ubuntu 14.10 amd64.
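
For anyone triaging a similar log, the repeated reports can be condensed before reading the traces. The following is a minimal sketch, assuming syslog lines in the format pasted above; the function name `summarize_hung_tasks` and the sample list are made up for illustration and are not part of any ZoL tooling.

```python
import re
from collections import Counter

# Matches the kernel's hung-task report line as it appears in the log above.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Count hung-task reports per (task name, pid)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# Three report lines copied from the log above, plus one unrelated line.
sample = [
    "Nov  1 00:06:10 localhost kernel: [ 4805.201697] INFO: task txg_sync:3886 blocked for more than 120 seconds.",
    "Nov  1 00:06:10 localhost kernel: [ 4805.201731] Call Trace:",
    "Nov  1 00:08:10 localhost kernel: [ 4925.328879] INFO: task txg_sync:3886 blocked for more than 120 seconds.",
    "Nov  1 00:10:10 localhost kernel: [ 5045.455265] INFO: task txg_sync:3886 blocked for more than 120 seconds.",
]
summary = summarize_hung_tasks(sample)
for (name, pid), count in summary.items():
    print(f"{name} (pid {pid}): {count} report(s)")
```

Applied to the full paste, this would show that all three reports come from the same txg_sync thread, about two minutes apart.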

@maci0
Contributor

maci0 commented Nov 4, 2014

I am fairly sure this is related to timeouts on the CIFS mount.
Can you show us some more information? The output of zpool status, cat /proc/mounts, and so on.
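
One way to check the /proc/mounts output for this question is to see which mount a vdev's backing file lives on. The sketch below is only an illustration: the sample mount table, the path `/mnt/share/vdev0.img`, and the set of filesystem types treated as "network" are all assumptions, not details from this report.

```python
# Filesystem types treated as network-backed here (an assumed, incomplete list).
NETWORK_FS_TYPES = {"cifs", "nfs", "nfs4"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def mount_for_path(path, mounts):
    """Return the mount entry with the longest mountpoint that contains path."""
    best = None
    for device, mountpoint, fstype in mounts:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best[1]):
                best = (device, mountpoint, fstype)
    return best


# Hypothetical mount table; on a real system, read open("/proc/mounts").read().
# Note: mountpoints containing spaces are escaped in /proc/mounts and are not
# decoded by this sketch.
sample_mounts = parse_mounts(
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "//server/share /mnt/share cifs rw,relatime 0 0\n"
)

entry = mount_for_path("/mnt/share/vdev0.img", sample_mounts)
on_network = entry is not None and entry[2] in NETWORK_FS_TYPES
print(entry, "network-backed" if on_network else "local")
```

The file vdev paths themselves would come from zpool status; they are hard-coded here to keep the example self-contained.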

@dajhorn
Contributor

dajhorn commented Nov 4, 2014

This is related to zfsonlinux/pkg-zfs#124. (Backing vdevs with a network filesystem is a configuration that the distro does not support.)

@krichter722
Author

Thanks @dajhorn for pointing that out. I'm relieved that there are no severe kernel errors. I was not able to read the stack and make the connection to zfsonlinux/pkg-zfs#124. Could you clarify your statement on whether using a network filesystem is unsupported or suffers from a "deficiency in the Ubuntu init stack", like you stated in #124, i.e. a dependency on an external issue (which might be fixed at some point), please? Thanks in advance for that :)

@dajhorn
Contributor

dajhorn commented Nov 5, 2014

@krichter722,

> Could you clarify your statement on whether using a network filesystem is unsupported

Try this same loopback configuration with ext4 instead of ZFS and open a support request at the Ubuntu bug tracker. I'm sure the upstream support team will give you a satisfyingly thorough explanation.

> or suffers from a "deficiency in the Ubuntu init stack", like you stated in #124, i.e. a dependency on an external issue (which might be fixed at some point), please? Thanks in advance for that :)

It might be fixed, but only if users report the issue at the Ubuntu bug tracker.

In this case, the system has no way to communicate transient network errors to things like ZoL that expect robust block devices. You will get a variety of errors like this every time the network session fails to service a request immediately or invalidates a file handle, which happens often when CIFS or SMB is involved.

In the other case, it just won't work because the network layer depends on the storage layer, and making ZoL depend on Samba breaks several fundamental assumptions in the init stack.
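
The ordering problem in that second case can be pictured as a dependency cycle. This is a rough sketch using Python's standard graphlib module (Python 3.9+); the job names are hypothetical placeholders and do not correspond to real Ubuntu init jobs.

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(dependencies):
    """Return a valid start order, or None if the dependencies are circular.

    `dependencies` maps each job to the set of jobs that must start before it.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# Usual assumption: storage comes up first, then the network, then CIFS mounts.
usual = {
    "network": {"local-storage"},
    "cifs-mount": {"network"},
}

# Hypothetical setup where storage itself needs a CIFS-backed vdev:
# storage -> cifs-mount -> network -> storage, so no start order exists.
circular = {
    "network": {"storage"},
    "cifs-mount": {"network"},
    "storage": {"cifs-mount"},
}

print(boot_order(usual))
print(boot_order(circular))
```

In the first graph a start order exists; in the second, every job is waiting on another one in the loop, which is the kind of assumption an init stack cannot satisfy.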

@gmelikov
Member

gmelikov commented Jul 9, 2017

Closing as stale.

@gmelikov gmelikov closed this as completed Jul 9, 2017