ZFS kernel crash with zconfig.sh and zpios-sanity.sh #418

Closed
fox-pluto opened this issue Sep 30, 2011 · 3 comments

Hi,

I am testing ZFS on VirtualBox and I can reproduce a kernel trace with the zconfig.sh test suite.
I don't know exactly which test makes the kernel crash.

First of all, I am not able to run the 4th test:

4 zpool insmod/rmmod device zfs.sh: Unload these modules with 'zfs.sh -u':
zfs zcommon zunicode znvpair zavl spl
Fail (10)

So I ran the suite without the 4th test, and intermittently I get this trace:

[ 220.800547] BUG: unable to handle kernel paging request at ffffffffa0169000
[ 220.810451] IP: [] spl_cache_age+0x0/0x60 [spl]
[ 220.810451] PGD 1a05067 PUD 1a09063 PMD 36c7b067 PTE 3a12c163
[ 220.810451] Oops: 0010 [#1] SMP
[ 220.810451] last sysfs file: /sys/module/zlib_deflate/initstate
[ 220.810451] CPU 0
[ 220.810451] Modules linked in: spl(+) zlib_deflate vesafb joydev ppdev snd_intel8x0 snd_ac97_codec parport_pc ac97_bus snd_pcm psmouse snd_timer serio_raw snd i2c_piix4 soundcore snd_page_alloc lp parport usbhid hid ahci libahci e1000 [last unloaded: spl]
[ 220.810451]
[ 220.810451] Pid: 4, comm: kworker/0:0 Tainted: P 2.6.38-8-server #42-Ubuntu innotek GmbH VirtualBox
[ 220.810451] RIP: 0010:[] [] spl_cache_age+0x0/0x60 [spl]
[ 220.810451] RSP: 0018:ffff88003ce61e18 EFLAGS: 00010246
[ 220.810451] RAX: 0000000000000000 RBX: ffff8800235df868 RCX: ffff88003fc0fd88
[ 220.810451] RDX: 0000000000000000 RSI: ffff88003fc0fd88 RDI: ffff8800235df868
[ 220.810451] RBP: ffff88003ce61e70 R08: 0000000000000001 R09: 0000000000000000
[ 220.810451] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88003e005a80
[ 220.810451] R13: ffff88003fc17200 R14: ffff88003fc0fd80 R15: ffffffffa0169000
[ 220.810451] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 220.810451] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 220.810451] CR2: ffffffffa0169000 CR3: 0000000001a03000 CR4: 00000000000006f0
[ 220.810451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 220.810451] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 220.810451] Process kworker/0:0 (pid: 4, threadinfo ffff88003ce60000, task ffff88003ce444a0)
[ 220.810451] Stack:
[ 220.810451] ffffffff8108224d ffff88003ce61e70 ffffffff8100bd56 ffff88003fc17245
[ 220.810451] 00ff88003ce444a0 ffff88003ce60000 ffff88003e005a80 ffff88003fc0fd80
[ 220.810451] 0000000000014700 ffff88003ce444a0 ffff88003fc0fd88 ffff88003ce61ee0
[ 220.810451] Call Trace:
[ 220.810451] [] ? process_one_work+0x11d/0x420
[ 220.810451] [] ? ftrace_call+0x5/0x2b
[ 220.810451] [] worker_thread+0x169/0x360
[ 220.810451] [] ? worker_thread+0x0/0x360
[ 220.810451] [] kthread+0x96/0xa0
[ 220.810451] [] kernel_thread_helper+0x4/0x10
[ 220.810451] [] ? kthread+0x0/0xa0
[ 220.810451] [] ? kernel_thread_helper+0x0/0x10
[ 220.810451] Code: 83 ec 10 e8 33 2d ea e0 48 8d 7d f8 48 c7 45 f8 ff ff ff 7f c7 45 f0 d0 00 00 00 e8 eb fe ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 <55> 48 89 e5 53 48 83 ec 08 e8 02 2d ea e0 31 d2 48 89 fb 48 8d
[ 220.810451] RIP [] spl_cache_age+0x0/0x60 [spl]
[ 220.810451] RSP
[ 220.810451] CR2: ffffffffa0169000
[ 220.810451] ---[ end trace b94c1ea49ebd9197 ]---
[ 222.942565] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 222.952385] IP: [] kthread_data+0x10/0x20
[ 222.952385] PGD 1a05067 PUD 1a06067 PMD 0
[ 222.952385] Oops: 0000 [#2] SMP
[ 222.952385] last sysfs file: /sys/module/zlib_deflate/initstate
[ 222.952385] CPU 0
[ 222.952385] Modules linked in: spl(+) zlib_deflate vesafb joydev ppdev snd_intel8x0 snd_ac97_codec parport_pc ac97_bus snd_pcm psmouse snd_timer serio_raw snd i2c_piix4 soundcore snd_page_alloc lp parport usbhid hid ahci libahci e1000 [last unloaded: spl]
[ 222.952385]
[ 222.952385] Pid: 4, comm: kworker/0:0 Tainted: P D 2.6.38-8-server #42-Ubuntu innotek GmbH VirtualBox
[ 222.952385] RIP: 0010:[] [] kthread_data+0x10/0x20
[ 222.952385] RSP: 0018:ffff88003ce61a78 EFLAGS: 00010092
[ 222.952385] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88003ce444a0
[ 222.952385] RDX: 0000000000002d91 RSI: 0000000000000000 RDI: ffff88003ce444a0
[ 222.952385] RBP: ffff88003ce61a78 R08: dead000000200200 R09: dead000000200200
[ 222.952385] R10: ffff88003ce48e18 R11: dead000000200200 R12: 0000000000000000
[ 222.952385] R13: ffff88003ce44858 R14: 0000000000000000 R15: 0000000000000046
[ 222.952385] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 222.952385] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 222.952385] CR2: fffffffffffffff8 CR3: 0000000001a03000 CR4: 00000000000006f0
[ 222.952385] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 222.952385] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 222.952385] Process kworker/0:0 (pid: 4, threadinfo ffff88003ce60000, task ffff88003ce444a0)
[ 222.952385] Stack:
[ 222.952385] ffff88003ce61a98 ffffffff810833a5 ffff88003ce61a98 ffff88003fc13d00
[ 222.952385] ffff88003ce61b18 ffffffff815d5542 ffff88003ce61fd8 ffff88003ce60000
[ 222.952385] 0000000000013d00 ffff88003ce44858 ffff88003ce61fd8 0000000000013d00
[ 222.952385] Call Trace:
[ 222.952385] [] wq_worker_sleeping+0x15/0xa0
[ 222.952385] [] schedule+0x5f2/0x760
[ 222.952385] [] do_exit+0x24b/0x410
[ 222.952385] [] oops_end+0xaf/0xf0
[ 222.952385] [] ? spl_cache_age+0x0/0x60 [spl]
[ 222.952385] [] no_context+0xfd/0x190
[ 222.952385] [] ? spl_cache_age+0x0/0x60 [spl]
[ 222.952385] [] __bad_area_nosemaphore+0x125/0x1e0
[ 222.952385] [] ? ftrace_call+0x5/0x2b
[ 222.952385] [] bad_area_nosemaphore+0x13/0x20
[ 222.952385] [] do_page_fault+0x44d/0x540
[ 222.952385] [] ? spl_cache_age+0x0/0x60 [spl]
[ 222.952385] [] ? ftrace_call+0x5/0x2b
[ 222.952385] [] ? spl_cache_age+0x0/0x60 [spl]
[ 222.952385] [] page_fault+0x25/0x30
[ 222.952385] [] ? spl_cache_age+0x0/0x60 [spl]
[ 222.952385] [] ? spl_cache_age+0x0/0x60 [spl]
[ 222.952385] [] ? process_one_work+0x11d/0x420
[ 222.952385] [] ? ftrace_call+0x5/0x2b
[ 222.952385] [] worker_thread+0x169/0x360
[ 222.952385] [] ? worker_thread+0x0/0x360
[ 222.952385] [] kthread+0x96/0xa0
[ 222.952385] [] kernel_thread_helper+0x4/0x10
[ 222.952385] [] ? kthread+0x0/0xa0
[ 222.952385] [] ? kernel_thread_helper+0x0/0x10
[ 222.952385] Code: 5e 41 5f c9 c3 be 2b 01 00 00 48 c7 c7 20 5c 7d 81 e8 f5 e6 fd ff e9 84 fe ff ff 55 48 89 e5 e8 97 46 f8 ff 48 8b 87 60 03 00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
[ 222.952385] RIP [] kthread_data+0x10/0x20
[ 222.952385] RSP
[ 222.952385] CR2: fffffffffffffff8
[ 222.952385] ---[ end trace b94c1ea49ebd9198 ]---
[ 222.952385] Fixing recursive fault but reboot is needed!

I have noticed that the kernel crashes only if I give the VirtualBox guest more than one core.
Some more info:
fox@Brick1:~$ uname -a
Linux Brick1 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

With the zpios-sanity.sh test suite I got:

[ 517.560258] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 517.740114] BUG: unable to handle kernel paging request at ffffffffa014a000
[ 517.740114] IP: [] nv_fixed_reset+0x180/0x180 [znvpair]
[ 517.740114] PGD 1a05067 PUD 1a09063 PMD 37270067 PTE 800000001bf21161
[ 517.740114] Oops: 0011 [#1] SMP
[ 517.740114] last sysfs file: /sys/devices/virtual/bdi/zfs-3/uevent
[ 517.740114] CPU 0
[ 517.740114] Modules linked in: zpios zfs(P) zcommon(P) zunicode(P) znvpair(P) zavl(P) splat spl zlib_deflate vesafb snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc ppdev joydev i2c_piix4 psmouse serio_raw parport_pc lp parport usbhid hid ahci libahci e1000 [last unloaded: zpios]
[ 517.740114]
[ 517.740114] Pid: 10, comm: kworker/0:1 Tainted: P 2.6.38-8-server #42-Ubuntu innotek GmbH VirtualBox
[ 517.740114] RIP: 0010:[] [] nv_fixed_reset+0x180/0x180 [znvpair]
[ 517.740114] RSP: 0018:ffff88003ce79e18 EFLAGS: 00010246
[ 517.740114] RAX: 0000000000000000 RBX: ffff88001f6cf868 RCX: ffff88003fc0fd88
[ 517.740114] RDX: 0000000000000000 RSI: ffff88003fc0fd88 RDI: ffff88001f6cf868
[ 517.740114] RBP: ffff88003ce79e70 R08: 0000000000000001 R09: 0000000000000000
[ 517.740114] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88003ce2a100
[ 517.740114] R13: ffff88003fc17200 R14: ffff88003fc0fd80 R15: ffffffffa014a000
[ 517.740114] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 517.740114] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 517.740114] CR2: ffffffffa014a000 CR3: 000000001f8bd000 CR4: 00000000000006f0
[ 517.740114] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 517.740114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 517.740114] Process kworker/0:1 (pid: 10, threadinfo ffff88003ce78000, task ffff88003ce6db80)
[ 517.740114] Stack:
[ 517.740114] ffffffff8108224d ffff88003ce79e70 ffffffff8100bd56 ffff88003fc17265
[ 517.740114] 00ff88003ce6db80 ffff88003ce78000 ffff88003ce2a100 ffff88003fc0fd80
[ 517.740114] 0000000000014700 ffff88003ce6db80 ffff88003fc0fd88 ffff88003ce79ee0
[ 517.740114] Call Trace:
[ 517.740114] [] ? process_one_work+0x11d/0x420
[ 517.740114] [] ? ftrace_call+0x5/0x2b
[ 517.740114] [] worker_thread+0x169/0x360
[ 517.740114] [] ? worker_thread+0x0/0x360
[ 517.740114] [] kthread+0x96/0xa0
[ 517.740114] [] kernel_thread_helper+0x4/0x10
[ 517.740114] [] ? kthread+0x0/0xa0
[ 517.740114] [] ? kernel_thread_helper+0x0/0x10
[ 517.740114] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <04> 00 00 00 14 00 00 00 03 00 00 00 47 4e 55 00 77 a1 82 a6 55
[ 517.740114] RIP [] nv_fixed_reset+0x180/0x180 [znvpair]
[ 517.740114] RSP
[ 517.740114] CR2: ffffffffa014a000
[ 517.740114] ---[ end trace 4d11fad2974988f3 ]---
[ 517.744924] BUG: unable to handle kernel paging request at fffffffffffffff8
[ 517.744927] IP: [] kthread_data+0x10/0x20
[ 517.744931] PGD 1a05067 PUD 1a06067 PMD 0
[ 517.744934] Oops: 0000 [#2] SMP
[ 517.744936] last sysfs file: /sys/devices/virtual/bdi/zfs-3/uevent
[ 517.744938] CPU 0
[ 517.744939] Modules linked in: zpios zfs(P) zcommon(P) zunicode(P) znvpair(P) zavl(P) splat spl zlib_deflate vesafb snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc ppdev joydev i2c_piix4 psmouse serio_raw parport_pc lp parport usbhid hid ahci libahci e1000 [last unloaded: zpios]
[ 517.744958]
[ 517.744960] Pid: 10, comm: kworker/0:1 Tainted: P D 2.6.38-8-server #42-Ubuntu innotek GmbH VirtualBox
[ 517.744964] RIP: 0010:[] [] kthread_data+0x10/0x20
[ 517.744968] RSP: 0018:ffff88003ce79a78 EFLAGS: 00010092
[ 517.744970] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88003ce6db80
[ 517.744973] RDX: 00000000000055bd RSI: 0000000000000000 RDI: ffff88003ce6db80
[ 517.744975] RBP: ffff88003ce79a78 R08: dead000000200200 R09: dead000000200200
[ 517.744978] R10: ffff88003ce4a798 R11: dead000000200200 R12: 0000000000000000
[ 517.744981] R13: ffff88003ce6df38 R14: 0000000000000000 R15: 0000000000000046
[ 517.744984] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 517.744987] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 517.744989] CR2: fffffffffffffff8 CR3: 0000000001a03000 CR4: 00000000000006f0
[ 517.744996] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 517.744999] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 517.745002] Process kworker/0:1 (pid: 10, threadinfo ffff88003ce78000, task ffff88003ce6db80)
[ 517.745004] Stack:
[ 517.745006] ffff88003ce79a98 ffffffff810833a5 ffff88003ce79a98 ffff88003fc13d00
[ 517.745009] ffff88003ce79b18 ffffffff815d5542 ffff88003ce79fd8 ffff88003ce78000
[ 517.745013] 0000000000013d00 ffff88003ce6df38 ffff88003ce79fd8 0000000000013d00
[ 517.745017] Call Trace:
[ 517.745020] [] wq_worker_sleeping+0x15/0xa0
[ 517.745026] [] schedule+0x5f2/0x760
[ 517.745032] [] do_exit+0x24b/0x410
[ 517.745036] [] oops_end+0xaf/0xf0
[ 517.745042] [] no_context+0xfd/0x190
[ 517.745045] [] __bad_area_nosemaphore+0x125/0x1e0
[ 517.745049] [] ? ftrace_call+0x5/0x2b
[ 517.745052] [] bad_area_nosemaphore+0x13/0x20
[ 517.745056] [] do_page_fault+0x44d/0x540
[ 517.745060] [] ? ftrace_call+0x5/0x2b
[ 517.745065] [] page_fault+0x25/0x30
[ 517.745070] [] ? process_one_work+0x11d/0x420
[ 517.745073] [] ? ftrace_call+0x5/0x2b
[ 517.745077] [] worker_thread+0x169/0x360
[ 517.745081] [] ? worker_thread+0x0/0x360
[ 517.745084] [] kthread+0x96/0xa0
[ 517.745088] [] kernel_thread_helper+0x4/0x10
[ 517.745093] [] ? kthread+0x0/0xa0
[ 517.745096] [] ? kernel_thread_helper+0x0/0x10
[ 517.745098] Code: 5e 41 5f c9 c3 be 2b 01 00 00 48 c7 c7 20 5c 7d 81 e8 f5 e6 fd ff e9 84 fe ff ff 55 48 89 e5 e8 97 46 f8 ff 48 8b 87 60 03 00 00 <48> 8b 40 f8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
[ 517.745120] RIP [] kthread_data+0x10/0x20
[ 517.745123] RSP
[ 517.745125] CR2: fffffffffffffff8
[ 517.745127] ---[ end trace 4d11fad2974988f4 ]---
[ 517.745129] Fixing recursive fault but reboot is needed!

Thanks for your work; I hope this is useful. If you need me to run other tests, let me know.

Stefano

protoism commented Oct 3, 2011

Hi fox-pluto,
I had a problem similar to yours,
but I could not get a stack trace.
Can you teach me how to get one?
Thanks in advance.

@behlendorf
Contributor

This issue may have been caused by issue #279. If so, it will already be fixed in master by spl commit openzfs/spl@64c075c

dajhorn referenced this issue in zfsonlinux/pkg-zfs Oct 12, 2011
In a non-debug build the ASSERT() would be optimized away
which could cause pending work items to not be cancelled.

We must also use cancel_delayed_work_sync() rather than just
cancel_delayed_work() to actually wait until work items have
completed.  Otherwise they might accidentally access free'd
memory.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS bugs #279, #62, #363, #418
@behlendorf
Contributor

Closing issue; this is believed to have been fixed by the #279 fix.

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
…aster

Merge remote-tracking branch '6.0/stage' into 'master'