Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs_allow_log_destroy parameter NULL pointer dereference #4872

Closed
heary-cao opened this issue Jul 23, 2016 · 10 comments
Closed

zfs_allow_log_destroy parameter NULL pointer dereference #4872

heary-cao opened this issue Jul 23, 2016 · 10 comments
Labels
Component: Test Suite Indicates an issue with the test framework or a test case
Milestone

Comments

@heary-cao
Copy link
Contributor

heary-cao commented Jul 23, 2016

The Problem:
Observed during Linux 2.6.32.41 automated testing while running the ZFS Test Suite.

[ 1120.382032] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1120.390798] IP: [<ffffffffa3d1ff4b>] strfree+0xd/0x1f [zfs]
[ 1120.397088] PGD 56d43067 PUD 4bcc0067 PMD 0 
[ 1120.401882] Oops: 0000 [#1] PREEMPT SMP 
[ 1120.406290] Modules linked in: zfs(O) kinet(O) vul(O) cbs(O) dm_round_robin(O) dm_mod(O) diskstub(PO) uth(O) iocount(O) zvos(O) kshrmem(O) kpm(O) netlink(O) ioatdma(O) kexc(O)  ipmi_si ipmi_devintf [last unloaded: watchdogctrl]
[ 1120.437254] CPU: 0 PID: 11091 Comm: zfs Tainted: P           O 3.10.55-EMBSYS-CGEL-5.01.20.P0 #5
[ 1120.447059] Hardware name: To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M., BIOS 4.6.4 01/15/2013
[ 1120.458999] task: ffff880056c4f160 ti: ffff88004bc52000 task.ti: ffff88004bc52000
[ 1120.467358] RIP: 0010:[<ffffffffa3d1ff4b>]  [<ffffffffa3d1ff4b>] strfree+0xd/0x1f [zfs]
[ 1120.476353] RSP: 0018:ffff88004bc53d18  EFLAGS: 00010286
[ 1120.482281] RAX: 0000000000000000 RBX: ffff88004bc53da8 RCX: ffffffffffffffff
[ 1120.490243] RDX: 0000000000000000 RSI: dead000000100100 RDI: 0000000000000000
[ 1120.498205] RBP: ffff88004bc53d18 R08: ffff88004bc53d28 R09: 2222222222222222
[ 1120.506168] R10: 2222222222222222 R11: 00000000000001ff R12: ffff88016a6d2c00
[ 1120.514130] R13: ffff88016a6d2d38 R14: ffff88016a6d2c00 R15: ffff88016a6d2d18
[ 1120.522093] FS:  00007f72cb5fe740(0000) GS:ffff880277600000(0000) knlGS:0000000000000000
[ 1120.531121] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1120.537533] CR2: 0000000000000000 CR3: 0000000062eb8000 CR4: 00000000000407f0
[ 1120.545496] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1120.553459] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1120.561421] Stack:
[ 1120.563662]  ffff88004bc53d28 ffffffffa3cc396b ffff88004bc53d58 ffffffffa3d28489
[ 1120.571959]  ffff88004bc53d58 ffffffff815f8a77 ffff88016a84e680 ffff88016a7ddb40
[ 1120.580254]  ffff88004bc53de8 ffffffffa3d2934f 000000000000000c ffff88004bc53e68
[ 1120.588548] Call Trace:
[ 1120.591335]  [<ffffffffa3cc396b>] zfs_allow_log_destroy+0x9/0xb [zfs]
[ 1120.598572]  [<ffffffffa3d28489>] tsd_hash_dtor+0xa9/0x103 [zfs]
[ 1120.605280]  [<ffffffff815f8a77>] ? _raw_spin_lock+0x1e/0x23
[ 1120.611645]  [<ffffffffa3d2934f>] tsd_set+0x261/0x82d [zfs]
[ 1120.617912]  [<ffffffffa3d28726>] ? tsd_get+0xbd/0x180 [zfs]
[ 1120.624276]  [<ffffffffa3cc6b01>] zfs_ioc_log_history+0x29/0xc6 [zfs]
[ 1120.631486]  [<ffffffffa3c4a9f4>] ? nvlist_alloc+0x24/0x2b [zfs]
[ 1120.638238]  [<ffffffffa3cc9e9b>] zfsdev_ioctl+0x339/0x4b8 [zfs]
[ 1120.644947]  [<ffffffff8110ec5b>] do_vfs_ioctl+0x424/0x463
[ 1120.651069]  [<ffffffff815f8bff>] ? _raw_spin_unlock+0x27/0x32
[ 1120.657579]  [<ffffffff81062d21>] ? vtime_account_user+0x5f/0x68
[ 1120.664285]  [<ffffffff81116bf5>] ? fget_light+0x3d/0x9b
[ 1120.670206]  [<ffffffff8110ecf3>] SyS_ioctl+0x59/0x7e
[ 1120.675844]  [<ffffffff815fed0d>] tracesys+0xdd/0xe2
[ 1120.681384] Code: 89 d7 48 f7 d1 48 89 ce e8 1e ec ff ff b9 12 00 00 00 31 c0 48 89 df f3 ab 41 59 5b c9 c3 55 48 89 fa 48 89 e5 31 c0 48 83 c9 ff <f2> ae 48 89 d7 48 f7 d1 48 89 ce e8 f0 eb ff ff c9 c3 55 48 89 
[ 1120.703176] RIP  [<ffffffffa3d1ff4b>] strfree+0xd/0x1f [zfs]
[ 1120.709551]  RSP <ffff88004bc53d18>
[ 1120.713440] CR2: 0000000000000000
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.32.41-EMBSYS-CGEL-4.01.20.P0.F0 (root@localhost.localdomain) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #6 PREEMPT Mon Jul 15 11:22:09 CST 2013
[    0.000000] Command line: isolcpus=1,2,3,4,5,6,7 ro root=/dev/ram console=ttyS0,115200n8 bu=0 rk=0 sf=0 st=0 ci=1 bdar=3 bt=0 dpdkcfg=0000 loader=GRUB_USP_3.1.10_svn5861  memmap=exactmap memmap=640K@0K memmap=9108K@32768K memmap=121304K@42516K elfcorehdr=163820K memmap=1076K#2070996K memmap=488K#2084380K memmap=4K#2085576K memmap=32K#2085604K memmap=536K#2085800K
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD

Primary analysis of causes:
zfs_ioc_log_history execute tsd_set when tsd set he_value of zfs_allow_log_key is null, so that execute zfs_allow_log_destroy parameter is null, strfree is coredump.

@behlendorf behlendorf added this to the 0.7.0 milestone Jul 26, 2016
@behlendorf behlendorf added the Component: Test Suite Indicates an issue with the test framework or a test case label Jul 26, 2016
behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 26, 2016
Under Linux tsd_set() will call the destructor on the thread
specific data when the passed value is NULL.  Therefore, there
is no need to call strfree() on the poolname after tsd_set().
The call to tsd_set() must also be moved after spa_open() to
prevent a use-after-free style defect.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#4872
@behlendorf
Copy link
Contributor

Proposed fix in #4884, reviews welcome.

@behlendorf
Copy link
Contributor

behlendorf commented Jul 26, 2016

I should add that the proposed fix doesn't exactly agree with the stack trace you posted since it suggests the error occurred in tsd_set(). However, the way Linux kernel unwinds stack traces makes them not 100% reliable so this may the second strfree().

@tuxoko
Copy link
Contributor

tuxoko commented Jul 26, 2016

@heary-cao
Please repost your log enclosed with
Also what version of zfs are you using?

@behlendorf
Copy link
Contributor

I've added the ``` to the original comment for readability.

@tuxoko
Copy link
Contributor

tuxoko commented Jul 26, 2016

All code
========
   0:   89 d7                   mov    %edx,%edi
   2:   48 f7 d1                not    %rcx
   5:   48 89 ce                mov    %rcx,%rsi
   8:   e8 1e ec ff ff          callq  0xffffffffffffec2b
   d:   b9 12 00 00 00          mov    $0x12,%ecx
  12:   31 c0                   xor    %eax,%eax
  14:   48 89 df                mov    %rbx,%rdi
  17:   f3 ab                   rep stos %eax,%es:(%rdi)
  19:   41 59                   pop    %r9
  1b:   5b                      pop    %rbx
  1c:   c9                      leaveq 
  1d:   c3                      retq   
  1e:   55                      push   %rbp
  1f:   48 89 fa                mov    %rdi,%rdx
  22:   48 89 e5                mov    %rsp,%rbp
  25:   31 c0                   xor    %eax,%eax
  27:   48 83 c9 ff             or     $0xffffffffffffffff,%rcx
  2b:   f2 ae                   repnz scas %es:*(%rdi),%al      <-- trapping instruction
  2d:   48 89 d7                mov    %rdx,%rdi
  30:   48 f7 d1                not    %rcx
  33:   48 89 ce                mov    %rcx,%rsi
  36:   e8 f0 eb ff ff          callq  0xffffffffffffec2b
  3b:   c9                      leaveq 
  3c:   c3                      retq   
  3d:   55                      push   %rbp
  3e:   48                      rex.W
  3f:   89                      .byte 0x89

It seems to be doing strlen.
Do we have strlen in strfree in old version?

@tuxoko
Copy link
Contributor

tuxoko commented Jul 26, 2016

Oh yeah we does, so this is pretty old version isn't it.

heary-cao added a commit to heary-cao/zfs that referenced this issue Jul 27, 2016
zfs_allow_log_destroy parameter NULL pointer dereference

issues: please see  openzfs#4872
Observed during Linux 2.6.32.41 automated testing while running the ZFS Test Suite. Cause ZFS software to produce coredump.

Cause analysis:
In zfs_ioc_log_history function, the implementation of tsd_set function, will he_value of the TSD module is set to null, 
resulting in TSD module remove a entry, so he_value of the entry is null, 
casue to implement zfs_allow_log_key private function zfs_allow_log_destroy.
zfs_allow_log_destroy parameter is null, the strfree a null. Produce coredump.

Solution:
1, in order to safety, 
in the zfs_ioc_log_history function,from the TSD module to get to the poolName, 
it is possible for the NULL, so whether the processing of NULL.
if poolname is NULL,return error.

2, zfs_allow_log_key of the private function zfs_allow_log_destroy in the Senate,
   it is possible for the emergence of NULL, 
   so for arg release when the judge for the NULL and then strfree it.
heary-cao referenced this issue in heary-cao/zfs Jul 27, 2016
zfs_allow_log_destroy parameter NULL pointer dereference

issues: please see  zfsonlinux#4872
Observed during Linux 2.6.32.41 automated testing while running the ZFS Test Suite. Cause ZFS software to produce coredump.

Cause analysis:
In zfs_ioc_log_history function, the implementation of tsd_set function, will he_value of the TSD module is set to null, 
resulting in TSD module remove a entry, so he_value of the entry is null, 
casue to implement zfs_allow_log_key private function zfs_allow_log_destroy.
zfs_allow_log_destroy parameter is null, the strfree a null. Produce coredump.

Solution:
1, in order to safety, 
in the zfs_ioc_log_history function,from the TSD module to get to the poolName, 
it is possible for the NULL, so whether the processing of NULL.
if poolname is NULL,return error.

2, zfs_allow_log_key of the private function zfs_allow_log_destroy in the Senate,
   it is possible for the emergence of NULL, 
   so for arg release when the judge for the NULL and then strfree it.
@behlendorf
Copy link
Contributor

Oh yeah we does, so this is pretty old version isn't it.

A really really old version. The odd thing I see about this backtrace is strfree+0xd/0x1f [zfs] claims to be part of the zfs.ko kernel module and not spl.ko. For a long time strfree() has been exported by spl.ko. @heary-cao can you post what version of ZoL you were able to reproduce this with.

@tuxoko
Copy link
Contributor

tuxoko commented Jul 27, 2016

@behlendorf Yes I notice the [zfs] thing too.
Also the fact the having a old strfree with new tsd_set is impossible in term of commit history. Not to mention the strfree with strlen is wrong to begin with. This is definitely from privately maintained(mismaintained?) branch.

@heary-cao
Copy link
Contributor Author

dear all:
zfs version:zfs-0.6.3
spl version:spl-0.6.3
For SPL, we're in sync once. So the TSD module is relatively new.
Indeed, we compile ZFS and SPL together into zfs.ko.

see strfree coredump, we code review zfs-0.6.5.7, the coredump process is basically no change.
and the cause eason is tsd set key(zfs_allow_log_key) value is NULL, but in tsd_set function, when remove old entry ,reset entry new value is NULL.
if (entry) {
entry->he_value = value;
/* remove the entry */
if (remove)
tsd_remove_entry(entry);
then execute tsd_remove_entry function, entry he_value is NULL, so zfs_allow_log_destroy arg is NULL,free a NULL memory. casue coredump.

I have a commit pull requests, please reviews.
signed : zfs allow log destroy parameter NULL #4891

thinks you!

@tuxoko
Copy link
Contributor

tuxoko commented Jul 28, 2016

@heary-cao
Please use upstream version as-is, otherwise we can't support you.

heary-cao added a commit to heary-cao/zfs that referenced this issue Jul 29, 2016
zfs_allow_log_destroy parameter NULL pointer dereference

issues: please see  openzfs#4872
Observed during Linux 2.6.32.41 automated testing while running the ZFS Test Suite. Cause ZFS software to produce coredump.

Cause analysis:
In zfs_ioc_log_history function, the implementation of tsd_set function, will he_value of the TSD module is set to null,
resulting in TSD module remove a entry, so he_value of the entry is null,
casue to implement zfs_allow_log_key private function zfs_allow_log_destroy.
zfs_allow_log_destroy parameter is null, the strfree a null. Produce coredump.

Solution:
1, in order to safety,
in the zfs_ioc_log_history function,from the TSD module to get to the poolName,
it is possible for the NULL, so whether the processing of NULL.
if poolname is NULL,return error.

2, zfs_allow_log_key of the private function zfs_allow_log_destroy in the Senate,
   it is possible for the emergence of NULL,
   so for arg release when the judge for the NULL and then strfree it.
heary-cao added a commit to heary-cao/zfs that referenced this issue Jul 29, 2016
zfs_allow_log_destroy parameter NULL pointer dereference

issues: please see  openzfs#4872
Observed during Linux 2.6.32.41 automated testing while running the ZFS Test Suite. Cause ZFS software to produce coredump.

Cause analysis:
In zfs_ioc_log_history function, the implementation of tsd_set function, will he_value of the TSD module is set to null,
resulting in TSD module remove a entry, so he_value of the entry is null,
casue to implement zfs_allow_log_key private function zfs_allow_log_destroy.
zfs_allow_log_destroy parameter is null, the strfree a null. Produce coredump.

Solution:
1, in order to safety,
in the zfs_ioc_log_history function,from the TSD module to get to the poolName,
it is possible for the NULL, so whether the processing of NULL.
if poolname is NULL,return error.

2, zfs_allow_log_key of the private function zfs_allow_log_destroy in the Senate,
   it is possible for the emergence of NULL,
   so for arg release when the judge for the NULL and then strfree it.

Signed-off-by: caoxuewen <cao.xuewen@zte.com.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Test Suite Indicates an issue with the test framework or a test case
Projects
None yet
Development

No branches or pull requests

3 participants