Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing #40

Open
kloenk opened this issue Nov 29, 2020 · 13 comments
Open

Testing #40

kloenk opened this issue Nov 29, 2020 · 13 comments
Labels
• kbuild Related to building the kernel, `make`, `Kbuild`, `Kconfig` options...

Comments

@kloenk
Copy link
Member

kloenk commented Nov 29, 2020

Create some way to run rust test.

Bindgen creates tests, for allignment. maybe we can test them somehow.

@ojeda ojeda added • kbuild Related to building the kernel, `make`, `Kbuild`, `Kconfig` options... - optional labels Nov 29, 2020
@stappersg
Copy link

Few days after this issue was created, was merge request #52, "use rustc directly, created".

With this comment are now those two connected. How helpfull this comment (this connection) is something that can be told in a next comment.

Thing I'm aim for is a comment that tells which how helpful cargo test is, when builds are done without cargo. Or even a rename of this issue from cargo test to rust test.

@ojeda
Copy link
Member

ojeda commented Jan 2, 2021

Yeah, we can use this issue for testing in general. It's something we should probably take a look at relatively soon.

@ojeda ojeda changed the title cargo test Testing Jan 2, 2021
@adamrk
Copy link

adamrk commented Feb 17, 2021

Seems like there was a sporadic test failure here: https://github.com/Rust-for-Linux/linux/runs/1921879052 (the only change from the previous commit which passed was a comment). Not sure if it's something to worry about.

@ojeda
Copy link
Member

ojeda commented Feb 17, 2021

Hmm... It doesn't look like it is us, since it happens before we run any Rust code, but we could have broken some invariant somewhere nevertheless. Do you want to post it to the LKML? (It would be best to reproduce it first without the Rust stuff compiled in, though).

Related: perhaps we should add a CI run to compile things without enabling Rust but with our changes, and start using fail-fast: false to keep all the CI results.

@ojeda
Copy link
Member

ojeda commented Mar 18, 2021

As I mentioned elsewhere, I will be splitting rust_example.rs into different kernel modules as poor man's tests.

Later on, after the RFC most likely, I will be working on setting up a proper test framework so that we can write tests as easy as in normal Rust.

@ojeda ojeda mentioned this issue May 2, 2021
@alastairreid
Copy link

[@wedsonaf pointed me at this issue because I am looking at the closely related question of 'how can we use formal verification tools with Rust?' - and Rust-for-Linux might be a really good use for the tools. In particular, I want to create a verification harness that can also be used as a test-harness or fuzzing harness.]

When looking at the current samples, it looks as if there are two important decisions about testing (or verification):

  1. Should a test for the Rust code call into any of the C parts of the Linux kernel or should parts of rust/kernel/*.rs be replaced by a mocking library that mocks memory allocation, printk, device registration, mutexes, etc.?
  2. Should a test for a device driver access the driver by invoking something that looks like Linux syscalls or should it access the driver by invoking Rust methods?

I am currently exploring the No/No choice. That is, I am modifying rust/kernel/.rs so that it does not rely on bindings:: types/functions and I am trying to write tests that use Rust types like UserSlicePtr, File, etc. and invoke Rust methods to manipulate those objects instead of trying to invoke drivers by creating inodes with the right major/minor device numbers and invoking the standard C-shaped system calls.

I'm making the No/No choice because I specifically want to focus on the Rust code and because I want to be able to write generic tests that benefit from Rust's traits, etc.
But note that other choices in this design space would also make sense. e.g., Tests that invoke both the Rust and the the C parts of the Linux kernel, have the potential to detect mismatches/misunderstandings between the Rust code and the C code whereas the tests I am writing could never find those because they ignore all the C code.

So far I have exactly one test working. It checks that samples/rust_chrdev.rs rejects attempts to read from the file. Not a particularly strenuous test for now: my focus is still on mocking enough of rust/kernel/*.rs that I can tackle more tricky code.

Once I have more of the mocking layer implemented, my plan is to use the KLEE verification tool to look for bugs such as integer overflows, array bounds violations, etc. in the Rust code. I will also use the same test harness with a fuzzer as an alternative way of looking for bugs. I am hoping that being able to verify/fuzz for these low level bugs would make it less painful/worrying to disable runtime checks in production builds.

Questions:

  • Is anybody else working on tests?
  • Does the No/No choice I am making fit with your thoughts on testing or do you think I should shift my focus?

@kloenk
Copy link
Member Author

kloenk commented May 14, 2021

Just a quick idea. You could use no mangling and extern fn to create a mock library which shadows every C function used by the kernel rust crate. So you use the normal rust/kernel but that links against some custom kernel with shadows functions

@alastairreid
Copy link

Thanks. I think there's several places where I could cut the dependencies and insert my mock. The C interface has the huge advantage of being (I think) fairly stable. But it's also a bit low level.

Some places where I think it would be easier to modify rust/kernel/*.rs are as follows. (Where modify probably means 'refactor to expose the right interfaces and then use conditional compilation.)

First: to call FileOperations::read, you need to pass a &File. The File struct contains a pointer to bindings::file so I need to create a pointer to a file somehow. I think that the only way to create a file object is to invoke C code in the kernel. At the moment, I am YOLOing it and passing a null pointer but this only works because (AFAIK) the code I am testing does not use the two public methods on File both of which dereference it - but that is not going to work in the long term.
The cleanest solution that I can see seems to be to have an alternative implementation of File. (The underlying data structure has about 15-20 fields, lots of invariants, etc. and I think that replicating that code in Rust would be a mistake and that trying to invoke the C code would pull in a lot of dependencies.

Second: There are Rust functions that I think can be simplified down to no-ops (for testing purposes). For example, the Registration::register function in rust/kernel/miscdev.rs jumps through a bunch of hoops to initialize a C device object, construct a VTABLE, etc. so that it can call bindings::misc_register. If I understand this correctly, the only purpose for this is so that the C part of the Linux kernel can use the major/minor device numbers on an inode to decide which device driver to invoke. Since my plan for testing sidesteps that mechanism, this is not needed. (This is a bit less clear cut - and, in this case, your suggestion also works and is simpler in some ways.)

Third: In part because formal verification of Rust is in its infancy, there don't seem to be any verifiers that support concurrency at the moment. So the entire content of rust/kernel/sync is a problem because the verifier does not support threads, mutexes, semaphores, etc. Getting the right behaviour (or, at least, a sufficiently good approximation) here will almost certainly require some major surgery. This won't necessarily affect conventional testing and fuzzing - but it is a major issue for me.

At the moment, my feeling is that I am going to have to use several different approaches here: sometimes making the cut at the Rust-C interface as you suggest, sometimes making the cut in the Rust code. But it's early days and I expect that I will end up having to discard and redo my first attempt or two...

@ojeda
Copy link
Member

ojeda commented May 17, 2021

@alastairreid

Is anybody else working on tests?

My plan is to work on supporting unit tests. I am not aware of anybody working on anything else.

At the moment, we only have a few sanity functional tests for the CI.

Does the No/No choice I am making fit with your thoughts on testing or do you think I should shift my focus?

Hard to answer :) Perhaps having a meeting for this would be best.

My gut feeling is that userspace functional tests are currently the most valuable given it is still early days. They are likely the easiest to write and the most stable ones.

Some places where I think it would be easier to modify rust/kernel/*.rs are as follows. (Where modify probably means 'refactor to expose the right interfaces and then use conditional compilation.)

In general, I would be cautious about changing interfaces to accommodate testing/mocking.

This is not to say we should not do it: if we can show it has a big impact, we should definitely push for it. However, we first need to get Rust proven within the kernel.

I am hoping that being able to verify/fuzz for these low level bugs would make it less painful/worrying to disable runtime checks in production builds.

There may be some particular places where we want to do unchecked operations due to performance reasons, and the policy is to justify every unsafe block in the form of a comment (and, of course, ideally they should be proven/tested/fuzzed/etc.). But, in general, I do not think we should encourage users to run kernels with checks disabled.

The C interface has the huge advantage of being (I think) fairly stable.

Please note that the kernel internal APIs are not stable. However, it is true that, at the moment, the Rust ones are likely to change more than the C ones (by virtue of being new). But longer-term there would be nothing that would make the Rust ones more or less stable than the C ones.

my plan is to use the KLEE verification tool to look for bugs such as integer overflows, array bounds violations, etc. in the Rust code.

That sounds great. Has there been any project on using KLEE for the C side? Any showstoppers?

Another thing I would like to have is Miri/KASAN/UBSAN etc. for the Rust side.

@alastairreid
Copy link

@ojeda

Thanks for your comments. All makes sense.
I'll see how far I can get with @kloenk's suggestion of implementing the bindings:: API - once I have a bit more experience and have some sample code to share (i.e., not the very small hack-fest I currently have), a meeting would probably be good.

There may be some particular places where we want to do unchecked operations due to performance reasons, and the policy is to justify every unsafe block in the form of a comment (and, of course, ideally they should be proven/tested/fuzzed/etc.). But, in general, I do not think we should encourage users to run kernels with checks disabled.

My interpretation of Linus Torvalds' comment on your RFC was that any panics from the Rust code were unacceptable to him. He focused on the out-of-memory panic that can be fixed by using a panic-free memory API but I'd think that he would have the same reaction to the 'invisible' panics inserted by the compiler to implement runtime overflow checks and runtime array bounds checks because they also cause the kernel to halt with loss of service and potential loss of data.
Of course, I may be over-generalizing his statement and/or he may shift his position...

Has there been any project on using KLEE for the C side? Any showstoppers?

I think it has been used once - but other C verification tools have been used far more. Which suggests that there might be problems that made them stop?
(I'm using KLEE because it uses LLVM-IR and so I can use it with rustc or even with rustc+clang. The tools normally used for the Linux kernel are C-only.)

Another thing I would like to have is Miri/KASAN/UBSAN etc. for the Rust side.

Yes, definitely!

In fact, maybe the right development sequence is:

  1. Usermode tests that can be run directly.
  2. Usermode tests that can be run in miri and with sanitizers
  3. Usermode parameterized tests that can used with fuzzers and with KLEE.

("Parameterized tests" are also called "structure-aware fuzzing", "property-based testing" or "verification harnesses" - depending on which tool you are thinking about using - but they are more or less the same thing and writing tests that can be used for all is my preferred way of using KLEE.)

Having said that, I might jump straight from (1) to (3) because the KLEE part is the bit that worries me most and is my primary focus.

@ojeda
Copy link
Member

ojeda commented May 18, 2021

My interpretation of Linus Torvalds' comment on your RFC was that any panics from the Rust code were unacceptable to him. He focused on the out-of-memory panic that can be fixed by using a panic-free memory API but I'd think that he would have the same reaction to the 'invisible' panics inserted by the compiler to implement runtime overflow checks and runtime array bounds checks because they also cause the kernel to halt with loss of service and potential loss of data.
Of course, I may be over-generalizing his statement and/or he may shift his position...

They are very different cases:

  • The alloc-related panics are there simply because there is no fallible approach. It is an interface design problem. Having a allocation failure does not mean the kernel has a bug, just that it does not have memory to serve a particular request.
  • On the other hand, OOB accesses, unexpected overflows, etc. are severe bugs with potentially unbounded consequences. In general, one cannot trust the environment after they occur. They are not a design problem in the sense of the alloc ones.

That is why latter ones should panic the same way dereferencing the start of the address space does (whether such a panic should kill only the current thread or the entire kernel is another question).

Furthermore, in general, potential loss of data is way better than silent corruption of data. Similarly, denial of service is way better than getting a server compromised.

Having said that, I might jump straight from (1) to (3) because the KLEE part is the bit that worries me most and is my primary focus.

Of course, feel free to take the approach that you prefer, the one that you think will net more benefits or the one that is best for your research.

My points in the previous message were only about what I think we can handle/merge at the present time (e.g. refactoring many things before we have managed to get Rust in mainline may be problematic).

@ojeda
Copy link
Member

ojeda commented May 18, 2021

("Parameterized tests" are also called "structure-aware fuzzing", "property-based testing" or "verification harnesses" - depending on which tool you are thinking about using - but they are more or less the same thing and writing tests that can be used for all is my preferred way of using KLEE.)

I am sure you have seen it already, but just in case: Rust has a nice library for property-based testing called proptest inspired in hypothesis for Python.

@alastairreid
Copy link

Yes, I like proptest a lot!
More or less the first thing I did after figuring out how to use KLEE with Rust was reimplement the proptest API for KLEE (and any similar tool). I wrote this blog article about it.

btw In case you're interested, here is the status of getting KLEE to work with Rust (another blog article).

ojeda pushed a commit that referenced this issue Apr 12, 2022
…frame()

The following KASAN warning is detected by QEMU.

==================================================================
BUG: KASAN: stack-out-of-bounds in unwind_frame+0x508/0x870
Read of size 4 at addr c36bba90 by task cat/163

CPU: 1 PID: 163 Comm: cat Not tainted 5.10.0-rc1 #40
Hardware name: ARM-Versatile Express
[<c0113fac>] (unwind_backtrace) from [<c010e71c>] (show_stack+0x10/0x14)
[<c010e71c>] (show_stack) from [<c0b805b4>] (dump_stack+0x98/0xb0)
[<c0b805b4>] (dump_stack) from [<c0b7d658>] (print_address_description.constprop.0+0x58/0x4bc)
[<c0b7d658>] (print_address_description.constprop.0) from [<c031435c>] (kasan_report+0x154/0x170)
[<c031435c>] (kasan_report) from [<c0113c44>] (unwind_frame+0x508/0x870)
[<c0113c44>] (unwind_frame) from [<c010e298>] (__save_stack_trace+0x110/0x134)
[<c010e298>] (__save_stack_trace) from [<c01ce0d8>] (stack_trace_save+0x8c/0xb4)
[<c01ce0d8>] (stack_trace_save) from [<c0313520>] (kasan_set_track+0x38/0x60)
[<c0313520>] (kasan_set_track) from [<c0314cb8>] (kasan_set_free_info+0x20/0x2c)
[<c0314cb8>] (kasan_set_free_info) from [<c0313474>] (__kasan_slab_free+0xec/0x120)
[<c0313474>] (__kasan_slab_free) from [<c0311e20>] (kmem_cache_free+0x7c/0x334)
[<c0311e20>] (kmem_cache_free) from [<c01c35dc>] (rcu_core+0x390/0xccc)
[<c01c35dc>] (rcu_core) from [<c01013a8>] (__do_softirq+0x180/0x518)
[<c01013a8>] (__do_softirq) from [<c0135214>] (irq_exit+0x9c/0xe0)
[<c0135214>] (irq_exit) from [<c01a40e4>] (__handle_domain_irq+0xb0/0x110)
[<c01a40e4>] (__handle_domain_irq) from [<c0691248>] (gic_handle_irq+0xa0/0xb8)
[<c0691248>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x94)
Exception stack(0xc36bb928 to 0xc36bb970)
b920:                   c36bb9c0 00000000 c0126919 c0101228 c36bb9c0 b76d7730
b940: c36b8000 c36bb9a0 c3335b00 c01ce0d8 00000003 c36bba3c c36bb940 c36bb978
b960: c010e298 c011373c 60000013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c011373c>] (unwind_frame+0x0/0x870)
[<c011373c>] (unwind_frame) from [<00000000>] (0x0)

The buggy address belongs to the page:
page:(ptrval) refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x636bb
flags: 0x0()
raw: 00000000 00000000 ef867764 00000000 00000000 00000000 ffffffff 00000000
page dumped because: kasan: bad access detected

addr c36bba90 is located in stack of task cat/163 at offset 48 in frame:
 stack_trace_save+0x0/0xb4

this frame has 1 object:
 [32, 48) 'trace'

Memory state around the buggy address:
 c36bb980: f1 f1 f1 f1 00 04 f2 f2 00 00 f3 f3 00 00 00 00
 c36bba00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
>c36bba80: 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
                 ^
 c36bbb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 c36bbb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

There is a same issue on x86 and has been resolved by the commit f7d27c3
("x86/mm, kasan: Silence KASAN warnings in get_wchan()").
The solution could be applied to arm architecture too.

Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Reported-by: He Ying <heying24@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
psomas pushed a commit to psomas/linux-rs that referenced this issue Dec 30, 2022
We use uprobe in aarch64_be, which we found the tracee task would exit
due to SIGILL when we enable the uprobe trace.
We can see the replace inst from uprobe is not correct in aarch big-endian.
As in Armv8-A, instruction fetches are always treated as little-endian,
we should treat the UPROBE_SWBP_INSN as little-endian。

The test case is as following。
bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null &
bash-4.4# cd /sys/kernel/debug/tracing/
bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events
bash-4.4# echo 1 > events/uprobes/enable
bash-4.4#
bash-4.4# ps
  PID TTY          TIME CMD
  140 ?        00:00:01 bash
  237 ?        00:00:00 ps
[1]+  Illegal instruction     ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null

which we debug use gdb as following:

bash-4.4# gdb attach 155
(gdb) disassemble send
Dump of assembler code for function send:
   0x0000000000400c30 <+0>:     .inst   0xa00020d4 ; undefined
   0x0000000000400c34 <+4>:     mov     x29, sp
   0x0000000000400c38 <+8>:     str     w0, [sp, Rust-for-Linux#28]
   0x0000000000400c3c <+12>:    strb    w1, [sp, Rust-for-Linux#27]
   0x0000000000400c40 <+16>:    str     xzr, [sp, Rust-for-Linux#40]
   0x0000000000400c44 <+20>:    str     xzr, [sp, Rust-for-Linux#48]
   0x0000000000400c48 <+24>:    add     x0, sp, #0x1b
   0x0000000000400c4c <+28>:    mov     w3, #0x0                 // #0
   0x0000000000400c50 <+32>:    mov     x2, #0x1                 // Rust-for-Linux#1
   0x0000000000400c54 <+36>:    mov     x1, x0
   0x0000000000400c58 <+40>:    ldr     w0, [sp, Rust-for-Linux#28]
   0x0000000000400c5c <+44>:    bl      0x405e10 <mq_send>
   0x0000000000400c60 <+48>:    str     w0, [sp, Rust-for-Linux#60]
   0x0000000000400c64 <+52>:    ldr     w0, [sp, Rust-for-Linux#60]
   0x0000000000400c68 <+56>:    ldp     x29, x30, [sp], Rust-for-Linux#64
   0x0000000000400c6c <+60>:    ret
End of assembler dump.
(gdb) info b
No breakpoints or watchpoints.
(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
0x0000000000400c30 in send ()
(gdb) x/10x 0x400c30
0x400c30 <send>:    0xd42000a0   0xfd030091      0xe01f00b9      0xe16f0039
0x400c40 <send+16>: 0xff1700f9   0xff1b00f9      0xe06f0091      0x03008052
0x400c50 <send+32>: 0x220080d2   0xe10300aa
(gdb) disassemble 0x400c30
Dump of assembler code for function send:
=> 0x0000000000400c30 <+0>:     .inst   0xa00020d4 ; undefined
   0x0000000000400c34 <+4>:     mov     x29, sp
   0x0000000000400c38 <+8>:     str     w0, [sp, Rust-for-Linux#28]
   0x0000000000400c3c <+12>:    strb    w1, [sp, Rust-for-Linux#27]
   0x0000000000400c40 <+16>:    str     xzr, [sp, Rust-for-Linux#40]

Signed-off-by: junhua huang <huang.junhua@zte.com.cn>
Link: https://lore.kernel.org/r/202212021511106844809@zte.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
@ojeda ojeda removed the prio: high label Feb 17, 2023
fbq pushed a commit that referenced this issue Sep 25, 2023
Inject fault while probing btrfs.ko, if kstrdup() fails in
eventfs_prepare_ef() in eventfs_add_dir(), it will return ERR_PTR
to assign file->ef. But the eventfs_remove() check NULL in
trace_module_remove_events(), which causes the below NULL
pointer dereference.

As both Masami and Steven suggest, allocater side should handle the
error carefully and remove it, so fix the places where it failed.

 Could not create tracefs 'raid56_write' directory
 Btrfs loaded, zoned=no, fsverity=no
 Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c
 Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000102544000
 [000000000000001c] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in: btrfs(-) libcrc32c xor xor_neon raid6_pq cfg80211 rfkill 8021q garp mrp stp llc ipv6 [last unloaded: btrfs]
 CPU: 15 PID: 1343 Comm: rmmod Tainted: G                 N 6.5.0+ #40
 Hardware name: linux,dummy-virt (DT)
 pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : eventfs_remove_rec+0x24/0xc0
 lr : eventfs_remove+0x68/0x1d8
 sp : ffff800082d63b60
 x29: ffff800082d63b60 x28: ffffb84b80ddd00c x27: ffffb84b3054ba40
 x26: 0000000000000002 x25: ffff800082d63bf8 x24: ffffb84b8398e440
 x23: ffffb84b82af3000 x22: dead000000000100 x21: dead000000000122
 x20: ffff800082d63bf8 x19: fffffffffffffff4 x18: ffffb84b82508820
 x17: 0000000000000000 x16: 0000000000000000 x15: 000083bc876a3166
 x14: 000000000000006d x13: 000000000000006d x12: 0000000000000000
 x11: 0000000000000001 x10: 00000000000017e0 x9 : 0000000000000001
 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffb84b84289804
 x5 : 0000000000000000 x4 : 9696969696969697 x3 : ffff33a5b7601f38
 x2 : 0000000000000000 x1 : ffff800082d63bf8 x0 : fffffffffffffff4
 Call trace:
  eventfs_remove_rec+0x24/0xc0
  eventfs_remove+0x68/0x1d8
  remove_event_file_dir+0x88/0x100
  event_remove+0x140/0x15c
  trace_module_notify+0x1fc/0x230
  notifier_call_chain+0x98/0x17c
  blocking_notifier_call_chain+0x4c/0x74
  __arm64_sys_delete_module+0x1a4/0x298
  invoke_syscall+0x44/0x100
  el0_svc_common.constprop.1+0x68/0xe0
  do_el0_svc+0x1c/0x28
  el0_svc+0x3c/0xc4
  el0t_64_sync_handler+0xa0/0xc4
  el0t_64_sync+0x174/0x178
 Code: 5400052c a90153b3 aa0003f3 aa0103f4 (f9401400)
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Kernel Offset: 0x384b00c00000 from 0xffff800080000000
 PHYS_OFFSET: 0xffffcc5b80000000
 CPU features: 0x88000203,3c020000,1000421b
 Memory Limit: none
 Rebooting in 1 seconds..

Link: https://lore.kernel.org/linux-trace-kernel/20230912134752.1838524-1-ruanjinjie@huawei.com
Link: https://lore.kernel.org/all/20230912025808.668187-1-ruanjinjie@huawei.com/
Link: https://lore.kernel.org/all/20230911052818.1020547-1-ruanjinjie@huawei.com/
Link: https://lore.kernel.org/all/20230909072817.182846-1-ruanjinjie@huawei.com/
Link: https://lore.kernel.org/all/20230908074816.3724716-1-ruanjinjie@huawei.com/

Cc: Ajay Kaher <akaher@vmware.com>
Fixes: 5bdcd5f ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
metaspace pushed a commit that referenced this issue Feb 7, 2024
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:

  ============================================
  WARNING: possible recursive locking detected
  6.7.0 #40 Not tainted
  --------------------------------------------
  kworker/u16:5/90 is trying to acquire lock:
  ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc

              but task is already holding lock:
  ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc

              other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(pci_bus_sem);
    lock(pci_bus_sem);

               *** DEADLOCK ***

  Call trace:
   print_deadlock_bug+0x25c/0x348
   __lock_acquire+0x10a4/0x2064
   lock_acquire+0x1e8/0x318
   down_read+0x60/0x184
   pcie_aspm_pm_state_change+0x58/0xdc
   pci_set_full_power_state+0xa8/0x114
   pci_set_power_state+0xc4/0x120
   qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
   pci_walk_bus+0x64/0xbc
   qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]

The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.

Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.

Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
Fixes: f93e71a ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org>	# 6.7
ojeda pushed a commit that referenced this issue Aug 18, 2024
The queue stats API queries the queues according to the
real_num_[tr]x_queues, in case the device is down and channels were not
yet created, don't try to query their statistics.

To trigger the panic, run this command before the interface is brought
up:
./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml --dump qstats-get --json '{"ifindex": 4}'

BUG: kernel NULL pointer dereference, address: 0000000000000c00
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 977 Comm: python3 Not tainted 6.10.0+ #40
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08
RSP: 0018:ffff888116be37d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004
RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000
R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000
FS:  00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die+0x1f/0x60
 ? page_fault_oops+0x14e/0x3d0
 ? exc_page_fault+0x73/0x130
 ? asm_exc_page_fault+0x22/0x30
 ? mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
 netdev_nl_stats_by_netdev+0x2a6/0x4c0
 ? __rmqueue_pcplist+0x351/0x6f0
 netdev_nl_qstats_get_dumpit+0xc4/0x1b0
 genl_dumpit+0x2d/0x80
 netlink_dump+0x199/0x410
 __netlink_dump_start+0x1aa/0x2c0
 genl_family_rcv_msg_dumpit+0x94/0xf0
 ? __pfx_genl_start+0x10/0x10
 ? __pfx_genl_dumpit+0x10/0x10
 ? __pfx_genl_done+0x10/0x10
 genl_rcv_msg+0x116/0x2b0
 ? __pfx_netdev_nl_qstats_get_dumpit+0x10/0x10
 ? __pfx_genl_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x54/0x100
 genl_rcv+0x24/0x40
 netlink_unicast+0x21a/0x340
 netlink_sendmsg+0x1f4/0x440
 __sys_sendto+0x1b6/0x1c0
 ? do_sock_setsockopt+0xc3/0x180
 ? __sys_setsockopt+0x60/0xb0
 __x64_sys_sendto+0x20/0x30
 do_syscall_64+0x50/0x110
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f43757132b0
Code: c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 41 54 48 83 ec 20
RSP: 002b:00007ffd258da048 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffd258da0f8 RCX: 00007f43757132b0
RDX: 000000000000001c RSI: 00007f437464b850 RDI: 0000000000000003
RBP: 00007f4375085de0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f43751a6147
 </TASK>
Modules linked in: netconsole xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse [last unloaded: netconsole]
CR2: 0000000000000c00
---[ end trace 0000000000000000 ]---
RIP: 0010:mlx5e_get_queue_stats_rx+0x3c/0xb0 [mlx5_core]
Code: fc 55 48 63 ee 53 48 89 d3 e8 40 3d 70 e1 85 c0 74 58 4c 89 ef e8 d4 07 04 00 84 c0 75 41 49 8b 84 24 f8 39 00 00 48 8b 04 e8 <48> 8b 90 00 0c 00 00 48 03 90 40 0a 00 00 48 89 53 08 48 8b 90 08
RSP: 0018:ffff888116be37d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888116be3868 RCX: 0000000000000004
RDX: ffff88810ada4000 RSI: 0000000000000000 RDI: ffff888109df09c0
RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000004
R10: ffff88813461901c R11: ffffffffffffffff R12: ffff888109df0000
R13: ffff888109df09c0 R14: ffff888116be38d0 R15: 0000000000000000
FS:  00007f4375d5c740(0000) GS:ffff88852c980000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000c00 CR3: 0000000106ada006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 7b66ae5 ("net/mlx5e: Add per queue netdev-genl stats")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240808144107.2095424-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
• kbuild Related to building the kernel, `make`, `Kbuild`, `Kconfig` options...
Development

No branches or pull requests

5 participants