New subflows send MP_JOIN to port 0 #63

Closed
matttbe opened this issue Jul 22, 2020 · 2 comments
@matttbe
Member

matttbe commented Jul 22, 2020

More info is coming soon, but we can see in several setups that MP_JOINs are sent to port 0:

12:02:25.985960 IP 10.0.1.2.8000 > 10.0.1.1.59574: Flags [.], ack 78, win 509, options [nop,nop,TS val 58365988 ecr 260327143,mptcp add-addr[bad opt]>
12:02:25.986311 IP 10.0.2.1.46531 > 10.0.2.2.0: Flags [S], seq 1204877313, win 64240, options [mss 1460,sackOK,TS val 2027756336 ecr 0,nop,wscale 7,mptcp join backup id 0 token 0xc777f628 nonce 0xacb9a776], length 0
12:02:25.986322 IP 10.0.2.2.0 > 10.0.2.1.46531: Flags [R.], seq 0, ack 1204877314, win 0, length 0

Or even with packetdrill:

root@(none):/opt/packetdrill/gtests/net/mptcp/mp_join# tcpdump -i any -n -c 20 tcp &
[1] 260
root@(none):/opt/packetdrill/gtests/net/mptcp/mp_join# tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes

root@(none):/opt/packetdrill/gtests/net/mptcp/mp_join# ../../packetdrill/packetdrill -vvv mp_join_client.pkt 
socket syscall: 1594826109.716840
setsockopt syscall: 1594826109.719996
fcntl syscall: 1594826109.723111
fcntl syscall: 1594826109.725905
connect syscall: 1594826109.729016
outbound sniffed packet:  0.089384 S 3824942547:3824942547(0) win 65535 <mss 1460,sackOK,TS val 978433635 ecr 0,nop,wscale 8,mp_capable v1 flags: |H| >
inbound injected packet:  0.102111 S. 0:0(0) ack 3824942548 win 65535 <mss 1460,sackOK,TS val 4074410674 ecr 978433635,nop,wscale 8,mp_capable v1 flags: |H| sender_key: 2>
outbound sniffed packet:  0.112000 . 3824942548:3824942548(0) ack 1 win 256 <nop,nop,TS val 978433658 ecr 4074410674,mp_capable v1 flags: |H| sender_key: 15724967926438798442 receiver_key: 2>
15:15:09.728921 IP 192.168.228.105.47794 > 192.0.2.1.8080: Flags [S], seq 3824942547, win 65535, options [mss 1460,sackOK,TS val 978433635 ecr 0,nop,wscale 8,mptcp capable[bad opt]>
15:15:09.751397 IP 192.0.2.1.8080 > 192.168.228.105.47794: Flags [S.], seq 0, ack 3824942548, win 65535, options [mss 1460,sackOK,TS val 4074410674 ecr 978433635,nop,wscale 8,mptcp capable Unknown Version (1)], length 0
15:15:09.751537 IP 192.168.228.105.47794 > 192.0.2.1.8080: Flags [.], ack 1, win 256, options [nop,nop,TS val 978433658 ecr 4074410674,mptcp capable Unknown Version (1)], length 0
getsockopt syscall: 1594826109.963115
fcntl syscall: 1594826109.966067
15:15:10.968919 IP 192.168.228.105.47794 > 192.0.2.1.8080: Flags [P.], seq 1:3, ack 1, win 256, options [nop,nop,TS val 978434875 ecr 4074410674,mptcp capable[bad opt]>
write syscall: 1594826110.979351
outbound sniffed packet:  1.329382 P. 3824942548:3824942550(2) ack 1 win 256 <nop,nop,TS val 978434875 ecr 4074410674,mp_capable v1 flags: |H| sender_key: 15724967926438798442 receiver_key: 2 mpcdatalen=2,nop,nop>
inbound injected packet:  1.356829 . 1:1(0) ack 3824942550 win 256 <nop,nop,TS val 4074418293 ecr 978434875,add_address address_id: 1 ipv4: 192.0.2.2 hmac: 18175360766677029581,dss dack8 9168515192191584501 flags: Aa>
15:15:11.009981 IP 192.0.2.1.8080 > 192.168.228.105.47794: Flags [.], ack 3, win 256, options [nop,nop,TS val 4074418293 ecr 978434875,mptcp add-addr[bad opt]>
15:15:11.019680 IP 192.168.228.105.35147 > 192.0.2.2.0: Flags [S], seq 2261540031, win 65535, options [mss 1460,sackOK,TS val 3362971286 ecr 0,nop,wscale 8,mptcp join backup id 0 token 0xd86e8112 nonce 0x91218654], length 0
15:15:12.059316 IP 192.168.228.105.35147 > 192.0.2.2.0: Flags [S], seq 2261540031, win 65535, options [mss 1460,sackOK,TS val 3362972326 ecr 0,nop,wscale 8,mptcp join backup id 0 token 0xd86e8112 nonce 0x91218654], length 0
15:15:14.107395 IP 192.168.228.105.35147 > 192.0.2.2.0: Flags [S], seq 2261540031, win 65535, options [mss 1460,sackOK,TS val 3362974374 ecr 0,nop,wscale 8,mptcp join backup id 0 token 0xd86e8112 nonce 0x91218654], length 0
15:15:18.139331 IP 192.168.228.105.35147 > 192.0.2.2.0: Flags [S], seq 2261540031, win 65535, options [mss 1460,sackOK,TS val 3362978406 ecr 0,nop,wscale 8,mptcp join backup id 0 token 0xd86e8112 nonce 0x91218654], length 0

(Note: this tcpdump version is old and does not support MPTCPv1.)
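
To pick the offending segments out of a longer capture, the text output can be filtered for SYN packets that carry an MP_JOIN option and go to destination port 0. This is only a sketch: `find_port0_joins` is a hypothetical helper name, and it assumes the `-n` output format shown in the traces above, where the destination is printed as `a.b.c.d.PORT:`.

```shell
# Sketch: read tcpdump text output on stdin and print only the SYN segments
# that carry an MP_JOIN option and are addressed to destination port 0.
# find_port0_joins is a hypothetical helper name, not part of any tool.
find_port0_joins() {
    # With "-n", a destination port of 0 shows up as ".0: Flags".
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '\.0: Flags \[S\],.*mptcp join' || true
}
```

It could be used on a live capture, for example `tcpdump -l -n -i any tcp | find_port0_joins`, or on a saved text log.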

@matttbe matttbe added the bug label Jul 22, 2020
@nrybowski
Member

Here is a minimal setup to reproduce this bug (mptcp-tools is expected to be in the same folder as this script, with the use_mptcp tool compiled):

#! /bin/bash

setup_iface() {
    ns="$4"
    ns_exec="ip netns exec $ns"

    ip l set "veth$1" netns "$ns"
    $ns_exec ip l set dev veth"$1" up  
    $ns_exec ip a add dev veth"$1" 10.0."$2"."$3"/24
}

cgroups=("client" "server")
use_mptcp="./mptcp-tools/use_mptcp/use_mptcp.sh"

ip l add veth1 type veth peer name veth2
ip l add veth3 type veth peer name veth4

i=1
for cgroup in ${cgroups[@]}
do
    ns_name=ns_$cgroup
    ns_exec="ip netns exec $ns_name"
    ip netns list | grep $ns_name > /dev/null
    if [ $? -eq 1 ]
    then
        ip netns add $ns_name 

        setup_iface "$i" "1" "$i" "$ns_name" 
        setup_iface "$((i+2))" "2" "$i" "$ns_name"
        $ns_exec ip mptcp endpoint flush
        $ns_exec ip mptcp limits set add_addr_accepted 2 subflows 2
    fi

    if [ "${cgroup}" = "server" ]
    then
        addrs=$($ns_exec ip a | grep inet | sed -e 's/inet[6]*//g' -e 's/fe80.*$//g' -e 's/\/24.*$//g')
        echo "${addrs[@]}"
        #for addr in ${addrs[@]}
        #do
        #   ${ns_exec} ip mptcp endpoint add ${addr} signal
        #done
        ${ns_exec} ip mptcp endpoint add ${addrs[0]} signal
        ${ns_exec} ${use_mptcp} python3 -m http.server &
        #${ns_exec} tc qdisc add dev veth$i root netem delay 1000ms
    fi

    ((i++))

done

In shell 1, run `ip netns exec ns_server tcpdump -ni any tcp`; in shell 2, run `ip netns exec ns_client mptcp-tools/use_mptcp/use_mptcp.sh curl 10.0.1.2:8000 -o /dev/null`.

From the tcpdump output:

[...]
14:08:32.077470 IP 10.0.1.2.8000 > 10.0.1.1.41760: Flags [.], ack 78, win 509, options [nop,nop,TS val 1413864222 ecr 3687490168,mptcp add-addr[bad opt]>
14:08:32.077841 IP 10.0.2.1.43719 > 10.0.2.2.0: Flags [S], seq 2048241466, win 64240, options [mss 1460,sackOK,TS val 2245203070 ecr 0,nop,wscale 7,mptcp join backup id 0 token 0xe131e65a nonce 0x763548b7], length 0
14:08:32.077850 IP 10.0.2.2.0 > 10.0.2.1.43719: Flags [R.], seq 0, ack 2048241467, win 0, length 0
[...]

I'm not sure whether `ip mptcp endpoint add <addr> signal` has to be called for both addresses of ns_server, but when I tried that (the commented-out for loop in the script above), I got the same bug, this time on the first interface:

[...]
14:11:47.055974 IP 10.0.1.2.8000 > 10.0.1.1.41762: Flags [.], ack 78, win 509, options [nop,nop,TS val 1414059201 ecr 3687685147,mptcp add-addr[bad opt]>
14:11:47.056372 IP 10.0.1.1.41485 > 10.0.1.2.0: Flags [S], seq 3336345692, win 64240, options [mss 1460,sackOK,TS val 3687685147 ecr 0,nop,wscale 7,mptcp join backup id 0 token 0x27143ab nonce 0xf3a22c5c], length 0
14:11:47.056381 IP 10.0.1.2.0 > 10.0.1.1.41485: Flags [R.], seq 0, ack 3336345693, win 0, length 0
[...]

Tested on commit eeb8340.
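
One detail of the script above: in bash, `addrs=$(...)` stores a single string, so the unquoted `${addrs[0]}` expands to every address found rather than to the first one. A minimal sketch of collecting the addresses one per line, so they can be loaded into a real array, could look like this. `list_ipv4_addrs` is a hypothetical helper name, and the commented usage assumes the namespaces and variables from the script above.

```shell
# Sketch: extract the IPv4 addresses from "ip a" output read on stdin,
# one per line. list_ipv4_addrs is a hypothetical helper name.
list_ipv4_addrs() {
    # Keep only "inet" lines (this skips "inet6") and strip the "/prefix".
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2 }'
}

# Hypothetical usage inside the script above (needs root and the namespaces):
#   mapfile -t addrs < <($ns_exec ip a | list_ipv4_addrs)
#   for addr in "${addrs[@]}"; do
#       $ns_exec ip mptcp endpoint add "$addr" signal
#   done
```

With an array, `${addrs[0]}` refers to the first address only, which makes it easier to signal one specific address or to loop over all of them.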

@matttbe
Member Author

matttbe commented Jul 27, 2020

Arf, I forgot to add "Closes #63" in the commit message of my last patch.

This is fixed in the export branch. It has been sent to netdev for the -net branch.

@matttbe matttbe closed this as completed Jul 27, 2020
geliangtang pushed a commit to geliangtang/mptcp_net-next that referenced this issue Jun 26, 2021
Chipidea also needs to sync the interrupt before unbinding the udc
when the gadget driver is removed; otherwise, setup irq handling may
happen during unbind. See the dump below, generated from an Android
function switch stress test:

[ 4703.503056] android_work: sent uevent USB_STATE=CONNECTED
[ 4703.514642] android_work: sent uevent USB_STATE=DISCONNECTED
[ 4703.651339] android_work: sent uevent USB_STATE=CONNECTED
[ 4703.661806] init: Control message: Processed ctl.stop for 'adbd' from pid: 561 (system_server)
[ 4703.673469] init: processing action (init.svc.adbd=stopped) from (/system/etc/init/hw/init.usb.configfs.rc:14)
[ 4703.676451] Unable to handle kernel read from unreadable memory at virtual address 0000000000000090
[ 4703.676454] Mem abort info:
[ 4703.676458]   ESR = 0x96000004
[ 4703.676461]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 4703.676464]   SET = 0, FnV = 0
[ 4703.676466]   EA = 0, S1PTW = 0
[ 4703.676468] Data abort info:
[ 4703.676471]   ISV = 0, ISS = 0x00000004
[ 4703.676473]   CM = 0, WnR = 0
[ 4703.676478] user pgtable: 4k pages, 48-bit VAs, pgdp=000000004a867000
[ 4703.676481] [0000000000000090] pgd=0000000000000000, p4d=0000000000000000
[ 4703.676503] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 4703.758297] Modules linked in: synaptics_dsx_i2c moal(O) mlan(O)
[ 4703.764327] CPU: 0 PID: 235 Comm: lmkd Tainted: G        W  O      5.10.9-00001-g3f5fd8487c38-dirty #63
[ 4703.773720] Hardware name: NXP i.MX8MNano EVK board (DT)
[ 4703.779033] pstate: 60400085 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[ 4703.785046] pc : _raw_write_unlock_bh+0xc0/0x2c8
[ 4703.789667] lr : android_setup+0x4c/0x168
[ 4703.793676] sp : ffff80001256bd80
[ 4703.796989] x29: ffff80001256bd80 x28: 00000000000000a8
[ 4703.802304] x27: ffff800012470000 x26: ffff80006d923000
[ 4703.807616] x25: ffff800012471000 x24: ffff00000b091140
[ 4703.812929] x23: ffff0000077dbd38 x22: ffff0000077da490
[ 4703.818242] x21: ffff80001256be30 x20: 0000000000000000
[ 4703.823554] x19: 0000000000000080 x18: ffff800012561048
[ 4703.828867] x17: 0000000000000000 x16: 0000000000000039
[ 4703.834180] x15: ffff8000106ad258 x14: ffff80001194c277
[ 4703.839493] x13: 0000000000003934 x12: 0000000000000000
[ 4703.844805] x11: 0000000000000000 x10: 0000000000000001
[ 4703.850117] x9 : 0000000000000000 x8 : 0000000000000090
[ 4703.855429] x7 : 6f72646e61203a70 x6 : ffff8000124f2450
[ 4703.860742] x5 : ffffffffffffffff x4 : 0000000000000009
[ 4703.866054] x3 : ffff8000108a290c x2 : ffff00007fb3a9c8
[ 4703.871367] x1 : 0000000000000000 x0 : 0000000000000090
[ 4703.876681] Call trace:
[ 4703.879129]  _raw_write_unlock_bh+0xc0/0x2c8
[ 4703.883397]  android_setup+0x4c/0x168
[ 4703.887059]  udc_irq+0x824/0xa9c
[ 4703.890287]  ci_irq+0x124/0x148
[ 4703.893429]  __handle_irq_event_percpu+0x84/0x268
[ 4703.898131]  handle_irq_event+0x64/0x14c
[ 4703.902054]  handle_fasteoi_irq+0x110/0x210
[ 4703.906236]  __handle_domain_irq+0x8c/0xd4
[ 4703.910332]  gic_handle_irq+0x6c/0x124
[ 4703.914081]  el1_irq+0xdc/0x1c0
[ 4703.917221]  _raw_spin_unlock_irq+0x20/0x54
[ 4703.921405]  finish_task_switch+0x84/0x224
[ 4703.925502]  __schedule+0x4a4/0x734
[ 4703.928990]  schedule+0xa0/0xe8
[ 4703.932132]  do_notify_resume+0x150/0x184
[ 4703.936140]  work_pending+0xc/0x40c
[ 4703.939633] Code: d5384613 521b0a69 d5184609 f9800111 (885ffd01)
[ 4703.945732] ---[ end trace ba5c1875ae49d53c ]---
[ 4703.950350] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[ 4703.957223] SMP: stopping secondary CPUs
[ 4703.961151] Kernel Offset: disabled
[ 4703.964638] CPU features: 0x0240002,2000200c
[ 4703.968905] Memory Limit: none
[ 4703.971963] Rebooting in 5 seconds..

Tested-by: faqiang.zhu <faqiang.zhu@nxp.com>
Signed-off-by: Li Jun <jun.li@nxp.com>
Link: https://lore.kernel.org/r/1620989984-7653-1-git-send-email-jun.li@nxp.com
Signed-off-by: Peter Chen <peter.chen@kernel.org>
dcaratti pushed a commit to dcaratti/mptcp_net-next that referenced this issue Sep 2, 2021
jenkins-tessares pushed a commit that referenced this issue Nov 3, 2021
…together

Running endpoint security solutions like Sentinel1 that use perf-based
tracing heavily leads to this repeated dump complaining about dockerd.
The default value of 2048 is nowhere near large enough.

Using the prior patch "tracing: show size of requested buffer", we get
"perf buffer not large enough, wanted 6644, have 6144", after repeated
up-sizing (I did 2/4/6/8K). With 8K, the problem doesn't occur at all,
so below is the trace for 6K.

I'm wondering if this value should be selectable at boot time, but this
is a good starting point.

```
------------[ cut here ]------------
perf buffer not large enough, wanted 6644, have 6144
WARNING: CPU: 1 PID: 4997 at kernel/trace/trace_event_perf.c:402 perf_trace_buf_alloc+0x8c/0xa0
Modules linked in: [..]
CPU: 1 PID: 4997 Comm: sh Tainted: G                T 5.13.13-x86_64-00039-gb3959163488e #63
Hardware name: LENOVO 20KH002JUS/20KH002JUS, BIOS N23ET66W (1.41 ) 09/02/2019
RIP: 0010:perf_trace_buf_alloc+0x8c/0xa0
Code: 80 3d 43 97 d0 01 00 74 07 31 c0 5b 5d 41 5c c3 ba 00 18 00 00 89 ee 48 c7 c7 00 82 7d 91 c6 05 25 97 d0 01 01 e8 22 ee bc 00 <0f> 0b 31 c0 eb db 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 89
RSP: 0018:ffffb922026b7d58 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9da5ee012000 RCX: 0000000000000027
RDX: ffff9da881657828 RSI: 0000000000000001 RDI: ffff9da881657820
RBP: 00000000000019f4 R08: 0000000000000000 R09: ffffb922026b7b80
R10: ffffb922026b7b78 R11: ffffffff91dda688 R12: 000000000000000f
R13: ffff9da5ee012108 R14: ffff9da8816570a0 R15: ffffb922026b7e30
FS:  00007f420db1a080(0000) GS:ffff9da881640000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 00000002504a8006 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 kprobe_perf_func+0x11e/0x270
 ? do_execveat_common.isra.0+0x1/0x1c0
 ? do_execveat_common.isra.0+0x5/0x1c0
 kprobe_ftrace_handler+0x10e/0x1d0
 0xffffffffc03aa0c8
 ? do_execveat_common.isra.0+0x1/0x1c0
 do_execveat_common.isra.0+0x5/0x1c0
 __x64_sys_execve+0x33/0x40
 do_syscall_64+0x6b/0xc0
 ? do_syscall_64+0x11/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f420dc1db37
Code: ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc 0f 1f 84 00 00 00 00 00 b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 43 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd4e8b4e38 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f420dc1db37
RDX: 0000564338d1e740 RSI: 0000564338d32d50 RDI: 0000564338d28f00
RBP: 0000564338d28f00 R08: 0000564338d32d50 R09: 0000000000000020
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000564338d28f00
R13: 0000564338d32d50 R14: 0000564338d1e740 R15: 0000564338d28c60
---[ end trace 83ab3e8e16275e49 ]---
```

Link: https://lkml.kernel.org/r/20210831043723.13481-2-robbat2@gentoo.org

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
jenkins-tessares pushed a commit that referenced this issue Dec 24, 2021
A large pkt_len can lead to an out-of-bounds memcpy. The current
ath9k_hif_usb_rx_stream allows combining the content of two urb
inputs into one pkt. The first input can indicate the size of the
pkt. Any remaining size is saved in hif_dev->rx_remain_len.
While processing the next input, memcpy is used with rx_remain_len.

4-byte pkt_len can go up to 0xffff, while a single input is 0x4000
maximum in size (MAX_RX_BUF_SIZE). Thus, the patch adds a check for
pkt_len which must not exceed 2 * MAX_RX_BUF_SIZE.

BUG: KASAN: slab-out-of-bounds in ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc]
Read of size 46393 at addr ffff888018798000 by task kworker/0:1/23

CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 5.6.0 #63
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Workqueue: events request_firmware_work_func
Call Trace:
 <IRQ>
 dump_stack+0x76/0xa0
 print_address_description.constprop.0+0x16/0x200
 ? ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc]
 ? ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc]
 __kasan_report.cold+0x37/0x7c
 ? ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc]
 kasan_report+0xe/0x20
 check_memory_region+0x15a/0x1d0
 memcpy+0x20/0x50
 ath9k_hif_usb_rx_cb+0x490/0xed7 [ath9k_htc]
 ? hif_usb_mgmt_cb+0x2d9/0x2d9 [ath9k_htc]
 ? _raw_spin_lock_irqsave+0x7b/0xd0
 ? _raw_spin_trylock_bh+0x120/0x120
 ? __usb_unanchor_urb+0x12f/0x210
 __usb_hcd_giveback_urb+0x1e4/0x380
 usb_giveback_urb_bh+0x241/0x4f0
 ? __hrtimer_run_queues+0x316/0x740
 ? __usb_hcd_giveback_urb+0x380/0x380
 tasklet_action_common.isra.0+0x135/0x330
 __do_softirq+0x18c/0x634
 irq_exit+0x114/0x140
 smp_apic_timer_interrupt+0xde/0x380
 apic_timer_interrupt+0xf/0x20

I found the bug using a custom USBFuzz port. It is a research project
for fuzzing the USB stack and drivers. I modified it to fuzz only the
ath9k driver, providing hand-crafted USB descriptors to QEMU.

After fixing the value of pkt_tag to ATH_USB_RX_STREAM_MODE_TAG in the
QEMU emulation, I found the KASAN report. The bug is triggerable whenever
pkt_len is above two MAX_RX_BUF_SIZE. I used the same crashing input
to test that the driver works with the patch applied.
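
As a worked check of the bound described in this commit message, the numbers it quotes can be plugged into a small shell sketch. This is not the driver's C code: `pkt_len_ok` is a hypothetical helper, and the only inputs are the values from the text (0x4000 for MAX_RX_BUF_SIZE, 0xffff as the largest pkt_len, and the 46393-byte read from the KASAN report).

```shell
# Worked arithmetic for the bound described in the commit message.
# pkt_len_ok is a hypothetical helper, not driver code.
MAX_RX_BUF_SIZE=$((0x4000))   # 16384 bytes, per the commit message

pkt_len_ok() {
    # Succeeds only if the length fits in two receive buffers.
    [ "$1" -le $((2 * MAX_RX_BUF_SIZE)) ]
}
```

Two buffers hold 32768 bytes, so both the 46393-byte read from the KASAN report and the maximum pkt_len of 0xffff (65535) fall outside the bound and would be rejected by such a check.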

Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/YXsidrRuK6zBJicZ@10-18-43-117.dynapool.wireless.nyu.edu
jenkins-tessares pushed a commit that referenced this issue Nov 23, 2022
If a socket bound to a wildcard address fails to connect(), we
only reset saddr and keep the port.  Then, we have to fix up the
bhash2 bucket; otherwise, the bucket has an inconsistent address
in the list.

Also, listen() for such a socket will fire the WARN_ON() in
inet_csk_get_port(). [0]

Note that when a system runs out of memory, we give up fixing the
bucket and unlink sk from bhash and bhash2 by inet_put_port().

[0]:
WARNING: CPU: 0 PID: 207 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
Modules linked in:
CPU: 0 PID: 207 Comm: bhash2_prev_rep Not tainted 6.1.0-rc3-00799-gc8421681c845 #63
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
RIP: 0010:inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
Code: 74 a7 eb 93 48 8b 54 24 18 0f b7 cb 4c 89 e6 4c 89 ff e8 48 b2 ff ff 49 8b 87 18 04 00 00 e9 32 ff ff ff 0f 0b e9 34 ff ff ff <0f> 0b e9 42 ff ff ff 41 8b 7f 50 41 8b 4f 54 89 fe 81 f6 00 00 ff
RSP: 0018:ffffc900003d7e50 EFLAGS: 00010202
RAX: ffff8881047fb500 RBX: 0000000000004e20 RCX: 0000000000000000
RDX: 000000000000000a RSI: 00000000fffffe00 RDI: 00000000ffffffff
RBP: ffffffff8324dc00 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000004e20 R15: ffff8881054e1280
FS:  00007f8ac04dc740(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001540 CR3: 00000001055fa003 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205)
 inet_listen (net/ipv4/af_inet.c:228)
 __sys_listen (net/socket.c:1810)
 __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f8ac051de5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc1c177248 EFLAGS: 00000206 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 0000000020001550 RCX: 00007f8ac051de5d
RDX: ffffffffffffff80 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007ffc1c177270 R08: 0000000000000018 R09: 0000000000000007
R10: 0000000020001540 R11: 0000000000000206 R12: 00007ffc1c177388
R13: 0000000000401169 R14: 0000000000403e18 R15: 00007f8ac0723000
 </TASK>

Fixes: 28044fc ("net: Add a bhash2 table hashed by port and address")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
jenkins-tessares pushed a commit that referenced this issue Nov 30, 2022
The coreboot_table driver registers a coreboot bus while probing a
"coreboot_table" device representing the coreboot table memory region.
Probing this device (i.e., registering the bus) is a dependency for the
module_init() functions of any driver for this bus (e.g.,
memconsole-coreboot.c / memconsole_driver_init()).

With synchronous probe, this dependency works OK, as the link order in
the Makefile ensures coreboot_table_driver_init() (and thus,
coreboot_table_probe()) completes before a coreboot device driver tries
to add itself to the bus.

With asynchronous probe, however, coreboot_table_probe() may race with
memconsole_driver_init(), and so we're liable to hit one of these two:

1. coreboot_driver_register() eventually hits "[...] the bus was not
   initialized.", and the memconsole driver fails to register; or
2. coreboot_driver_register() gets past #1, but still races with
   bus_register() and hits some other undefined/crashing behavior (e.g.,
   in driver_find() [1])

We can resolve this by registering the bus in our initcall, and only
deferring "device" work (scanning the coreboot memory region and
creating sub-devices) to probe().

[1] Example failure, using 'driver_async_probe=*' kernel command line:

[    0.114217] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[    0.114307] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1 #63
[    0.114316] Hardware name: Google Scarlet (DT)
...
[    0.114488] Call trace:
[    0.114494]  _raw_spin_lock+0x34/0x60
[    0.114502]  kset_find_obj+0x28/0x84
[    0.114511]  driver_find+0x30/0x50
[    0.114520]  driver_register+0x64/0x10c
[    0.114528]  coreboot_driver_register+0x30/0x3c
[    0.114540]  memconsole_driver_init+0x24/0x30
[    0.114550]  do_one_initcall+0x154/0x2e0
[    0.114560]  do_initcall_level+0x134/0x160
[    0.114571]  do_initcalls+0x60/0xa0
[    0.114579]  do_basic_setup+0x28/0x34
[    0.114588]  kernel_init_freeable+0xf8/0x150
[    0.114596]  kernel_init+0x2c/0x12c
[    0.114607]  ret_from_fork+0x10/0x20
[    0.114624] Code: 5280002b 1100054a b900092a f9800011 (885ffc01)
[    0.114631] ---[ end trace 0000000000000000 ]---

Fixes: b81e314 ("firmware: coreboot: Make bus registration symmetric")
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20221019180934.1.If29e167d8a4771b0bf4a39c89c6946ed764817b9@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>