syzkaller: soft lockup in mptcp_token_exists() #365

Closed
matttbe opened this issue Feb 22, 2023 · 1 comment

matttbe commented Feb 22, 2023

With this syzkaller reproducer from #347:

# {Threaded:false Repeat:true RepeatTimes:0 Procs:1 Slowdown:1 Sandbox:none SandboxArg:0 Leak:false NetInjection:false NetDevices:true NetReset:true Cgroups:true BinfmtMisc:false CloseFDs:true KCSAN:false DevlinkPCI:false NicVF:false USB:false VhciInjection:false Wifi:false IEEE802154:false Sysctl:false UseTmpDir:true HandleSegv:false Repro:false Trace:false LegacyOptions:{Collide:false Fault:false FaultCall:0 FaultNth:0}}
r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
bind$inet(r0, &(0x7f0000002200)={0x2, 0x4e20, @local}, 0x10)
listen(r0, 0x0)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
sendto$inet(r1, 0x0, 0x0, 0x2000c000, &(0x7f0000000000)={0x2, 0x4e20, @local}, 0x10)
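
For readers less used to syzkaller syntax, here is a small sketch that decodes the raw numbers in the reproducer above. The constant values are copied by hand from the Linux uapi headers, and the helper name `decode_msg_flags` is made up for this illustration; nothing here comes from syzkaller itself.

```python
# Constants copied from the Linux headers (not read from the running system),
# so the sketch also runs on hosts without MPTCP support.
AF_INET = 0x2
SOCK_STREAM = 0x1
IPPROTO_MPTCP = 262  # 0x106, third argument of socket$inet_mptcp

# Only the MSG_* bits needed for this reproducer; the table is not exhaustive.
MSG_FLAGS = {
    0x4000: "MSG_NOSIGNAL",
    0x8000: "MSG_MORE",
    0x20000000: "MSG_FASTOPEN",
}


def decode_msg_flags(value):
    """Return the names of the known MSG_* bits set in value, lowest bit first."""
    names = [name for bit, name in sorted(MSG_FLAGS.items()) if value & bit]
    known = sum(bit for bit in MSG_FLAGS if value & bit)
    if value & ~known:
        # Bits that this small table does not know about are kept as hex.
        names.append(hex(value & ~known))
    return names


print(hex(IPPROTO_MPTCP))            # protocol used for both sockets
print(0x4E20)                        # port used by bind$inet and sendto$inet
print(decode_msg_flags(0x2000C000))  # flags passed to sendto$inet
```

Read this way, the program appears to create an MPTCP listener on port 20000 and then issue a zero-length `sendto()` with `MSG_FASTOPEN` set from a second MPTCP socket to that same address, i.e. a Fast Open style connect to the local listener.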

@cpaasch hit a soft lockup when checking on net and net-next:

[   64.674605] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [syz-executor:3338]
[   64.676438] Modules linked in:
[   64.677168] CPU: 1 PID: 3338 Comm: syz-executor Not tainted 6.2.0-rc8+ #32
[   64.678763] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   64.681215] RIP: 0010:mptcp_token_exists+0xd1/0x160
[   64.682338] Code: 35 a0 5b fe 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 80 00 00 00 48 8b 1b 41 89 df 31 ff 41 83 e7 01 44 89 fe e8 5f 98 5b fe <45> 85 ff 74 8d e8 05 a0 5b fe 48 d1 eb 41 89 ef 44 23 3d 00 3b 12
[   64.686234] RSP: 0018:ffff88811b7094b8 EFLAGS: 00000246
[   64.687355] RAX: 0000000000000000 RBX: 0000000000000ed7 RCX: ffffffff82e0ef41
[   64.688940] RDX: ffff888107841c00 RSI: 0000000000000100 RDI: 0000000000000005
[   64.690639] RBP: 000000008e935705 R08: 0000000000000005 R09: 0000000000000000
[   64.692231] R10: 0000000000000001 R11: 0000000000000000 R12: dffffc0000000000
[   64.693795] R13: ffffed10201a1511 R14: ffff888100d0a878 R15: 0000000000000001
[   64.695455] FS:  00007fde38e36800(0000) GS:ffff88811b700000(0000) knlGS:0000000000000000
[   64.697431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.698852] CR2: 000000000047c550 CR3: 0000000107964000 CR4: 00000000000006e0
[   64.700502] Call Trace:
[   64.701073]  <IRQ>
[   64.701564]  subflow_check_req+0x9d1/0xe70
[   64.702526]  ? __pfx_subflow_check_req+0x10/0x10
[   64.703597]  ? unwind_get_return_address+0x55/0xa0
[   64.704698]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[   64.705929]  ? ip_route_output_flow+0x225/0x2c0
[   64.706974]  ? __pfx_ip_route_output_flow+0x10/0x10
[   64.708086]  ? kmem_cache_alloc+0x177/0x310
[   64.709030]  ? inet_csk_route_req+0x6f4/0x9b0
[   64.710009]  subflow_v4_route_req+0x1f1/0x350
[   64.710991]  tcp_conn_request+0xb29/0x2d40

The trace still needs to be decoded with decode_stacktrace.sh, and a bisect may be needed as well.
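
As a rough illustration of what there is to decode, the sketch below splits a `symbol+0xoffset/0xsize` frame from the trace into its parts. It is only a helper for reading the log, with a hypothetical function name (`parse_frame`); it does not replace `scripts/decode_stacktrace.sh`, which additionally needs the matching `vmlinux` to resolve the offset to a source line.

```python
import re

# Matches frames such as "mptcp_token_exists+0xd1/0x160" or
# "nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]".
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Extract (symbol, offset, size, module) from one trace line, or None."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return (
        match["symbol"],
        int(match["offset"], 16),  # byte offset inside the function
        int(match["size"], 16),    # total size of the function in bytes
        match["module"],           # None for built-in code
    )


rip = "[   64.681215] RIP: 0010:mptcp_token_exists+0xd1/0x160"
print(parse_frame(rip))
```

For the RIP line of this report, the offset is 0xd1 (209) bytes into a 0x160 (352) byte function, which is the position that would have to be mapped back to a line in `mptcp_token_exists()`.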

Originally posted by @cpaasch in #347 (comment)

matttbe commented Feb 22, 2023

As discussed on IRC, I moved the two patches back to -net, together with "mptcp: refactor passive socket initialization":

New patches for t/upstream-net:

  • eb67c35: mptcp: refactor passive socket initialization (net)
  • 315f247: mptcp: use the workqueue to destroy unaccepted sockets (net)
  • 26486b7: mptcp: fix UaF in listener shutdown (net)
  • Results: d9da79d..8de0668 (export-net)

and t/upstream:

  • 40500a9: conflict in t/mptcp-drop-legacy-code
  • 0429d72: conflict in t/mptcp-use-the-workqueue-to-destroy-unaccepted-sockets-net-next
  • Results: eb4a580..6d711df (export) # empty, as expected

Tests are now in progress:


@cpaasch the reproducer should no longer hit the bug on our export-net branch. Do not hesitate to verify, and to re-open this ticket if it still does :-)

matttbe closed this as completed Feb 22, 2023
matttbe pushed a commit that referenced this issue Apr 19, 2024
The delete set command does not rely on the transaction object for
element removal. Therefore, a combination of delete element + delete set
from the abort path could result in restoring the refcount of the
mapping twice.

Check for an inactive element in the next generation for the delete element
command in the abort path, and skip restoring state if the next generation bit
has already been cleared. This is similar to the activate logic using
the set walk iterator.

[ 6170.286929] ------------[ cut here ]------------
[ 6170.286939] WARNING: CPU: 6 PID: 790302 at net/netfilter/nf_tables_api.c:2086 nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287071] Modules linked in: [...]
[ 6170.287633] CPU: 6 PID: 790302 Comm: kworker/6:2 Not tainted 6.9.0-rc3+ #365
[ 6170.287768] RIP: 0010:nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.287886] Code: df 48 8d 7d 58 e8 69 2e 3b df 48 8b 7d 58 e8 80 1b 37 df 48 8d 7d 68 e8 57 2e 3b df 48 8b 7d 68 e8 6e 1b 37 df 48 89 ef eb c4 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f
[ 6170.287895] RSP: 0018:ffff888134b8fd08 EFLAGS: 00010202
[ 6170.287904] RAX: 0000000000000001 RBX: ffff888125bffb28 RCX: dffffc0000000000
[ 6170.287912] RDX: 0000000000000003 RSI: ffffffffa20298ab RDI: ffff88811ebe4750
[ 6170.287919] RBP: ffff88811ebe4700 R08: ffff88838e812650 R09: fffffbfff0623a55
[ 6170.287926] R10: ffffffff8311d2af R11: 0000000000000001 R12: ffff888125bffb10
[ 6170.287933] R13: ffff888125bffb10 R14: dead000000000122 R15: dead000000000100
[ 6170.287940] FS:  0000000000000000(0000) GS:ffff888390b00000(0000) knlGS:0000000000000000
[ 6170.287948] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6170.287955] CR2: 00007fd31fc00710 CR3: 0000000133f60004 CR4: 00000000001706f0
[ 6170.287962] Call Trace:
[ 6170.287967]  <TASK>
[ 6170.287973]  ? __warn+0x9f/0x1a0
[ 6170.287986]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288092]  ? report_bug+0x1b1/0x1e0
[ 6170.288104]  ? handle_bug+0x3c/0x70
[ 6170.288112]  ? exc_invalid_op+0x17/0x40
[ 6170.288120]  ? asm_exc_invalid_op+0x1a/0x20
[ 6170.288132]  ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288243]  ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables]
[ 6170.288366]  ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables]
[ 6170.288483]  nf_tables_trans_destroy_work+0x588/0x590 [nf_tables]

Fixes: 5910544 ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
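
The double restore described in this commit message can be pictured with a toy model. This is not the nf_tables code: the class, the function names and the order in which the two undo steps run are assumptions made purely for illustration, with a plain integer standing in for the refcount of the mapping.

```python
class Element:
    """Toy set element: an 'inactive in next generation' bit plus a counter
    standing in for the refcount of the object the element maps to."""

    def __init__(self):
        self.inactive_next = False
        self.target_refs = 1


def delete(elem):
    """Prepare a deletion: mark the element inactive and drop the reference."""
    elem.inactive_next = True
    elem.target_refs -= 1


def restore(elem, check_bit):
    """Undo a deletion on abort; with check_bit, skip already restored elements."""
    if check_bit and not elem.inactive_next:
        return
    elem.inactive_next = False
    elem.target_refs += 1


def run(check_in_delete_element_abort):
    elem = Element()
    delete(elem)  # "delete element", followed by "delete set" in the same batch
    # Abort, step 1 (assumed order): the set walk re-activates elements and
    # already checks the bit, as the commit message says of the activate logic.
    restore(elem, check_bit=True)
    # Abort, step 2: undo of the "delete element" command itself.
    restore(elem, check_bit=check_in_delete_element_abort)
    return elem.target_refs


print(run(False))  # without the check, the reference is restored twice
print(run(True))   # with the check, the counter is back to its initial value
```

In this model the counter ends at 2 instead of 1 when the second undo step does not look at the bit, which is the kind of imbalance the added check is meant to prevent.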
matttbe pushed a commit that referenced this issue Apr 26, 2024
…ent path

Check for the table dormant flag; otherwise, the netdev release event path
tries to unregister an already unregistered hook.

[524854.857999] ------------[ cut here ]------------
[524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260
[...]
[524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365
[524854.858869] Workqueue: netns cleanup_net
[524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260
[524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41
[524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246
[524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a
[524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438
[524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34
[524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005
[524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00
[524854.858971] FS:  0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[524854.858982] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0
[524854.859000] Call Trace:
[524854.859006]  <TASK>
[524854.859013]  ? __warn+0x9f/0x1a0
[524854.859027]  ? __nf_unregister_net_hook+0x21a/0x260
[524854.859044]  ? report_bug+0x1b1/0x1e0
[524854.859060]  ? handle_bug+0x3c/0x70
[524854.859071]  ? exc_invalid_op+0x17/0x40
[524854.859083]  ? asm_exc_invalid_op+0x1a/0x20
[524854.859100]  ? __nf_unregister_net_hook+0x6a/0x260
[524854.859116]  ? __nf_unregister_net_hook+0x21a/0x260
[524854.859135]  nf_tables_netdev_event+0x337/0x390 [nf_tables]
[524854.859304]  ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859461]  ? packet_notifier+0xb3/0x360
[524854.859476]  ? _raw_spin_unlock_irqrestore+0x11/0x40
[524854.859489]  ? dcbnl_netdevice_event+0x35/0x140
[524854.859507]  ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables]
[524854.859661]  notifier_call_chain+0x7d/0x140
[524854.859677]  unregister_netdevice_many_notify+0x5e1/0xae0

Fixes: d54725c ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>