TFO: IPv6: the generated cookie is not the expected one #317

matttbe · 2022-11-24T15:18:33Z

Packetdrill is reporting that the cookie generated by the kernel is not the expected one when testing in IPv6.

After a bit of debugging, I noticed req->rsk_ops->family is set to AF_INET (not 6) because the subflows extends the request_sock_ops of TCP v4 (tcp_request_sock_ops).

Validation in progress but here is a ticket to start discussions (and not to forget to talk about that at the next weekly meeting ;) )

The text was updated successfully, but these errors were encountered:

matttbe · 2022-11-24T16:23:13Z

The Packetdrill test I was using:

// Receive data with TFO + cookie
--tolerance_usecs=100000
`../common/defaults.sh`

// A first connection to store the cookie
 0.0    socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0.0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.0    getsockopt(3, SOL_TCP, TCP_FASTOPEN, [0], [4]) = 0
+0.0    setsockopt(3, SOL_TCP, TCP_FASTOPEN, [2], 4) = 0
+0.0    getsockopt(3, SOL_TCP, TCP_FASTOPEN, [2], [4]) = 0

+0.0    bind(3, ..., ...) = 0
+0.0    listen(3, 1) = 0

+0.0      <  S   0:0(0)                win 65535  <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, FO,            nop, nop, mpcapable v1 flags[flag_h] nokey>
+0.01     >  S.  0:0(0)      ack 1                <mss 1460, nop, nop, sackOK,           nop, wscale 8, FO TFO_COOKIE, nop, nop, mpcapable v1 flags[flag_h] key[skey]>
+0.01     <   .  1:1(0)      ack 1     win 450    <                                                                              mpcapable v1 flags[flag_h] key[ckey=2, skey]>

+0.2    accept(3, ..., ...) = 4
+0.2    close(4) = 0
+0.0      >   .  1:1(0)      ack 1                <dss dack4=1 dsn8=1 ssn=0 dll=1 nocs fin, nop, nop>
+0.1      <   .  1:1(0)      ack 1     win 450    <dss dack4=2 dsn4=1 ssn=0 dll=1 nocs fin, nop, nop>
+0.0      >   .  1:1(0)      ack 1                <dss dack4=2 nocs>
+0.0      >  F.  1:1(0)      ack 1                <dss dack4=2 nocs>
+0.0      <   .  1:1(0)      ack 2     win 450    <dss dack4=2 nocs>

// reply with a small delay to let the kernel switching to a time-wait socket.
+0.4      <  F.  1:1(0)      ack 2     win 450    <dss dack4=2 nocs>
+0.0      >   .  2:2(0)      ack 2
+0.0      <  R.  2:2(0)      ack 2     win 450


// Not supported by Packetdrill yet: new connection
// Another Fastopen request, now SYN will have data
+0.01     <  S   0:500(500)            win 65535  <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, FO TFO_COOKIE, nop, nop, mpcapable v1 flags[flag_h] nokey>
// +0.01  >  S.  0:0(0)      ack 501              <mss 1460, sackOK, TS val 100 ecr 100, nop, wscale 8,                          mpcapable v1 flags[flag_h] key[skey]>

And the command:

packetdrill -v --ip_version=ipv6 --mtu=1520 --local_ip=fd3d:fa7b:d17d::0 --gateway_ip=fd3d:fa7b:d17d:8888::0 --remote_ip=2001:DB8::1/32 -D CMSG_LEVEL_IP=SOL_IPV6 -D CMSG_TYPE_RECVERR=IP6_RECVERR -D TFO_COOKIE=6aa6ae70c288023b fastopen/server-TCP_FASTOPEN.pkt

matttbe · 2022-11-24T16:24:42Z

WIP kernel code: https://github.com/matttbe/mptcp_net-next/commits/mptcp_subflow_v6_request_sock_ops

matttbe · 2022-11-24T16:29:01Z

Remaining question:

do we need the same slab for v4 and v6 or better with two different ones
(I guess it is not linked with ↑ but we should still allow a mix of subflows in v4 and v6 for the same MPTCP connection)

pabeni · 2022-11-24T21:01:46Z

Remaining question:
* do we need the same slab for v4 and v6 or better with two different ones

note that kmem_cache_create can re-use existing slabs when the arguments matches - in practice tcp_prot and tcpv6_prot should have/use the same slab.

so both copying the slab field or calling twice kmem_cache_create should yeld to the same result

matttbe · 2022-11-25T09:51:05Z

Hi @pabeni ,

Thank you for your feedback!

If I'm not mistaken, for the moment, we have slabs for mptcp_prot (proto, requests, TW), mptcp_v6_prot (proto, requests, TW) and mptcp_subflow_request_sock_ops (requests only).

tcp_prot and tcpv6_prot have different names (TCP vs TCPv6) and the name is used to create the different slabs (proto, requests, TW). So similar to MPTCP proto, there are different slabs per family.

Should we then do the same for the subflow requests then? I'm not sure to understand all the consequences (memory / perf) but it might be good to keep them isolated. If yes, do you think it would be OK to change this behaviour in -net?

pabeni · 2022-11-25T10:12:27Z

If I'm not mistaken, for the moment, we have slabs for mptcp_prot (proto, requests, TW), mptcp_v6_prot (proto, requests, TW) and mptcp_subflow_request_sock_ops (requests only).

tcp_prot and tcpv6_prot have different names (TCP vs TCPv6) and the name is used to create the different slabs (proto, requests, TW). So similar to MPTCP proto, there are different slabs per family.

Note that kmem_cache_create can still reuse an existing slab even if asked to created a new one with a different name, if overall argments allow for that, see:

kmem_cache_create -> kmem_cache_create_usercopy -> __kmem_cache_create -> __kmem_cache_alias

the relevant slab name for request socket should be: "request_sock_TCP" "request_sock_TCPv6" see req_prot_init()

Note that you can inspect the existing slabs in /proc/slabinfo - if you are using the slub allocator

Should we then do the same for the subflow requests then? I'm not sure to understand all the consequences (memory / perf) but it might be good to keep them isolated. If yes, do you think it would be OK to change this behaviour in -net?

No, we don't need different slabs for v4/v6 request socket, if the struct size are the same. The slab thing has build-in per-cpu optimization, and that is good enough from resource usage/performance pov

matttbe · 2022-11-25T10:39:44Z

If I'm not mistaken, for the moment, we have slabs for mptcp_prot (proto, requests, TW), mptcp_v6_prot (proto, requests, TW) and mptcp_subflow_request_sock_ops (requests only).
tcp_prot and tcpv6_prot have different names (TCP vs TCPv6) and the name is used to create the different slabs (proto, requests, TW). So similar to MPTCP proto, there are different slabs per family.

Note that kmem_cache_create can still reuse an existing slab even if asked to created a new one with a different name, if overall argments allow for that, see:

kmem_cache_create -> kmem_cache_create_usercopy -> __kmem_cache_create -> __kmem_cache_alias

the relevant slab name for request socket should be: "request_sock_TCP" "request_sock_TCPv6" see req_prot_init()

Note that you can inspect the existing slabs in /proc/slabinfo - if you are using the slub allocator

Thank you for the explanations, it is clearer now, I missed the part about the "merging" bit :-)

Should we then do the same for the subflow requests then? I'm not sure to understand all the consequences (memory / perf) but it might be good to keep them isolated. If yes, do you think it would be OK to change this behaviour in -net?

No, we don't need different slabs for v4/v6 request socket, if the struct size are the same. The slab thing has build-in per-cpu optimization, and that is good enough from resource usage/performance pov

OK thank you for this feedback!

If merges are done (and no new allocations), should I then keep the code "simple" by keeping subflow_ops_init() untouched and calling it twice: once for v4 and once for v6 (with the same name then)?

These tests are similar to the ones for the client side: a first connection is established without DATA in the SYN to request a key, then new connections can send DATA in the SYN with the cookie it previously received. Ideally we would like to validate a second connection in the same test but we are limited by Packetdrill here which cannot understand the new SYN is for a new connection. So this test is split in two parts: - server-TCP_FASTOPEN-cookie-req: the first connection, cookie request - server-TCP_FASTOPEN-cookie-data: the cookie has already been exchanged, data can be sent with the cookie. Note that we can also not use the cookie by setting a sysctl and directly send data, that's what "server-tfo-no-cookie" is doing. Equivalent tests for (plain) TCP are also added. Note that kernel patches are required to get the expected cookie in IPv6, see: multipath-tcp/mptcp_net-next#317 Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>

matttbe · 2022-12-06T09:37:39Z

This has been fixed thanks to:

5819cf2: mptcp: remove MPTCP 'ifdef' in TCP SYN cookies
46b00ed: mptcp: dedicated request sock for subflow in v6
2df1717: mptcp: use proper req destructor for IPv6
Results: f48d240..81092f1 (export-net)
Results: 9324c81..ec528d5 (export)

(I forgot to add the Closes tag in patch 2/3)

We can see that "bswap32: Takes an unsigned 32-bit number in either big- or little-endian format and returns the equivalent number with the same bit width but opposite endianness" in BPF Instruction Set Specification, so it should clear the upper 32 bits in "case 32:" for both BPF_ALU and BPF_ALU64. [root@linux fedora]# echo 1 > /proc/sys/net/core/bpf_jit_enable [root@linux fedora]# modprobe test_bpf Before: test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) After: test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 4 PASS test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 4 PASS Fixes: 4ebf921 ("LoongArch: BPF: Support unconditional bswap instructions") Acked-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the generated insn reverses byte order for both high and low 32-bit words, resuling in an incorrect swap as indicated by the jit test: [ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS [ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times) [ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS [ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS [ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS [ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times) [ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS [ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS Fix this by forcing 32bit variant of rev32. Fixes: 1104247 ("bpf, arm64: Support unconditional bswap") Signed-off-by: Artem Savkov <asavkov@redhat.com> Tested-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Message-ID: <20240321081809.158803-1-asavkov@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

@2

Recent additions in BPF like cpu v4 instructions, test_bpf module exhibits the following failures: test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times) test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times) test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times) test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd eBPF filter opcode 00d7 (@2) unsupported jited:0 301 PASS test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 555 PASS test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301 eBPF filter opcode 00d7 (@2) unsupported jited:0 268 PASS test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 eBPF filter opcode 00d7 (@2) unsupported jited:0 269 PASS test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032 eBPF filter opcode 00d7 (@2) unsupported jited:0 460 PASS test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 320 PASS test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe eBPF filter opcode 00d7 (@2) unsupported jited:0 222 PASS test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 eBPF filter opcode 00d7 (@2) unsupported jited:0 273 PASS test_bpf: #344 BPF_LDX_MEMSX | BPF_B eBPF filter opcode 0091 (@5) unsupported jited:0 432 PASS test_bpf: #345 BPF_LDX_MEMSX | BPF_H eBPF filter opcode 0089 (@5) unsupported jited:0 381 PASS test_bpf: #346 BPF_LDX_MEMSX | BPF_W eBPF filter opcode 0081 (@5) unsupported jited:0 505 PASS test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1 eBPF filter opcode 0006 (@1) unsupported jited:0 261 PASS test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed] Fix them by adding missing processing. Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu

matttbe added the bug label Nov 24, 2022

matttbe self-assigned this Nov 24, 2022

matttbe closed this as completed Dec 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TFO: IPv6: the generated cookie is not the expected one #317

TFO: IPv6: the generated cookie is not the expected one #317

matttbe commented Nov 24, 2022

matttbe commented Nov 24, 2022

matttbe commented Nov 24, 2022

matttbe commented Nov 24, 2022

pabeni commented Nov 24, 2022

matttbe commented Nov 25, 2022

pabeni commented Nov 25, 2022

matttbe commented Nov 25, 2022

matttbe commented Dec 6, 2022 •

edited

Loading

TFO: IPv6: the generated cookie is not the expected one #317

TFO: IPv6: the generated cookie is not the expected one #317

Comments

matttbe commented Nov 24, 2022

matttbe commented Nov 24, 2022

matttbe commented Nov 24, 2022

matttbe commented Nov 24, 2022

pabeni commented Nov 24, 2022

matttbe commented Nov 25, 2022

pabeni commented Nov 25, 2022

matttbe commented Nov 25, 2022

matttbe commented Dec 6, 2022 • edited Loading

matttbe commented Dec 6, 2022 •

edited

Loading