Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TFO: IPv6: the generated cookie is not the expected one #317

Closed
matttbe opened this issue Nov 24, 2022 · 8 comments
Closed

TFO: IPv6: the generated cookie is not the expected one #317

matttbe opened this issue Nov 24, 2022 · 8 comments
Assignees
Labels

Comments

@matttbe
Copy link
Member

matttbe commented Nov 24, 2022

Packetdrill is reporting that the cookie generated by the kernel is not the expected one when testing in IPv6.

After a bit of debugging, I noticed req->rsk_ops->family is set to AF_INET (not 6) because the subflows extends the request_sock_ops of TCP v4 (tcp_request_sock_ops).

Validation in progress but here is a ticket to start discussions (and not to forget to talk about that at the next weekly meeting ;) )

@matttbe matttbe added the bug label Nov 24, 2022
@matttbe matttbe self-assigned this Nov 24, 2022
@matttbe
Copy link
Member Author

matttbe commented Nov 24, 2022

The Packetdrill test I was using:

// Receive data with TFO + cookie
--tolerance_usecs=100000
`../common/defaults.sh`

// A first connection to store the cookie
 0.0    socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0.0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.0    getsockopt(3, SOL_TCP, TCP_FASTOPEN, [0], [4]) = 0
+0.0    setsockopt(3, SOL_TCP, TCP_FASTOPEN, [2], 4) = 0
+0.0    getsockopt(3, SOL_TCP, TCP_FASTOPEN, [2], [4]) = 0

+0.0    bind(3, ..., ...) = 0
+0.0    listen(3, 1) = 0

+0.0      <  S   0:0(0)                win 65535  <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, FO,            nop, nop, mpcapable v1 flags[flag_h] nokey>
+0.01     >  S.  0:0(0)      ack 1                <mss 1460, nop, nop, sackOK,           nop, wscale 8, FO TFO_COOKIE, nop, nop, mpcapable v1 flags[flag_h] key[skey]>
+0.01     <   .  1:1(0)      ack 1     win 450    <                                                                              mpcapable v1 flags[flag_h] key[ckey=2, skey]>

+0.2    accept(3, ..., ...) = 4
+0.2    close(4) = 0
+0.0      >   .  1:1(0)      ack 1                <dss dack4=1 dsn8=1 ssn=0 dll=1 nocs fin, nop, nop>
+0.1      <   .  1:1(0)      ack 1     win 450    <dss dack4=2 dsn4=1 ssn=0 dll=1 nocs fin, nop, nop>
+0.0      >   .  1:1(0)      ack 1                <dss dack4=2 nocs>
+0.0      >  F.  1:1(0)      ack 1                <dss dack4=2 nocs>
+0.0      <   .  1:1(0)      ack 2     win 450    <dss dack4=2 nocs>

// reply with a small delay to let the kernel switching to a time-wait socket.
+0.4      <  F.  1:1(0)      ack 2     win 450    <dss dack4=2 nocs>
+0.0      >   .  2:2(0)      ack 2
+0.0      <  R.  2:2(0)      ack 2     win 450


// Not supported by Packetdrill yet: new connection
// Another Fastopen request, now SYN will have data
+0.01     <  S   0:500(500)            win 65535  <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, FO TFO_COOKIE, nop, nop, mpcapable v1 flags[flag_h] nokey>
// +0.01  >  S.  0:0(0)      ack 501              <mss 1460, sackOK, TS val 100 ecr 100, nop, wscale 8,                          mpcapable v1 flags[flag_h] key[skey]>

And the command:

packetdrill -v --ip_version=ipv6 --mtu=1520 --local_ip=fd3d:fa7b:d17d::0 --gateway_ip=fd3d:fa7b:d17d:8888::0 --remote_ip=2001:DB8::1/32 -D CMSG_LEVEL_IP=SOL_IPV6 -D CMSG_TYPE_RECVERR=IP6_RECVERR -D TFO_COOKIE=6aa6ae70c288023b fastopen/server-TCP_FASTOPEN.pkt

@matttbe
Copy link
Member Author

matttbe commented Nov 24, 2022

@matttbe
Copy link
Member Author

matttbe commented Nov 24, 2022

Remaining question:

  • do we need the same slab for v4 and v6 or better with two different ones
  • (I guess it is not linked with ↑ but we should still allow a mix of subflows in v4 and v6 for the same MPTCP connection)

@pabeni
Copy link

pabeni commented Nov 24, 2022

Remaining question:
* do we need the same slab for v4 and v6 or better with two different ones

note that kmem_cache_create can re-use existing slabs when the arguments matches - in practice tcp_prot and tcpv6_prot should have/use the same slab.

so both copying the slab field or calling twice kmem_cache_create should yeld to the same result

@matttbe
Copy link
Member Author

matttbe commented Nov 25, 2022

Hi @pabeni ,

Thank you for your feedback!

If I'm not mistaken, for the moment, we have slabs for mptcp_prot (proto, requests, TW), mptcp_v6_prot (proto, requests, TW) and mptcp_subflow_request_sock_ops (requests only).

tcp_prot and tcpv6_prot have different names (TCP vs TCPv6) and the name is used to create the different slabs (proto, requests, TW). So similar to MPTCP proto, there are different slabs per family.

Should we then do the same for the subflow requests then? I'm not sure to understand all the consequences (memory / perf) but it might be good to keep them isolated. If yes, do you think it would be OK to change this behaviour in -net?

@pabeni
Copy link

pabeni commented Nov 25, 2022

If I'm not mistaken, for the moment, we have slabs for mptcp_prot (proto, requests, TW), mptcp_v6_prot (proto, requests, TW) and mptcp_subflow_request_sock_ops (requests only).

tcp_prot and tcpv6_prot have different names (TCP vs TCPv6) and the name is used to create the different slabs (proto, requests, TW). So similar to MPTCP proto, there are different slabs per family.

Note that kmem_cache_create can still reuse an existing slab even if asked to created a new one with a different name, if overall argments allow for that, see:

kmem_cache_create -> kmem_cache_create_usercopy -> __kmem_cache_create -> __kmem_cache_alias

the relevant slab name for request socket should be: "request_sock_TCP" "request_sock_TCPv6" see req_prot_init()

Note that you can inspect the existing slabs in /proc/slabinfo - if you are using the slub allocator

Should we then do the same for the subflow requests then? I'm not sure to understand all the consequences (memory / perf) but it might be good to keep them isolated. If yes, do you think it would be OK to change this behaviour in -net?

No, we don't need different slabs for v4/v6 request socket, if the struct size are the same. The slab thing has build-in per-cpu optimization, and that is good enough from resource usage/performance pov

@matttbe
Copy link
Member Author

matttbe commented Nov 25, 2022

If I'm not mistaken, for the moment, we have slabs for mptcp_prot (proto, requests, TW), mptcp_v6_prot (proto, requests, TW) and mptcp_subflow_request_sock_ops (requests only).
tcp_prot and tcpv6_prot have different names (TCP vs TCPv6) and the name is used to create the different slabs (proto, requests, TW). So similar to MPTCP proto, there are different slabs per family.

Note that kmem_cache_create can still reuse an existing slab even if asked to created a new one with a different name, if overall argments allow for that, see:

kmem_cache_create -> kmem_cache_create_usercopy -> __kmem_cache_create -> __kmem_cache_alias

the relevant slab name for request socket should be: "request_sock_TCP" "request_sock_TCPv6" see req_prot_init()

Note that you can inspect the existing slabs in /proc/slabinfo - if you are using the slub allocator

Thank you for the explanations, it is clearer now, I missed the part about the "merging" bit :-)

Should we then do the same for the subflow requests then? I'm not sure to understand all the consequences (memory / perf) but it might be good to keep them isolated. If yes, do you think it would be OK to change this behaviour in -net?

No, we don't need different slabs for v4/v6 request socket, if the struct size are the same. The slab thing has build-in per-cpu optimization, and that is good enough from resource usage/performance pov

OK thank you for this feedback!

If merges are done (and no new allocations), should I then keep the code "simple" by keeping subflow_ops_init() untouched and calling it twice: once for v4 and once for v6 (with the same name then)?

matttbe added a commit to matttbe/packetdrill that referenced this issue Nov 25, 2022
These tests are similar to the ones for the client side: a first
connection is established without DATA in the SYN to request a key, then
new connections can send DATA in the SYN with the cookie it previously
received.

Ideally we would like to validate a second connection in the same test
but we are limited by Packetdrill here which cannot understand the new
SYN is for a new connection.

So this test is split in two parts:

- server-TCP_FASTOPEN-cookie-req: the first connection, cookie request
- server-TCP_FASTOPEN-cookie-data: the cookie has already been
  exchanged, data can be sent with the cookie.

Note that we can also not use the cookie by setting a sysctl and
directly send data, that's what "server-tfo-no-cookie" is doing.

Equivalent tests for (plain) TCP are also added.

Note that kernel patches are required to get the expected cookie in
IPv6, see:

  multipath-tcp/mptcp_net-next#317

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
matttbe added a commit to matttbe/packetdrill that referenced this issue Nov 25, 2022
These tests are similar to the ones for the client side: a first
connection is established without DATA in the SYN to request a key, then
new connections can send DATA in the SYN with the cookie it previously
received.

Ideally we would like to validate a second connection in the same test
but we are limited by Packetdrill here which cannot understand the new
SYN is for a new connection.

So this test is split in two parts:

- server-TCP_FASTOPEN-cookie-req: the first connection, cookie request
- server-TCP_FASTOPEN-cookie-data: the cookie has already been
  exchanged, data can be sent with the cookie.

Note that we can also not use the cookie by setting a sysctl and
directly send data, that's what "server-tfo-no-cookie" is doing.

Equivalent tests for (plain) TCP are also added.

Note that kernel patches are required to get the expected cookie in
IPv6, see:

  multipath-tcp/mptcp_net-next#317

Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
@matttbe
Copy link
Member Author

matttbe commented Dec 6, 2022

This has been fixed thanks to:

(I forgot to add the Closes tag in patch 2/3)

@matttbe matttbe closed this as completed Dec 6, 2022
matttbe pushed a commit that referenced this issue Dec 15, 2023
We can see that "bswap32: Takes an unsigned 32-bit number in either big-
or little-endian format and returns the equivalent number with the same
bit width but opposite endianness" in BPF Instruction Set Specification,
so it should clear the upper 32 bits in "case 32:" for both BPF_ALU and
BPF_ALU64.

[root@linux fedora]# echo 1 > /proc/sys/net/core/bpf_jit_enable
[root@linux fedora]# modprobe test_bpf

Before:
test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times)
test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times)

After:
test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 4 PASS
test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 4 PASS

Fixes: 4ebf921 ("LoongArch: BPF: Support unconditional bswap instructions")
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
matttbe pushed a commit that referenced this issue Mar 27, 2024
In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the
generated insn reverses byte order for both high and low 32-bit words,
resuling in an incorrect swap as indicated by the jit test:

[ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS
[ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times)
[ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS
[ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS
[ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS
[ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times)
[ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS
[ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS

Fix this by forcing 32bit variant of rev32.

Fixes: 1104247 ("bpf, arm64: Support unconditional bswap")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Message-ID: <20240321081809.158803-1-asavkov@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
matttbe pushed a commit that referenced this issue May 20, 2024
Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: #344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: #345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: #346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants