Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected subflow management behaviour, following endpoint add after subflow failure #242

Closed
pgreenland opened this issue Nov 20, 2021 · 1 comment
Assignees
Labels

Comments

@pgreenland
Copy link

Hi,

Follow on from initial discussion on the mailing list: Original Mailing List Email.

If we add and remove endpoint addresses, everything appears to work well, subflows are connected and disconnected as expected.

However if a subflow fails, for example due to reception of a TCP RST from an upstream network device this seems to cause problems.

The next time an endpoint is removed (a link other than the failing one), the associated subflow is disconnected as expected.

When the endpoint is re-added, more often than not the failed subflow (the one stopped by a TCP RST) it re-connected and the newly added endpoint address is not used.

As suggested I've managed to capture an occurrence of the above, with hopefully enough details to reproduce.

My test setup is as follows:

TestSetup

MPTCP enabled VMs acts as a client and server, connected by a (non MPTCP enabled) VM acting as a router.

The three links between the client and router are implemented via virtual network interfaces, connected between VMs via their own dedicated bridges, mimicking independent networks.

The client and server applications, implemented in python act as simple python TCP client and servers, tricking approximately 2KB of data (from urandom) per second. Both are pretty simple, with the client application blocking on send until it completes. The server listens on port 6666 and clients connect to the server's IP 192.168.2.2.

Client machine config:

ip route
default via 192.168.3.1 dev ens18 metric 1 onlink 
default via 192.168.4.1 dev ens19 metric 2 onlink 
default via 192.168.5.1 dev ens20 metric 3 onlink 
192.168.3.0/30 dev ens18 proto kernel scope link src 192.168.3.2 
192.168.4.0/30 dev ens19 proto kernel scope link src 192.168.4.2 
192.168.5.0/30 dev ens20 proto kernel scope link src 192.168.5.2 

ip rule
0:      from all lookup local
32763:  from 192.168.5.2 lookup 3
32764:  from 192.168.4.2 lookup 2
32765:  from 192.168.3.2 lookup 1
32766:  from all lookup main
32767:  from all lookup default

ip route show table 1
default via 192.168.3.1 dev ens18 
192.168.3.0/30 dev ens18 scope link 

ip route show table 2
default via 192.168.4.1 dev ens19 
192.168.4.0/30 dev ens19 scope link 

ip route show table 3
default via 192.168.5.1 dev ens20 
192.168.5.0/30 dev ens20 scope link

ip mptcp endpoint show
192.168.3.2 id 1 subflow 
192.168.4.2 id 2 subflow 
192.168.5.2 id 3 subflow

Router config (on which packets were captured, network interface names should match in the attached pcap):

ip route
default via 192.168.1.254 dev ens18 
192.168.1.0/24 dev ens18 proto kernel scope link src 192.168.1.190 
192.168.2.0/30 dev ens19 proto kernel scope link src 192.168.2.1 
192.168.3.0/30 dev ens20 proto kernel scope link src 192.168.3.1 
192.168.4.0/30 dev ens21 proto kernel scope link src 192.168.4.1 
192.168.5.0/30 dev ens22 proto kernel scope link src 192.168.5.1

Server machine config:

ip route
default via 192.168.2.1 dev ens18 onlink 
192.168.2.0/30 dev ens18 proto kernel scope link src 192.168.2.2

ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

ip mptcp endpoint show

<No endpoints configured>

Issue described can be reproduced as follows, with the attached packet capture taken during the session below with associated commands:

mptcp remove and add subflow.pcapng.zip

  1. On client machine

Connect client application and observe that a subflow has been created over each configured endpoint :-).

<Client application connects>

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State        
tcp        0      0 192.168.4.2:46775       192.168.2.2:6666        ESTABLISHED
tcp        0    512 192.168.3.2:49424       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.5.2:54835       192.168.2.2:6666        ESTABLISHED
  1. On server machine

Server machine as expected shows the same subflows.

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:6666            0.0.0.0:*               LISTEN     
tcp        0      0 192.168.2.2:6666        192.168.4.2:46775       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.3.2:49424       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.5.2:54835       ESTABLISHED
  1. On client machine

Remove endpoint address and observe associated subflow disconnecting.

ip mptcp endpoint del id 3

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0    128 192.168.4.2:46775       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.3.2:49424       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.5.2:54835       192.168.2.2:6666        TIME_WAIT  
  1. On server machine

Subflow has been disconnected here too :-).

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:6666            0.0.0.0:*               LISTEN     
tcp        0      0 192.168.2.2:6666        192.168.4.2:46775       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.3.2:49424       ESTABLISHED
  1. On client machine

Add the endpoint back and observer a new subflow being established.

ip mptcp endpoint add 192.168.5.2 id 3 subflow

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0    128 192.168.4.2:46775       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.3.2:49424       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.5.2:54835       192.168.2.2:6666        TIME_WAIT  
tcp        0      0 192.168.5.2:38537       192.168.2.2:6666        ESTABLISHED
  1. On server machine

Server now shows full complement of flows again.

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:6666            0.0.0.0:*               LISTEN     
tcp        0      0 192.168.2.2:6666        192.168.4.2:46775       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.5.2:38537       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.3.2:49424       ESTABLISHED
  1. On router

Reject traffic being forwarded from adapter on client machine 192.168.5.0/30 network, causing subflow to fail, from the client's point of view.

iptables -t filter -A FORWARD -i ens21 -p TCP -j REJECT --reject-with tcp-reset
  1. On client machine

We see that subflow that was originating from 192.168.4.2 is now gone, following its reception of a RST.

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 192.168.3.2:49424       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.5.2:54835       192.168.2.2:6666        TIME_WAIT  
tcp        0      0 192.168.5.2:38537       192.168.2.2:6666        ESTABLISHED
  1. On server machine

Server is blissfully unaware of the situation and still shows the subflow as established.

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:6666            0.0.0.0:*               LISTEN     
tcp        0      0 192.168.2.2:6666        192.168.4.2:46775       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.5.2:38537       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.3.2:49424       ESTABLISHED
  1. On router

Flush iptables rules, stopping RST transmission.

iptables -t filter -F FORWARD
  1. On client machine

As before remove endpoint address and observe its associated subflow being disconnected.

ip mptcp endpoint del id 3

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0    512 192.168.3.2:49424       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.5.2:38537       192.168.2.2:6666        TIME_WAIT  
  1. On server machine

Server follows along and the subflow disconnects from its point of view.

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:6666            0.0.0.0:*               LISTEN     
tcp        0      0 192.168.2.2:6666        192.168.4.2:46775       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.3.2:49424       ESTABLISHED
  1. On client machine

Re-add the endpoint address, expecting a new subflow to be created using it. However instead we see that the new subflow was created using a different endpoint, the previously failed 192.168.4.2. With the newly added 192.168.5.2 endpoint unused.

ip mptcp endpoint add 192.168.5.2 id 3 subflow

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 192.168.3.2:49424       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.4.2:49517       192.168.2.2:6666        ESTABLISHED
tcp        0      0 192.168.5.2:38537       192.168.2.2:6666        TIME_WAIT  
  1. On server machine

The server collaborates the story, showing two subflows originating from 192.168.4.2 (the original failed one and the newly created one), but now flows from 192.168.5.2 (the newly added address).

netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:6666            0.0.0.0:*               LISTEN     
tcp        0      0 192.168.2.2:6666        192.168.4.2:49517       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.4.2:46775       ESTABLISHED
tcp        0      0 192.168.2.2:6666        192.168.3.2:49424       ESTABLISHED

I believe this behaviour originates from the select_local_address function. When triggered as part of a call chain from the netlink endpoint add handler selects the first available address with no subflow. In this case 192.168.4.2 rather than the added 192.168.5.2. Upon successful re-reconnection using the old address, the event handler mptcp_pm_nl_subflow_established looks to be called, which will attempt to trigger the next subflow creation. However this is prevented by the condition local_addr_used < local_addr_max, as local_addr_used will not have been decremented following the forced failure of the original subflow.

Thanks,

Phil

@matttbe
Copy link
Member

matttbe commented Dec 16, 2021

This has been fixed in our export branch thanks to the modifications done by @pabeni:

  • 7dff646: mptcp: fix per socket endpoint accounting

  • b3b3136: mptcp: clean-up MPJ option writing

  • 93ddc71: mptcp: keep track of local endpoint still available foreach msk

  • b3adca8: mptcp: do not block subflows creation on errors

  • 0f4945b: selftests: mptcp: add tests for subflow creation failure

    • 07368dd: selftests: mptcp: apply Mat's comment
  • 1d0e64a: mptcp: cleanup MPJ subflow list handling

  • Results: 8c25dcd..c94ca03

  • bcccc44: mptcp: avoid atomic bit manipulation when possible

  • Results: c85bd12..1fdc9db

@pgreenland do you mind checking this new version? Please re-open this ticket if you still have issues.

@matttbe matttbe closed this as completed Dec 16, 2021
jenkins-tessares pushed a commit that referenced this issue Jul 20, 2023
Add a big batch of test coverage to assert all aspects of the tcx opts
attach, detach and query API:

  # ./vmtest.sh -- ./test_progs -t tc_opts
  [...]
  #238     tc_opts_after:OK
  #239     tc_opts_append:OK
  #240     tc_opts_basic:OK
  #241     tc_opts_before:OK
  #242     tc_opts_chain_classic:OK
  #243     tc_opts_demixed:OK
  #244     tc_opts_detach:OK
  #245     tc_opts_detach_after:OK
  #246     tc_opts_detach_before:OK
  #247     tc_opts_dev_cleanup:OK
  #248     tc_opts_invalid:OK
  #249     tc_opts_mixed:OK
  #250     tc_opts_prepend:OK
  #251     tc_opts_replace:OK
  #252     tc_opts_revision:OK
  Summary: 15/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-8-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
matttbe pushed a commit that referenced this issue Aug 17, 2023
Add several new tcx test cases to improve test coverage. This also includes
a few new tests with ingress instead of clsact qdisc, to cover the fix from
commit dc644b5 ("tcx: Fix splat in ingress_destroy upon tcx_entry_free").

  # ./test_progs -t tc
  [...]
  #234     tc_links_after:OK
  #235     tc_links_append:OK
  #236     tc_links_basic:OK
  #237     tc_links_before:OK
  #238     tc_links_chain_classic:OK
  #239     tc_links_chain_mixed:OK
  #240     tc_links_dev_cleanup:OK
  #241     tc_links_dev_mixed:OK
  #242     tc_links_ingress:OK
  #243     tc_links_invalid:OK
  #244     tc_links_prepend:OK
  #245     tc_links_replace:OK
  #246     tc_links_revision:OK
  #247     tc_opts_after:OK
  #248     tc_opts_append:OK
  #249     tc_opts_basic:OK
  #250     tc_opts_before:OK
  #251     tc_opts_chain_classic:OK
  #252     tc_opts_chain_mixed:OK
  #253     tc_opts_delete_empty:OK
  #254     tc_opts_demixed:OK
  #255     tc_opts_detach:OK
  #256     tc_opts_detach_after:OK
  #257     tc_opts_detach_before:OK
  #258     tc_opts_dev_cleanup:OK
  #259     tc_opts_invalid:OK
  #260     tc_opts_mixed:OK
  #261     tc_opts_prepend:OK
  #262     tc_opts_replace:OK
  #263     tc_opts_revision:OK
  [...]
  Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants