Dealing with QUIC version updates #699

Closed
marten-seemann opened this issue Aug 6, 2019 · 4 comments

@marten-seemann
Contributor

Overview: Versions in QUIC

Unlike TCP, QUIC has versioning built right into the core protocol. Long Header packets (used during the handshake) carry a version number, whereas Short Header packets (used after the handshake) carry no version field: apart from a few flag bits, they contain only the destination connection ID and the packet number.
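
To make the header distinction concrete, here is a minimal Go sketch that reads the header form bit (the most significant bit of the first byte) and, for Long Header packets, the 32-bit version that follows it. The function name `parseHeaderForm` is hypothetical, not quic-go's API. The sketch assumes the version-independent header layout that all QUIC versions share; draft versions are numbered 0xff000000 plus the draft number.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerInfo is what a router can learn from a packet without decrypting it.
type headerInfo struct {
	isLong  bool
	version uint32 // only meaningful for Long Header packets
}

// parseHeaderForm checks the header form bit: 1 means Long Header, 0 means
// Short Header. Only Long Header packets carry the 4-byte version field.
func parseHeaderForm(pkt []byte) (headerInfo, error) {
	if len(pkt) == 0 {
		return headerInfo{}, errors.New("empty packet")
	}
	if pkt[0]&0x80 == 0 {
		return headerInfo{isLong: false}, nil
	}
	if len(pkt) < 5 {
		return headerInfo{}, errors.New("long header too short to contain a version")
	}
	return headerInfo{isLong: true, version: binary.BigEndian.Uint32(pkt[1:5])}, nil
}

func main() {
	// Hypothetical packets: a Long Header packet for draft-22 (0xff000016)
	// and a Short Header packet whose header form bit is cleared.
	long := []byte{0xc0, 0xff, 0x00, 0x00, 0x16, 0x08}
	short := []byte{0x40, 0x01, 0x02}
	for _, p := range [][]byte{long, short} {
		h, err := parseHeaderForm(p)
		fmt.Printf("%+v %v\n", h, err)
	}
}
```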

When a server receives a packet with an unsupported version, it sends a Version Negotiation Packet, which lists all versions that the server supports. The client may then start a new connection attempt using one of those, or abort the connection if there's no overlap in supported versions.
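
A client reacting to a Version Negotiation packet essentially intersects the server's list with its own preferences. The sketch below assumes the client simply picks its most preferred common version; `chooseVersion` and the version lists are hypothetical, and parsing the packet itself is left out.

```go
package main

import "fmt"

// chooseVersion returns the first version in the client's preference list
// that the server also advertised in its Version Negotiation packet.
// ok is false when there is no overlap, in which case the client aborts.
func chooseVersion(clientPrefs, serverVersions []uint32) (v uint32, ok bool) {
	offered := make(map[uint32]struct{}, len(serverVersions))
	for _, sv := range serverVersions {
		offered[sv] = struct{}{}
	}
	for _, cv := range clientPrefs {
		if _, found := offered[cv]; found {
			return cv, true
		}
	}
	return 0, false
}

func main() {
	client := []uint32{0xff000017, 0xff000016} // hypothetical: prefer draft-23, then draft-22
	server := []uint32{0xff000016, 0xff000015} // hypothetical list from a Version Negotiation packet
	if v, ok := chooseVersion(client, server); ok {
		fmt.Printf("retry with version %#x\n", v)
	} else {
		fmt.Println("no common version, aborting")
	}
}
```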

While the IETF QUIC working group works towards the final QUIC RFC, each draft comes with a new version number. In principle, implementations can support multiple versions at the same time (and quic-go did so at some point in the past). Depending on how much two draft versions differ, though, this can add a lot of complexity.

Question: I haven't found any information specifying how long we guarantee backwards compatibility. Have we made any promises for libp2p downstream users at all?

Options for libp2p

We want to switch to QUIC (see #688) as soon as possible. We need to consider what a QUIC version update means for libp2p users.

No Backwards Compatibility

The easiest option. When establishing a new connection to a peer, we first try dialing QUIC. If the QUIC handshake fails, we fall back to TCP. Unfortunately, this will cost us an extra round trip every time we roll out a new quic-go version that drops support for previously supported versions.
Furthermore, it discourages being an early adopter: if you're the first peer in the network speaking a new QUIC version, all your handshakes will need to fall back.

Supporting multiple QUIC versions by using multiple quic-go releases

We could run two releases of quic-go in parallel. While each release only handles one (or a few) QUIC versions, together they'd span the range of versions that we want to support.

There are two ways to do this:

  1. Implement a load balancer setup: The load balancer would route packets to the QUIC endpoint that supports the QUIC version needed to process a packet. Since Short Header packets don't carry a version field, the load balancer would need to keep track of connection IDs. Furthermore, since connection IDs can change over the lifetime of a connection, this will either require cooperation of the QUIC endpoints with the load balancer, or a scheme to encode routing information into the connection ID.
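  2. Pass (a copy of) each packet to both QUIC endpoints. The endpoint that's not responsible should then silently discard the packet. We need to make sure to filter out Version Negotiation packets (otherwise the endpoint that doesn't support a packet's version would answer it with one), and we would have to disable Stateless Resets (otherwise an endpoint receiving a Short Header packet for a connection it doesn't know would reset that connection).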
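
For option 1, the routing decision might be sketched like this. It assumes the cooperative variant: backends call a hypothetical `register` hook whenever they issue a connection ID, and all our connection IDs have a fixed, known length (Short Header packets don't encode it). `router` and its fields are illustrative only; a real load balancer would also need to answer unknown versions with a Version Negotiation packet.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// router routes Long Header packets by version and Short Header packets by
// destination connection ID.
type router struct {
	cidLen    int               // assumed fixed length of connection IDs issued by our backends
	byVersion map[uint32]string // QUIC version -> backend name
	byCID     map[string]string // connection ID -> backend name, filled in by the backends
}

// register is the hook a cooperating backend calls when it issues a connection ID.
func (r *router) register(cid []byte, backend string) { r.byCID[string(cid)] = backend }

func (r *router) route(pkt []byte) (string, error) {
	if len(pkt) == 0 {
		return "", errors.New("empty packet")
	}
	if pkt[0]&0x80 != 0 { // Long Header: route by version
		if len(pkt) < 5 {
			return "", errors.New("truncated long header")
		}
		v := binary.BigEndian.Uint32(pkt[1:5])
		b, ok := r.byVersion[v]
		if !ok {
			return "", fmt.Errorf("no backend for version %#x", v)
		}
		return b, nil
	}
	// Short Header: no version field, so look up the destination connection ID.
	if len(pkt) < 1+r.cidLen {
		return "", errors.New("truncated short header")
	}
	b, ok := r.byCID[string(pkt[1:1+r.cidLen])]
	if !ok {
		return "", errors.New("unknown connection ID")
	}
	return b, nil
}

func main() {
	r := &router{
		cidLen:    4,
		byVersion: map[uint32]string{0xff000016: "quic-go-old", 0xff000017: "quic-go-new"},
		byCID:     map[string]string{},
	}
	r.register([]byte{1, 2, 3, 4}, "quic-go-new")
	fmt.Println(r.route([]byte{0xc0, 0xff, 0x00, 0x00, 0x16}))
	fmt.Println(r.route([]byte{0x40, 1, 2, 3, 4, 0xaa}))
}
```

Every connection ID change has to reach `register` before packets using it arrive, which is the cooperation the option above describes.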

Happy-Eyeballs-style Connection Racing

We could implement Happy Eyeballs-style connection establishment: if we know both a QUIC and a TCP multiaddr of a peer, we can race two connection attempts (maybe even giving QUIC a head start of 50ms or so). Whichever handshake finishes first wins, and the client silently kills the other connection.

Racing connections would be a valuable feature even after QUIC becomes more mature. According to Google's measurements (see section 7.2 of their SIGCOMM paper), UDP is blocked for ~4% of their users, mostly in enterprise networks. For these users, racing two connections would prevent them from (noticeably) running into the QUIC handshake timeout (which by default fires after 10 seconds).

Since happy-eyeballing tends to mask any connection problems, we would probably want to collect some handshake statistics, containing (at least):

  • The number of QUIC handshakes that time out.
  • The number of QUIC handshakes that fail due to a QUIC version mismatch.
  • The number of QUIC wins over TCP.
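
Counters like these could be kept in atomic integers so that concurrent dial goroutines can update them safely. The sketch below is hypothetical (`handshakeStats` and the failure categories are made up for illustration) and leaves classifying the actual error, e.g. recognizing a version mismatch, to the caller. It needs Go 1.19 or later for `atomic.Uint64`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// quicFailure classifies why a QUIC handshake failed.
type quicFailure int

const (
	failTimeout quicFailure = iota
	failVersionMismatch
	failOther
)

// handshakeStats holds the counters listed above.
type handshakeStats struct {
	quicTimeouts        atomic.Uint64
	quicVersionMismatch atomic.Uint64
	otherFailures       atomic.Uint64
	quicWins            atomic.Uint64
	tcpWins             atomic.Uint64
}

func (s *handshakeStats) recordQUICFailure(f quicFailure) {
	switch f {
	case failTimeout:
		s.quicTimeouts.Add(1)
	case failVersionMismatch:
		s.quicVersionMismatch.Add(1)
	default:
		s.otherFailures.Add(1)
	}
}

// recordWinner counts which transport won a race.
func (s *handshakeStats) recordWinner(quic bool) {
	if quic {
		s.quicWins.Add(1)
	} else {
		s.tcpWins.Add(1)
	}
}

// quicWinRate is the fraction of races won by QUIC, or 0 if none were recorded.
func (s *handshakeStats) quicWinRate() float64 {
	q, t := s.quicWins.Load(), s.tcpWins.Load()
	if q+t == 0 {
		return 0
	}
	return float64(q) / float64(q+t)
}

func main() {
	var s handshakeStats
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Simulated outcomes: every 10th race loses to TCP after a version mismatch.
			if i%10 == 0 {
				s.recordQUICFailure(failVersionMismatch)
				s.recordWinner(false)
				return
			}
			s.recordWinner(true)
		}(i)
	}
	wg.Wait()
	fmt.Printf("version mismatches: %d, QUIC win rate: %.2f\n", s.quicVersionMismatch.Load(), s.quicWinRate())
}
```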
@Stebalien
Member

> Racing connections would be a valuable feature even after QUIC becomes more mature.

I'd like to avoid creating unnecessary TCP connections (each takes a file descriptor and can cause issues with connection-tracking firewalls). Instead, we should try to learn which transports work and use/announce those.


Can the QUIC implementation forward connection ID changes to the load balancer?

@marten-seemann
Contributor Author

> Can the QUIC implementation forward connection ID changes to the load balancer?

That would add quite a lot of additional complexity.

> I'd like to avoid creating unnecessary TCP connections (each takes a file descriptor and can cause issues with connection-tracking firewalls). Instead, we should try to learn which transports work and use/announce those.

It's strictly better than what we have now: today we already use the same number of file descriptors as we would in the connection-racing case. However, if QUIC wins the race, we quickly (at most a few seconds later) release the file descriptor.
A connection-tracking firewall will see the TCP FIN / RST, so it will know that it doesn't need to track that connection any longer.

> Instead, we should try to learn which transports work and use/announce those.

That's a hard problem, and I'm worried that a small percentage of nodes in weird network settings will experience connectivity problems as soon as we make QUIC the default transport. It's not only about UDP being blocked: middlebox vendors are already developing middleboxes that can selectively block QUIC. And there's no guarantee that the only options for any given peer are "QUIC works" or "QUIC doesn't work".

@Stebalien
Member

> It's strictly better than what we have now: today we already use the same number of file descriptors as we would in the connection-racing case. However, if QUIC wins the race, we quickly (at most a few seconds later) release the file descriptor.

It's strictly better but it's still crap. We shouldn't break interop with our default transport every time we upgrade it and handle that by racing with a fallback TCP connection.

> That's a hard problem, and I'm worried that a small percentage of nodes in weird network settings will experience connectivity problems as soon as we make QUIC the default transport. It's not only about UDP being blocked: middlebox vendors are already developing middleboxes that can selectively block QUIC. And there's no guarantee that the only options for any given peer are "QUIC works" or "QUIC doesn't work".

We can always fall back if necessary. We can use something like our AutoNAT nodes (or even just use them directly) to figure out which protocols appear to work. Then, we can announce all of them. On the dial side, we'd have to try them in some order of preference.

@marten-seemann
Contributor Author

This has been resolved, since the multiaddr now encodes the QUIC version: multiformats/multiaddr#145
