
webrtc: figure out how to properly close datachannels #575

Closed
marten-seemann opened this issue Sep 12, 2023 · 7 comments

@marten-seemann (Contributor)

Closing WebRTC datachannels is not straightforward. When a datachannel is closed, the underlying stream is reset instead, potentially discarding data already sent on that datachannel. See RFC 8831: https://datatracker.ietf.org/doc/html/rfc8831#name-closing-a-data-channel:

> Closing of a data channel MUST be signaled by resetting the corresponding outgoing streams [RFC6525]. This means that if one side decides to close the data channel, it resets the corresponding outgoing stream. When the peer sees that an incoming stream was reset, it also resets its corresponding outgoing stream. Once this is completed, the data channel is closed.

This has led to interop failures in go-libp2p and kept the WebRTC transport out of the go-libp2p v0.31 release. See libp2p/go-libp2p#2337 (comment) for details.

The problem is that if we don't close a datachannel, the underlying WebRTC / SCTP stack has to keep tracking that datachannel / stream indefinitely. Since a WebRTC connection can use up to 64k streams per association, this amounts to a memory leak. This is a known bug in rust-libp2p: libp2p/rust-libp2p#4333.


How can we solve this?

This is annoying, and yet another time WebRTC's suboptimal stream state machine has come back to bite us (can't wait for a WebRTC that runs on top of QUIC!).

First, here's a workaround: limit the total number of streams to a value well below 64k, e.g. 500. This is roughly the number of concurrent streams that other stream multiplexers allow, and it permits some initial interaction between two peers. Important: once that number is reached, close the underlying WebRTC connection (don't just return an error from the NewStream method). This might lead to some unacknowledged data being lost, but peers going away in the middle of a transfer is something that application protocols need to deal with anyway.
Obviously, this is not ideal, but it's better than releasing code containing a significant memory leak.
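For illustration, the workaround could look like the following TypeScript sketch. `MAX_STREAMS` and the `WebRTCConnection` wrapper are hypothetical names, not any libp2p implementation's actual API:

```ts
// Sketch of the stream-limit workaround. MAX_STREAMS and this wrapper class
// are illustrative, not an existing libp2p API.
const MAX_STREAMS = 500 // well below the 64k SCTP stream limit

class WebRTCConnection {
  // count streams ever opened, not currently open: reset streams are never
  // freed by the stack, so the lifetime total is what leaks memory
  private streamsOpened = 0

  constructor (private readonly pc: RTCPeerConnection) {}

  newStream (): RTCDataChannel {
    if (++this.streamsOpened > MAX_STREAMS) {
      // close the whole connection instead of just failing this call -
      // the association is effectively exhausted at this point
      this.pc.close()
      throw new Error('stream limit reached, connection closed')
    }

    return this.pc.createDataChannel(`stream-${this.streamsOpened}`)
  }
}
```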

Here's how we could actually solve this: Assuming there's no better way to reliably close a stream, we could add some additional signaling on the libp2p layer. We already use a Protobuf to mirror parts of the QUIC state machine, and this would just be one additional message. Specifically, we'd need to add an acknowledgement for the FIN. Once the FIN-ACK has been received, a peer could actually close the underlying datachannel. We'll need to carefully check how this affects the send and the receive side of the stream.
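A rough sketch of what that close path could look like, with the FIN-ACK expressed as one more flag in the message exchange. The flag names/values, the helper functions, and the JSON framing below are all illustrative only - the real protocol would carry these as Protobuf messages:

```ts
// Sketch of a FIN / FIN_ACK close handshake layered over a datachannel.
// The Flag enum stands in for the flags the libp2p WebRTC message Protobuf
// carries; the JSON framing is purely illustrative.
enum Flag { FIN, STOP_SENDING, RESET_STREAM, FIN_ACK }

function sendFlag (channel: RTCDataChannel, flag: Flag): void {
  channel.send(JSON.stringify({ flag }))
}

function awaitFlag (channel: RTCDataChannel, flag: Flag): Promise<void> {
  return new Promise(resolve => {
    const onMessage = (evt: MessageEvent): void => {
      if (JSON.parse(evt.data).flag === flag) {
        channel.removeEventListener('message', onMessage)
        resolve()
      }
    }
    channel.addEventListener('message', onMessage)
  })
}

async function closeWrite (channel: RTCDataChannel): Promise<void> {
  sendFlag(channel, Flag.FIN)            // tell the remote we are done writing
  await awaitFlag(channel, Flag.FIN_ACK) // remote has received everything up to the FIN
  channel.close()                        // only now is resetting the stream safe
}
```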

@achingbrain (Member) commented Sep 12, 2023

Couldn't the bufferedAmount datachannel property be used to prevent data loss? That is, when we want to close a channel and its bufferedAmount property is non-zero, we wait until it reaches zero before closing?*
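Something like this hypothetical helper, using the standard bufferedAmount / bufferedamountlow APIs (but see the footnote below about the threshold-of-0 gotcha):

```ts
// Hypothetical helper: wait for the channel's send buffer to drain
// before closing it.
async function drainAndClose (channel: RTCDataChannel): Promise<void> {
  if (channel.bufferedAmount > 0) {
    await new Promise<void>(resolve => {
      // fire 'bufferedamountlow' as soon as the buffer is completely empty
      channel.bufferedAmountLowThreshold = 0
      channel.addEventListener('bufferedamountlow', () => resolve(), { once: true })
    })
  }

  channel.close()
}
```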

From what I can see, Chrome seems to attempt to send all outstanding data on a channel before resetting it (this is not true 🙄), libdatachannel doesn't (it does!), Firefox does, and I guess Pion doesn't?


*= There's a gotcha here, though - setting bufferedAmountLowThreshold to 0 as a cheap way to be notified when the outgoing message buffer has drained actually means bufferedamountlow events don't fire - https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel/bufferedamountlow_event

@achingbrain (Member)

No, after a bit of experimentation it seems that even when bufferedAmount is 0, data can still be lost - it's not a solution.

@thomaseizinger (Contributor)

> *= There's a gotcha here, though - setting bufferedAmountLowThreshold to 0 as a cheap way to be notified when the outgoing message buffer has drained actually means bufferedamountlow events don't fire - developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel/bufferedamountlow_event

In our testing, they still fire. Either the docs or the implementations are wrong here!

@achingbrain (Member)

I think you're right - I've opened a PR against the MDN docs.

@achingbrain (Member)

So, fun times, I put a browser demo together here.

To change the settings, just edit the code and wait; it'll rebuild and run.

The demo:

  • Creates two local WebRTC peers, connects them, opens a datachannel, sends a bunch of messages and counts the number of messages received
  • If the sender's bufferedAmount gets above a configurable value it waits for the buffer to drain before sending more data
  • Once all the messages have been sent it waits for the bufferedAmount to be 0, then closes the datachannel

It's necessary not to let the bufferedAmount grow too large because there are opaque send queue limits: from testing, Chrome has a maximum of 256 queued messages and Firefox 64k. These are message counts, not total byte lengths, so some guesswork is necessary.
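The core of the demo's send loop looks roughly like the sketch below. MAX_BUFFERED_AMOUNT and MESSAGE_COUNT are illustrative values, not the demo's exact settings:

```ts
// Sketch of a send loop with backpressure: back off when bufferedAmount
// exceeds a threshold, drain fully before closing. Constants are illustrative.
const MAX_BUFFERED_AMOUNT = 64 * 1024
const MESSAGE_COUNT = 1_000

async function sendAll (channel: RTCDataChannel): Promise<void> {
  channel.bufferedAmountLowThreshold = 0

  for (let i = 0; i < MESSAGE_COUNT; i++) {
    // wait for the buffer to drain before the opaque internal queue fills up
    if (channel.bufferedAmount > MAX_BUFFERED_AMOUNT) {
      await new Promise<void>(resolve => {
        channel.addEventListener('bufferedamountlow', () => resolve(), { once: true })
      })
    }

    channel.send(`message ${i}`)
  }

  // wait for the final messages to (hopefully) leave the buffer, then close
  while (channel.bufferedAmount > 0) {
    await new Promise(resolve => setTimeout(resolve, 100))
  }

  channel.close()
}
```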

Chrome never seems to send the final few messages, even when we wait for its final .bufferedAmount to become 0 before closing the channel.

Firefox always sends all messages when .bufferedAmount is 0 before closing the channel.

If we don't drain the channel before closing, both Chrome & Firefox drop messages (change drainBeforeClose to see).

@achingbrain (Member)

I've opened an issue against Chromium here: https://bugs.chromium.org/p/chromium/issues/detail?id=1484907

achingbrain added a commit that referenced this issue Sep 27, 2023
Specify closing datachannels by mutual agreement to ensure all data has been received by the remote before closing.

Refs: #575
@achingbrain (Member)

I've taken a stab at specifying this in #582 - please take a look and let me know if you spot anything obvious.

It's also being implemented in js-libp2p at libp2p/js-libp2p#2074
