major redesign, reprising connection as DuplexConnection #102

robertream · 2021-05-09T15:44:13Z

@russelltg
It appears that we have started to step on toes, so I'm submitting this mostly finished draft pull request where I have been exploring how to clean up the tokio I/O loop, which lead me down a path of reintroducing a connection struct that this time encapsulates the Sender and Receiver. I know this is a very large change set, but please consider the benefits from the design direction I am exploring here. I am contradicting my prior advocacy for keeping the sender and receiver algorithms free of communication, concurrent capable. Since we are now just running the whole I/O loop from one thread, concurrency is less important anyhow. I hope you agree that this has significantly simplified the code. I have two different versions of the I/O loop that are alternated somewhat randomly at runtime. One approach uses an Action/Input enum pair and a single function to drive the loop, this is similar to the prior design of the Sender/Receiver enums. The other approach uses a set of command and query methods to drive the loop. I prefer the ergonomics of the the enum approach. It made debug tracing the code trivial too.

One of the larger tests you added recently doesn't compile with all the changes I made, so I commented it out. I don't have the time to go back and fix it, but I can later, I just felt it was important that you take a look at the changes I have made before our paths diverge too far.

I also fixed several bugs that I ran into while testing. For instance, we weren't respecting SND in the sender algorithm, which broke congestion control. The timestamp_rollover test ran much faster than real time with this bug, but with the fix it no longer does. I have set it to ignored for now. We should replace this long running test with a few more focused unit tests. In fact, it might already be mostly covered by the existing TimeStamp unit tests.

I'd like to finish this up and move onto a first pass at metrics, so quick feedback would be appreciated.

…plifying the I/O loop

russelltg · 2021-05-10T14:15:06Z

I'll take a deeper look at this in a few days. In the meantime--

All the srt-protocol integration tests are meant to run much faster than realtime, they never check the wall clock they just simulate time passing--so you're saying that now the algorithm takes a full CPU core to run?

Also, I've been using this test to see how well it can deal with higher-bandwidth connections--I should check it in and just set it to be ignored but for now you can grab it with git restore -s high_bandwidth -- ./srt-tokio/tests/high_bandwidth.rs. It seems to freeze up now.

I suggest you run it with cargo test --test=high_bandwidth -- --nocapture, as it reports statistics. It's not a proper test with pass/fail behavior, it just tries to push a few MB/s through it.

russelltg · 2021-05-10T14:17:11Z

Also nice apparently proptest ICE's the nightly compiler! 🥳 rust-lang/rust#85128

robertream · 2021-05-10T15:21:01Z

I'll take a deeper look at this in a few days. In the meantime--

All the srt-protocol integration tests are meant to run much faster than realtime, they never check the wall clock they just simulate time passing--so you're saying that now the algorithm takes a full CPU core to run?

The problem is that sending data should be limited by the SND timer, this is enables congestion control to function, but the current implementation can potentially run the sender algorithm every time the next "action" is pulled, regardless of the send timer. Sending should only happen when the clock ticks forward far enough to trigger a SND, but this seems to have added a bit more computational cost when running the faster than real time simulation tests. The algorithm shouldn't take a full CPU to run, the tests just run noticeably slower. This could be due to inefficiencies in the test harnesses/simulation code too. I haven't spent much time optimizing yet.

Also, I've been using this test to see how well it can deal with higher-bandwidth connections--I should check it in and just set it to be ignored but for now you can grab it with git restore -s high_bandwidth -- ./srt-tokio/tests/high_bandwidth.rs. It seems to freeze up now.

I suggest you run it with cargo test --test=high_bandwidth -- --nocapture, as it reports statistics. It's not a proper test with pass/fail behavior, it just tries to push a few MB/s through it.

Thanks, that is helpful.

russelltg

Overall this is much cleaner, and I would be happy to merge it with some more work.

srt-protocol/src/protocol/handshake/mod.rs

russelltg · 2021-05-10T16:46:47Z

srt-protocol/src/protocol/receiver/mod.rs

@@ -118,7 +108,7 @@ pub struct Receiver {
    receive_buffer: RecvBuffer,

    /// Shutdown flag. This is set so when the buffer is flushed, it returns Async::Ready(None)
-    shutdown_flag: bool,
+    shutdown_flag: Option<Instant>,


Update comment here--it's no longer a flag--and I'm not sure what particular instant this refers to

Right, this is a shutdown with a timeout. There were many tests that would fail intermittently/randomly due to injected packet loss or race conditions, this was particularly true of the tokio tests. The connect and rendezvous algorithm tests were failing intermittently for this same reason. We need to implement timeouts for them as well.

russelltg · 2021-05-10T16:55:00Z