Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only one HTTP2 Ping is sent with keep alive enabled #2310

Closed
pdcalado opened this issue Oct 24, 2020 · 5 comments · Fixed by differs/Legends#5
Closed

Only one HTTP2 Ping is sent with keep alive enabled #2310

pdcalado opened this issue Oct 24, 2020 · 5 comments · Fixed by differs/Legends#5
Labels
A-http2 Area: HTTP/2 specific. C-bug Category: bug. Something is wrong. This is bad!

Comments

@pdcalado
Copy link
Contributor

I believe there is a bug in the HTTP2 keep alive implementation, more concretely in src/proto/h2/ping.rs.

This issue was originally detected in hyperium/tonic#474, credits to @alce.

It seems that when enabling HTTP2 server keep alive, only the first Ping is sent after the configured interval.
Afterwards, no more Pings seem to be sent.

How to reproduce

I'm applying the following changes to the echo example:

--- a/examples/echo.rs
+++ b/examples/echo.rs
@@ -14,7 +14,10 @@ async fn echo(req: Request<Body>) -> Result<Response<Body>, hyper::Er>
         ))),
 
         // Simply echo the body back to the client.
-        (&Method::POST, "/echo") => Ok(Response::new(req.into_body())),
+        (&Method::POST, "/echo") => {
+           tokio::time::delay_for(std::time::Duration::from_secs(60)).await;
+           Ok(Response::new(req.into_body()))
+       },
 
         // Convert to uppercase before sending back to client using a stream.
         (&Method::POST, "/echo/uppercase") => {
@@ -55,7 +58,10 @@ async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>
 
     let service = make_service_fn(|_| async { Ok::<_, hyper::Error>(service_fn(echo)) }>
 
-    let server = Server::bind(&addr).serve(service);
+    let server = Server::bind(&addr)
+       .http2_keep_alive_interval(std::time::Duration::from_secs(10))
+       .http2_keep_alive_timeout(std::time::Duration::from_secs(5))
+       .serve(service);
 
     println!("Listening on http://{}", addr);

Running the example with cargo run --example echo, sniffing with tcpdump -i lo port 3000, and making a request with curl -v --http2-prior-knowledge http://localhost:3000/echo --data "foo", I get the following:

20:03:21.621105 IP localhost.hbci > localhost.46506: Flags [P.], seq 22:44, ack 160, win 512, options [nop,nop,TS val 2305866698 ecr 2305866696], length 22
20:03:21.621120 IP localhost.46506 > localhost.hbci: Flags [.], ack 44, win 512, options [nop,nop,TS val 2305866698 ecr 2305866698], length 0

20:03:31.623825 IP localhost.hbci > localhost.46506: Flags [P.], seq 44:61, ack 160, win 512, options [nop,nop,TS val 2305876700 ecr 2305866698], length 17
20:03:31.623854 IP localhost.46506 > localhost.hbci: Flags [.], ack 61, win 512, options [nop,nop,TS val 2305876700 ecr 2305876700], length 0
20:03:31.623926 IP localhost.46506 > localhost.hbci: Flags [P.], seq 160:177, ack 61, win 512, options [nop,nop,TS val 2305876701 ecr 2305876700], length 17
20:03:31.623948 IP localhost.hbci > localhost.46506: Flags [.], ack 177, win 512, options [nop,nop,TS val 2305876701 ecr 2305876701], length 0

20:04:21.624549 IP localhost.hbci > localhost.46506: Flags [P.], seq 61:111, ack 177, win 512, options [nop,nop,TS val 2305926701 ecr 2305876701], length 50
20:04:21.624593 IP localhost.46506 > localhost.hbci: Flags [.], ack 111, win 512, options [nop,nop,TS val 2305926701 ecr 2305926701], length 0
20:04:21.625177 IP localhost.46506 > localhost.hbci: Flags [F.], seq 177, ack 111, win 512, options [nop,nop,TS val 2305926702 ecr 2305926701], length 0
20:04:21.625825 IP localhost.hbci > localhost.46506: Flags [F.], seq 111, ack 178, win 512, options [nop,nop,TS val 2305926702 ecr 2305926702], length 0
20:04:21.625858 IP localhost.46506 > localhost.hbci: Flags [.], ack 112, win 512, options [nop,nop,TS val 2305926702 ecr 2305926702], length 0

The first batch of packets belongs to the start of the request, the second batch is the first Ping+Pong, the third batch is the end of the request. As you can see the keep alive interval of 10seconds separates the first two batches. However no more pings occur between the second and last batch, as I would expect.

After taking a look at src/proto/h2/ping.rs and debugging a bit, it seems that once KeepAlive enters the state KeepAliveState::PingSent it does not go back to KeepAliveState::Scheduled, even after a Pong is received.

A possible fix would be to check if ping_sent_at has been cleared when the KeepAlive::scheduled() method is called:

--- a/src/proto/h2/ping.rs
+++ b/src/proto/h2/ping.rs
@@ -442,7 +442,16 @@ impl KeepAlive {
                 let interval = shared.last_read_at() + self.interval;
                 self.timer.reset(interval);
             }
-            KeepAliveState::Scheduled | KeepAliveState::PingSent => (),
+            KeepAliveState::PingSent => {
+                if shared.is_ping_sent() {
+                    return;
+                }
+
+                self.state = KeepAliveState::Scheduled;
+                let interval = shared.last_read_at() + self.interval;
+                self.timer.reset(interval);
+            }
+            KeepAliveState::Scheduled => (),
         }
     }

I'm not sure if this was the intended behavior of the original implementation.
I'll be happy to submit a PR if agreed.

Thanks in advance and keep up the good work!

@seanmonstar
Copy link
Member

Thanks for the nice write-up! It does seem like there's a bug in here. Would be good to get this fixed, probably writing a test for this might get trickier...

@seanmonstar seanmonstar added A-http2 Area: HTTP/2 specific. C-bug Category: bug. Something is wrong. This is bad! labels Oct 26, 2020
@pdcalado
Copy link
Contributor Author

pdcalado commented Oct 28, 2020

Hi @seanmonstar ,

Can you help me to see if I'm on the right track here?

To test the fix I suggested above, we could write a test like below.
Basically we expect the client to receive 3 pings in less than 5 seconds.
We reply with pong (ping + ack) to every ping.

Thanks in advance!

fn is_ping_frame(buf: &[u8]) -> bool {
    buf[3] == 6
}

fn assert_ping_frame(buf: &[u8], len: usize) {
    // Assert the StreamId is zero
    let mut ubuf = [0; 4];
    ubuf.copy_from_slice(&buf[5..9]);
    let unpacked = u32::from_be_bytes(ubuf);
    assert_eq!(unpacked & !(1 << 31), 0);

    // Assert ACK flag is unset (only set for PONG).
    let flags = buf[4];
    assert_eq!(flags & 0x1, 0);

    // Assert total frame size
    assert_eq!(len, 17);
}

async fn write_pong_frame(conn: &mut TkTcpStream) {
    conn.write_all(&[
        0, 0, 8,   // len
        6,   // kind
        0x1, // flag
        0, 0, 0, 0, // stream id
        0x3b, 0x7c, 0xdb, 0x7a, 0x0b, 0x87, 0x16, 0xb4, // payload
    ])
    .await
    .expect("client pong");
}

#[tokio::test]
async fn http2_keep_alive_count_server_pings() {
    let _ = pretty_env_logger::try_init();

    let mut listener = tcp_bind(&"127.0.0.1:0".parse().unwrap()).unwrap();
    let addr = listener.local_addr().unwrap();

    tokio::spawn(async move {
        let (socket, _) = listener.accept().await.expect("accept");

        Http::new()
            .http2_only(true)
            .http2_keep_alive_interval(Duration::from_secs(1))
            .http2_keep_alive_timeout(Duration::from_secs(1))
            .serve_connection(socket, unreachable_service())
            .await
            .expect("serve_connection");
    });

    // Spawn a "client" conn that only reads until EOF
    let mut conn = connect_async(addr).await;

    // write h2 magic preface and settings frame
    conn.write_all(b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n")
        .await
        .expect("client preface");
    conn.write_all(&[
        0, 0, 0, // len
        4, // kind
        0, // flag
        0, 0, 0, 0, // stream id
    ])
    .await
    .expect("client settings");

   let read_pings = async {
	// read until 3 pings are received
	let mut pings = 0;
	let mut buf = [0u8; 1024];
	while pings < 3 {
            let n = conn.read(&mut buf).await.expect("client.read");
            assert!(n != 0);

            if is_ping_frame(&buf) {
		assert_ping_frame(&buf, n);
		write_pong_frame(&mut conn).await;
		pings += 1;
            }
	}
    };

    // Expect all pings to occurs under 5 seconds
    tokio::time::timeout(Duration::from_secs(5), read_pings)
        .await
        .expect("timed out waiting for pings");
}

@seanmonstar
Copy link
Member

Yea that looks right to me, does it fail, and then with your suggested patch, succeed?

@pdcalado
Copy link
Contributor Author

Indeed, just fixed the use of tokio::time::timeout to make sure it fails on regression.
I'll submit a PR if agreed.
Thanks!

@seanmonstar
Copy link
Member

Yep, let's do it!

pdcalado added a commit to pdcalado/hyper that referenced this issue Oct 29, 2020
…ived

`KeepAliveState` did not transition from `PingSent` to `Scheduled` after
a pong was received. This prevented more than one ping to be sent by the
server. This fix checks if `ping_sent_at` has already been cleared by
`Ponger::poll` when `KeepAliveState::PingSent` state is active.

Fixes hyperium#2310
BenxiangGe pushed a commit to BenxiangGe/hyper that referenced this issue Jul 26, 2021
…ived

`KeepAliveState` did not transition from `PingSent` to `Scheduled` after
a pong was received. This prevented more than one ping to be sent by the
server. This fix checks if `ping_sent_at` has already been cleared by
`Ponger::poll` when `KeepAliveState::PingSent` state is active.

Fixes hyperium#2310
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-http2 Area: HTTP/2 specific. C-bug Category: bug. Something is wrong. This is bad!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants