Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Http2Stream write() never fails after network disconnect (Linux) #48539

Closed
surlyhacker opened this issue Jun 24, 2023 · 4 comments
Closed

Http2Stream write() never fails after network disconnect (Linux) #48539

surlyhacker opened this issue Jun 24, 2023 · 4 comments
Labels
http2 Issues or PRs related to the http2 subsystem.

Comments

@surlyhacker
Copy link

surlyhacker commented Jun 24, 2023

Version

v20.3.1

Platform

Linux redacted 5.15.0-75-generic #82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

node:http2

What steps will reproduce the bug?

I have an HTTP2 client that opens a streaming request (sets endStream: false) and the continues to write() data to the request periodically. Once this stream is connected, if I abruptly disconnect the network on the client, the call to write() never returns false or generates an error argument to the write() callback. As a result, nothing ever indicates to the client that the TCP socket is dead, even after 5 minutes of failed writes once per second.

A workaround to this is to use HTTP2 ping, which does of course fail in this scenario.

Some excerpts of code, which is just a slight tweak to the HTTP2 example here.

const client = http2.connect('https://localhost:8443', {
  ca: fs.readFileSync('localhost.pem'),
  rejectUnauthorized:false
});

...
req.on('response', (headers, flags) => {
  

  setInterval(() =>{
    try{
      var writeres = req.write("hi from client "+ new Date(),undefined,(thing)=>{
        if(thing){
          console.log(new Date() + " client write callback error: "+thing);
        }
      });
      if (!writeres ){
        console.log(new Date() + " client write returned false");
      }
    } 
    catch (e){
      console.log(new Date() + "write error: "+e);
    }
      
      },1000);
     
});

How often does it reproduce? Is there a required condition?

Reproduces consistently on Linux.

But on Windows, the code operates as expected and outputs:

Sat Jun 24 2023 12:15:43 GMT-0400 (Eastern Daylight Time) client write returned false
Sat Jun 24 2023 12:15:43 GMT-0400 (Eastern Daylight Time) client write callback error: Err
or [ERR_STREAM_DESTROYED]: Cannot call write after a stream was destroyed
node:internal/http2/core:1375
      throw new ERR_HTTP2_INVALID_SESSION();
      ^

What is the expected behavior? Why is that the expected behavior?

Expected behavior as demonstrated on Windows above is as documented for stream.Writable.write():

... If an error occurs, the callback will be called with the error as its first argument. ...
While a stream is not draining, calls to write() will buffer chunk, and return false.

However, I guess I feel the documentation is a bit self-contradictory since it also says:

The return value is true if the internal buffer is less than the highWaterMark configured when the stream was created after admitting chunk. If false is returned, further attempts to write data to the stream should stop until the 'drain' event is emitted.

The default highWaterMark seems to be 16kb, so I am pretty sure this exceeds it after 5 minutes of writing ~72 bytes every second, but it's a bit unclear whether write() should return false when the stream is "not draining" vs highWaterMark has been reached. Those are 2 different things, right? Can't the stream fail to drain long before the highWaterMark is reached?

Regardless, the difference in behavior between Windows and Linux is also demonstrative.

What do you see instead?

No error callbacks or thrown exceptions or other indications of an issue (unless using HTTP2 pings as well)

Additional information

No response

@bnoordhuis
Copy link
Member

That's expected behavior, at least on Linux. TCP is designed to be resilient to transient network outages. Windows is a little more aggressive in timing out unroutable connections, Linux less so. Not a bug, in other words.

@surlyhacker
Copy link
Author

surlyhacker commented Jun 24, 2023

Thanks. I had that thought in the back of my mind but didn't realize how long the TCP ACK timeout seems to be set to (when you include TCP retries/retransmissions) still in 2023. If I run the test for longer, it does in fact generate an error after about ~16 minutes. Seems really unreasonably long nowadays and I wish there was a way to set this per socket (that Node could use) but I believe in Linux it's only at the OS level where it can be tweaked system-wide.

More info here: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html

Edit: Actually there is a socket option called TCP_USER_TIMEOUT that seems to be a viable fix for this but I don't know if it can be set using Node APIs. There's a package here but it's pretty old: https://www.npmjs.com/package/net-keepalive?activeTab=readme. It does seem to work though, and I can use its API to set TCP_USER_TIMEOUT and then receive the appropriate errors from write() in less than 16 minutes if desired.

@surlyhacker
Copy link
Author

For some history: #7217

@bnoordhuis
Copy link
Member

I'm aware of TCP_USER_TIMEOUT but since it's Linux-only Node.js doesn't (and won't) support it.

(The variance of TCP timeouts is one reason why ping frames exist, by the way.)

I'll go ahead and close this as "by design."

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Jun 27, 2023
@bnoordhuis bnoordhuis added the http2 Issues or PRs related to the http2 subsystem. label Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http2 Issues or PRs related to the http2 subsystem.
Projects
None yet
Development

No branches or pull requests

2 participants