EOF caused excessive polling #33

Closed
jerryz920 opened this issue Apr 19, 2019 · 11 comments

@jerryz920
Contributor

jerryz920 commented Apr 19, 2019

Hi buddy, thanks for fixing issue #32. But I need your help again...

I have been running my program on the 0.9-fix branch with some traffic, and I encountered two additional problems:

  1. the flush problem, where flush only tries write_tls once. This one is just FYI; I think you already fixed it on the master branch. I cannot run the master branch yet, since it gives me an unexplained connection reset at some point. So I am sticking with 0.9-fix for the moment.
  2. I manually patched 0.9-fix with the change from bullet 1 (in flush, it tries to write everything instead of trying only once; a sketch of that patch appears after the trace below). Then a timing-related issue arises: for the bi-directional communication pattern I mentioned in "Deadlock when both client and server have heavy write pressures" #32, if the client side has only a small amount of data to transfer and the server has heavy traffic to write through, I can see a massive number of recvfrom calls returning 0 in the syscall trace on the same socket. An example is below (unimportant entries omitted):

[pid 2088] sendto(10, "z\262H\323hY\264\17\301 PS3\237\216F\264K7\321\237<\33\345\244\231\25fS"F\231"..., 774, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
[pid 2088] <... sendto resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 2088] recvfrom(10, <unfinished ...>
[pid 2088] <... recvfrom resumed> "", 18437, 0, NULL, NULL) = 0
[pid 2088] sendto(10, "z\262H\323hY\264\17\301 PS3\237\216F\264K7\321\237<\33\345\244\231\25fS"F\231"..., 774, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
[pid 2088] <... sendto resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 2088] recvfrom(10, "", 18437, 0, NULL, NULL) = 0

Basically, it just repeats "sendto -> EAGAIN, recvfrom -> 0" within the tokio::io::copy context, inside the AsyncWrite::poll_flush method. This burns a lot of CPU time when the connection is busy.
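
For reference, here is a minimal sketch of the bullet-1 patch as I applied it, written against the rustls Session trait; the free-standing helper and its name are mine and only approximate the real code:

// "Write everything instead of only trying once": keep calling write_tls
// until rustls has no more buffered TLS data. A WouldBlock from the socket
// simply propagates out as an error, as before.
fn flush_tls<S, IO>(session: &mut S, io: &mut IO) -> std::io::Result<()>
where
    S: rustls::Session,
    IO: std::io::Write,
{
    while session.wants_write() {
        session.write_tls(io)?;
    }
    io.flush()
}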

My speculation about the problem is:

  1. the flush method favors write, so when the write would block, it returns WouldBlock from complete_inner_io, even if it also calls read and gets an EOF.
  2. the socket is still readable, so the future is not put to sleep; tokio polls it again directly. The local eof flag is then reset, and write and read are both polled again.
  3. when the problematic socket has heavy write pressure, this problem is always present.
  4. the communication will finish eventually, since the TLS writes still proceed.

One mystery to resolve is the second point. Tokio uses epoll in edge-triggered mode, so a readable event should be delivered only once until the socket read buffer is drained. However, I am not sure whether this is the case for EOF. A plain TcpStream does not show such repeated EOF triggering, though, so there might be some fancy machinery inside tokio doing the magic.

This behavior is also non-deterministic. I attached a similar tar ball to reproduce it, containing a multi-client program together with a clone of the tokio-rustls code; you can see the problem after running it a few times. Anyway, I guess a short-term solution is to make the EOF flag live across read/write calls instead of using a local EOF flag. That way read is at least not polled more than once after shutdown. I am still pondering whether this can stop the future from being polled again and again, though...
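
As a concrete illustration of that short-term idea, here is a minimal sketch, assuming the stream owns the rustls session and the underlying IO; the struct shape and the read_tls_once name are mine, not the actual tokio-rustls internals:

struct TlsStream<IO, S> {
    io: IO,
    session: S,
    eof: bool, // persists across poll calls, unlike a local variable
}

impl<IO: std::io::Read, S: rustls::Session> TlsStream<IO, S> {
    fn read_tls_once(&mut self) -> std::io::Result<usize> {
        if self.eof {
            // The peer already closed its write half; do not touch the socket again.
            return Ok(0);
        }
        let n = self.session.read_tls(&mut self.io)?;
        if n == 0 {
            self.eof = true;
        }
        Ok(n)
    }
}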

Besides, this issue raises again the question of whether we should do write calls inside read. My concern is that, so far, the implementation can make read struggle to proceed when write pressure is heavy. But frankly speaking, I don't have a clue about what is right or best for TLS communication, so I am just discussing possibilities:

  • You mentioned that inside read calls we need to do a write. Is that because of the TLS session's internal buffer? In other words, is it true that if you don't call Session.write_tls until EAGAIN, the write future may not be polled again? In that case, would futures::task::current().notify() be a viable way to avoid doing the write on the read side? (See the sketch after this list.)
  • Another way to think about it is to rate-limit the writes. I saw that rustls itself has a buffer limit option on the session; would that be a quick workaround to help write-heavy programs? It not only improves read responsiveness but also greatly reduces the memory footprint.
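
For the first bullet, a minimal sketch of the notify idea, using the futures 0.1 API; the function name is mine, and it has to run inside poll(), i.e. within a task context:

// Instead of calling Session::write_tls from the read path, re-notify the
// current task so the executor polls it again and the write half makes
// progress on its own.
fn on_read_wants_write() {
    futures::task::current().notify();
}

Whether this keeps all the wakeups that the in-read write currently guarantees is exactly the open question.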

Anyway, I am throwing a lot of content at you here, so hopefully it does not get too distracting. To completely address the issue I brought up, I think the EOF handling needs to be fixed and writes should be rate-limited. I am also happy to put up some PRs; your call :)

Thanks for taking the time to read this long post!

tokio-issue-new.tar.gz

@quininer
Owner

This is an interesting question, I welcome a PR!

I agree that we should have a non-local EOF; in fact, it existed before 0.5, but it was removed in order to use rustls::Stream.

I cannot run the master branch yet, since it gives me an unexplained connection reset at some point.

I am also interested in this; can you open an issue for it?

In that case, would futures::task::current().notify() be a viable way to avoid doing the write on the read side?

I am not sure whether this behavior is ideal; you can try it. :)

Another way to think about it is to rate-limit the writes. I saw that rustls itself has a buffer limit option on the session; would that be a quick workaround to help write-heavy programs? It not only improves read responsiveness but also greatly reduces the memory footprint.

This is reasonable, but it should be set by the user.

@jerryz920
Contributor Author

jerryz920 commented Apr 19, 2019

This is an interesting question, I welcome a PR!

I agree that we should have a non-local EOF; in fact, it existed before 0.5, but it was removed in order to use rustls::Stream.

I checked the master branch; it seems the necessary state is already there, so I plan to move over to it.

I cannot run the master branch yet, since it gives me an unexplained connection reset at some point.

I am also interested in this; can you open an issue for it?

Sure, I have figured out the problem; it is described in #34.

In that case, would futures::task::current().notify() be a viable way to avoid doing the write on the read side?

I am not sure whether this behavior is ideal; you can try it. :)

Another way to think about it is to rate-limit the writes. I saw that rustls itself has a buffer limit option on the session; would that be a quick workaround to help write-heavy programs? It not only improves read responsiveness but also greatly reduces the memory footprint.

This is reasonable, but it should be set by the user.

I will add this limit option first and see how it works.

jerryz920 pushed a commit to jerryz920/tokio-rustls that referenced this issue Apr 19, 2019
@jerryz920
Contributor Author

Limiting the buffer size introduced a bunch of unexpected behavior (e.g. a sudden increase in sendto calls that all return EAGAIN). I am digging into tokio to see why this happens.

@quininer
Owner

Limiting the buffer size only needs to be written like this:

acceptor.accept_with(io, |session| session.set_buffer_limit(BUFFER_SIZE))

There is no need to add a new option to the acceptor.
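
For completeness, a minimal usage sketch in futures 0.1 style, assuming a TlsAcceptor built from a rustls ServerConfig; BUFFER_SIZE here is just a hypothetical value the user picks:

use futures::Future;
use tokio::net::TcpStream;
use tokio_rustls::TlsAcceptor;

// Hypothetical cap on rustls' internal plaintext buffering; tune it to the workload.
const BUFFER_SIZE: usize = 64 * 1024;

fn accept_limited(
    acceptor: TlsAcceptor,
    io: TcpStream,
) -> impl Future<Item = (), Error = std::io::Error> {
    acceptor
        // The closure runs on the ServerSession before the handshake is driven.
        .accept_with(io, |session| session.set_buffer_limit(BUFFER_SIZE))
        .map(|_tls_stream| {
            // ... drive the TLS stream here (e.g. tokio::io::copy) ...
        })
}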

@jerryz920
Contributor Author

Limiting the buffer size only needs to be written like this:

acceptor.accept_with(io, |session| session.set_buffer_limit(BUFFER_SIZE))

There is no need to add a new option to the acceptor.

You are right.

BTW, what git development flow does this repository use? I typically do "git commit --fixup" to patch minor errors; after approval, I rebase onto master and autosquash those fixup commits. Does that conflict with your preferred workflow?

@quininer
Owner

No, I don't care about this. :)

@jerryz920
Contributor Author

jerryz920 commented Apr 22, 2019

A few updates:

  1. I dug into tokio and mio: HUP support is like spaghetti right now. RDHUP (half close) is not supported on every platform, so for both the epoll RDHUP and HUP events, mio returns HUP.
  2. tokio's TcpStream (a wrapper around PollEvented2) treats HUP as a writable condition, so when the socket write buffer is full, poll_write_ready returns Ready(Hup) instead of NotReady... Worse, PollEvented2 checks whether the returned value is "Ready(_)" and notifies the underlying task so it gets rescheduled onto the run queue immediately, again and again. That's how the massive number of "sendto -> EAGAIN" calls arises.

This means that what I previously suggested (the "EOF flag" and rate limiting) will not solve the problem fundamentally. I am going to try whether calling shutdown(Read) directly when EOF is read helps...
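
A minimal sketch of that shutdown(Read) idea, shown against std::net::TcpStream for clarity; the real integration point inside tokio-rustls would of course differ:

use std::net::{Shutdown, TcpStream};

// Once read_tls reports a clean EOF (0 bytes), close the read half so the
// readiness machinery stops reporting the socket as readable.
fn close_read_half(sock: &TcpStream) -> std::io::Result<()> {
    sock.shutdown(Shutdown::Read)
}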

@quininer
Owner

It sounds like having mio support half close would be the best solution.

This is unlikely to be implemented in tokio-rustls, but we can expose an is_eof.

@jerryz920
Contributor Author

Yeah, I think this issue was actually reported in tokio before: tokio-rs/tokio#449. The problem is that HUP is never cleared once it is returned, so it is not a tokio-rustls-specific problem. I need to sit down and think about how to handle this...

jerryz920 pushed a commit to jerryz920/tokio-rustls that referenced this issue Apr 26, 2019
jerryz920 pushed a commit to jerryz920/tokio-rustls that referenced this issue May 8, 2019
quininer pushed a commit that referenced this issue May 9, 2019
@quininer
Owner

Should this be fixed by tokio-rs/mio#939?

@jerryz920
Contributor Author

Yes, that's right. I am closing this ticket now as well. Thanks!
