+++
date = "2020-03-31"
title = "Reducing tail latencies with automatic cooperative task yielding"
description = "April 1, 2020"
menu = "blog"
weight = 982
+++

Tokio is a runtime for asynchronous Rust applications. It allows writing code
using `async` & `await` syntax. For example:

```rust
let mut listener = TcpListener::bind(&addr).await?;

loop {
    let (mut socket, _) = listener.accept().await?;

    tokio::spawn(async move {
        // handle socket
    });
}
```

The Rust compiler transforms this code into a state machine. The Tokio runtime
executes these state machines, multiplexing many tasks on a handful of threads.
Tokio's scheduler requires that the generated task's state machine yield control
back to the scheduler in order to multiplex tasks. Each `.await` call is an
opportunity to yield back to the scheduler. In the above example,
`listener.accept().await` will return a socket if one is pending. If there are
no pending sockets, control is yielded back to the scheduler.

This system works well in most cases. However, when a system comes under load,
it is possible for an asynchronous resource to always be ready. For example,
consider an echo server:

```rust
tokio::spawn(async move {
    let mut buf = [0; 1024];

    loop {
        let n = socket.read(&mut buf).await?;

        if n == 0 {
            break;
        }

        // Write the data back
        socket.write_all(&buf[..n]).await?;
    }
});
```

If data is received faster than it can be processed, it is possible that more
data will have already been received by the time the processing of a data chunk
completes. In this case, `.await` will never yield control back to the
scheduler, and other tasks will not be scheduled, resulting in starvation and
large latency variance.

Currently, the answer to this problem is that the user of Tokio is responsible
for adding yield points every so often. In practice, very few actually do this
and end up being vulnerable to this sort of problem.
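
Concretely, a manual yield point is just a future that reports "not ready" once while scheduling itself to be polled again. The sketch below is an illustrative, std-only reimplementation of the idea (in real code you would use `tokio::task::yield_now()`; the `noop_waker` helper is invented here purely so the example can be polled by hand without a runtime):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A future that yields control back to the executor exactly once.
/// Illustrative sketch; Tokio provides `tokio::task::yield_now()`.
pub struct YieldNow {
    yielded: bool,
}

pub fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            // Second poll: the scheduler has returned to this task; resume it.
            Poll::Ready(())
        } else {
            // First poll: request a re-poll, then report "not ready" so
            // other tasks get a chance to run.
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Minimal no-op waker (invented for this sketch) so the future can be
/// polled without a real runtime.
pub fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```

Awaiting `yield_now()` inside a hot loop suspends the task once per iteration, even though no I/O is involved, which is exactly the "give the scheduler a chance" behavior the loop otherwise lacks.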

A common solution to this problem is preemption. With normal OS threads, the
kernel will interrupt thread execution every so often in order to ensure fair
scheduling of all threads. Runtimes that have full control over execution (Go,
Erlang, etc.) will also use preemption to ensure fair scheduling of tasks. This
is accomplished by injecting yield points — code which checks if the task has
been executing for long enough and yields back to the scheduler if so — at
compile-time. Unfortunately, Tokio is not able to use this technique, as Rust's
`async` generators do not provide any mechanism for executors (like Tokio) to
inject such yield points.

## Per-task operation budget

Even though Tokio is not able to **preempt**, there is still an opportunity to
nudge a task to yield back to the scheduler. As of [0.2.14], each Tokio task
has an operation budget. This budget is reset when the scheduler switches to
the task. Each Tokio resource (socket, timer, channel, ...) is aware of this
budget. As long as the task has budget remaining, the resource operates as it
did previously. Each asynchronous operation (actions that users must `.await`
on) decrements the task's budget. Once the task is out of budget, all resources
will perpetually return "not ready" until the task yields back to the
scheduler. At that point, the budget is reset, and future `.await`s on Tokio
resources will again function normally.

Let's go back to the echo server example from above. When the task is
scheduled, it is assigned a budget of 128 operations. When `socket.read(..)`
and `socket.write(..)` are called, the budget is decremented. If the budget is
zero, the task yields back to the scheduler. If either `read` or `write` cannot
proceed due to the underlying socket not being ready (no pending data or a full
send buffer), then the task also yields back to the scheduler.
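
The mechanics can be sketched with a toy model in plain Rust. This is illustrative only and is not Tokio's actual internals; the budget value of 128 comes from the text above, while the names (`TaskBudget`, `consume`) are invented for the example:

```rust
/// Toy model of the per-task operation budget (illustrative only; not
/// Tokio's real implementation).
const BUDGET: u32 = 128;

struct TaskBudget {
    remaining: u32,
}

impl TaskBudget {
    /// Called when the scheduler switches to the task.
    fn reset() -> Self {
        TaskBudget { remaining: BUDGET }
    }

    /// A resource calls this before completing an operation. Returns `false`
    /// once the budget is exhausted, forcing a "not ready" result so the
    /// task yields back to the scheduler.
    fn consume(&mut self) -> bool {
        if self.remaining == 0 {
            return false;
        }
        self.remaining -= 1;
        true
    }
}

/// Simulate a task polling an always-ready resource: count how many
/// operations complete before the budget forces a yield.
fn operations_before_yield() -> u32 {
    let mut budget = TaskBudget::reset();
    let mut completed = 0;
    // The resource always has data, so only the budget stops this loop.
    while budget.consume() {
        completed += 1;
    }
    completed
}
```

Even against an always-ready resource, the task is forced back to the scheduler after 128 operations, which is what bounds how long one busy connection can starve its neighbors.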

The idea originated from a conversation I had with [Ryan Dahl][ry]. He is
using Tokio as the underlying runtime for [Deno][deno]. When doing some HTTP
experimentation with [Hyper] a while back, he was seeing some high tail
latencies in some benchmarks. The problem was due to a loop not yielding back
to the scheduler under load. Hyper ended up fixing the problem by hand in this
one case, but Ryan mentioned that, when he worked on [node.js][node], they
handled the problem by adding **per resource** limits. So, if a TCP socket was
always ready, it would force a yield every so often. I mentioned this
conversation to [Jon Gjengset][jonhoo], and he came up with the idea of placing
the limit on the task itself instead of on each resource.

The end result is that Tokio should be able to provide more consistent runtime
behavior under load. While the exact heuristics will most likely be tweaked
over time, initial measurements show that, in some cases, tail latencies are
reduced by almost 3x.

[![benchmark](https://user-images.githubusercontent.com/176295/73222456-4a103300-4131-11ea-9131-4e437ecb9a04.png)](https://user-images.githubusercontent.com/176295/73222456-4a103300-4131-11ea-9131-4e437ecb9a04.png)

"master" is before the automatic yielding and "preempt" is after. Click for a
bigger version; see also the original [PR comment][pr] for more details.

## A note on blocking

Although automatic cooperative task yielding improves performance in many
cases, it cannot preempt tasks. Users of Tokio must still take care to avoid
both CPU intensive work and blocking APIs. The
[`spawn_blocking`][spawn_blocking] function can be used to "asyncify" these
sorts of tasks by running them on a thread pool where blocking is allowed.
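
As a rough illustration of the idea (not Tokio's implementation, which returns a future and manages a dedicated thread pool), a std-only analogue hands a blocking closure to another thread and returns a channel for the result; the name `run_blocking` is invented for this sketch:

```rust
use std::sync::mpsc;
use std::thread;

/// Std-only analogue of the idea behind `spawn_blocking` (illustrative
/// sketch): run a blocking closure on a separate thread so the caller's
/// thread stays free to make progress.
fn run_blocking<T, F>(f: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The blocking work happens off the caller's thread; the result is
        // handed back through the channel when it completes.
        let _ = tx.send(f());
    });
    rx
}
```

The caller can keep doing other work and collect the result later; in Tokio's real API the returned handle is a future that can simply be `.await`ed.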

Tokio does not, and will not, attempt to detect blocking tasks and
automatically compensate by adding threads to the scheduler. This question has
come up a number of times in the past, so allow me to elaborate.

For context, the idea is for the scheduler to include a monitoring thread. This
thread would poll scheduler threads every so often and check that workers are
making progress. If a worker is not making progress, it is assumed that the
worker is executing a blocking task, and a new thread should be spawned to
compensate.

This idea is not new. The first occurrence of this strategy that I am aware of
is in the .NET thread pool, and it was introduced more than ten years ago.
Unfortunately, the strategy has a number of problems, and because of this it
has not been featured in other thread pools / schedulers (Go, Java, Erlang,
etc.).

The first problem is that it is very hard to define "progress". A naive
definition of progress is whether or not a task has been scheduled for over
some unit of time. For example, if a worker has been stuck scheduling the same
task for more than 100ms, then that worker is flagged as blocked and a new
thread is spawned. In this definition, how does one detect scenarios where
spawning a new thread **reduces** throughput? This can happen when the
scheduler is generally under load and adding threads would make the situation
much worse. To combat this, the .NET thread pool uses [hill climbing][hill].
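
The naive definition above can be made concrete with a small sketch (hypothetical code, not taken from any real scheduler; `WorkerSnapshot`, `looks_blocked`, and the 100ms threshold simply mirror the description in the text):

```rust
use std::time::{Duration, Instant};

/// Hypothetical threshold for the naive "progress" heuristic described in
/// the text; illustrative only.
const BLOCKED_THRESHOLD: Duration = Duration::from_millis(100);

struct WorkerSnapshot {
    /// When the worker started polling its current task.
    task_started_at: Instant,
}

/// Returns true if the monitor would assume the worker is executing a
/// blocking task and spawn a compensating thread.
fn looks_blocked(worker: &WorkerSnapshot, now: Instant) -> bool {
    now.duration_since(worker.task_started_at) > BLOCKED_THRESHOLD
}
```

The flaw the text points out lives entirely in this threshold check: a worker that is legitimately busy under load looks identical to one stuck in a blocking call, so acting on `looks_blocked` can add threads exactly when they hurt most.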

The second problem is that any automatic detection strategy will be vulnerable
to bursty or otherwise uneven workloads. This specific problem has been the
bane of the .NET thread pool and is known as the "stuttering" problem. The hill
climbing strategy requires some period of time (hundreds of milliseconds) to
adapt to load changes. This time period is needed, in part, to be able to
determine that adding threads is improving the situation and not making it
worse.

The stuttering problem can be managed with the .NET thread pool, in part,
because the pool is designed to schedule **coarse** tasks, i.e. tasks that
execute on the order of hundreds of milliseconds to multiple seconds. However,
in Rust, asynchronous task schedulers are designed to schedule tasks that
should run on the order of microseconds to tens of milliseconds at most. In
this case, any stuttering problem from a heuristic-based scheduler will result
in far greater latency variations.

The most common follow-up question I've received after this is "doesn't the Go
scheduler automatically detect blocked tasks?". The short answer is: no. Doing
so would result in the same stuttering problems as mentioned above. Also, Go
has no need for generalized blocked task detection because Go is able to
preempt. What the Go scheduler **does** do is annotate potentially blocking
system calls. This is roughly equivalent to the Tokio APIs
[`spawn_blocking`][spawn_blocking] and [`block_in_place`][block_in_place].

In short, as of now, the automatic cooperative task yielding strategy that has
just been introduced is the best we have found for reducing tail latencies.

<div style="text-align:right">—Carl Lerche</div>

[0.2.14]: #
[ry]: https://github.com/ry
[deno]: https://github.com/denoland/deno
[Hyper]: https://github.com/hyperium/hyper/
[node]: https://nodejs.org
[jonhoo]: https://github.com/jonhoo/
[pr]: https://github.com/tokio-rs/tokio/pull/2160#issuecomment-579004856
[spawn_blocking]: https://docs.rs/tokio/0.2/tokio/task/fn.spawn_blocking.html
[block_in_place]: https://docs.rs/tokio/0.2/tokio/task/fn.block_in_place.html
[hill]: https://en.wikipedia.org/wiki/Hill_climbing