
blog: reducing tail latencies with auto yielding #422

Merged: 18 commits into master from preemption-blog, Apr 1, 2020

Conversation

@carllerche (Member) commented Apr 1, 2020

@Darksonn (Contributor) left a comment:

Looks good! I have a few nitpicks on spelling and such:

(several resolved spelling suggestions on content/blog/2020-04-preemption.md)
Comment on lines 141 to 142
under load and adding threads would make the situation much worse. To combat
this, the .NET thread pool uses [hill climbing][hill].
@Darksonn (Contributor) commented Apr 1, 2020:

I feel like a few more words can be added about the hill climbing heuristic they use? I know what hill climbing is as I specialize in OR, but even that doesn't let me guess any further details on what they are measuring here.

Member:

I found https://mattwarren.org/2017/04/13/The-CLR-Thread-Pool-Thread-Injection-Algorithm/ which seems like a pretty good discussion of the specific hill-climbing approach used in CLR (at a glance). May be good as a second reference?
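
For readers unfamiliar with the idea, here is a toy sketch of hill climbing applied to thread-pool sizing. It is purely illustrative and not the CLR's actual algorithm (which the linked article describes in more detail); the function names and the iteration count are made up.

```rust
// Toy hill climbing for sizing a thread pool: nudge the thread count, keep
// the change if measured throughput improves, otherwise reverse direction.
fn hill_climb(mut threads: usize, measure_throughput: impl Fn(usize) -> f64) -> usize {
    let mut best = measure_throughput(threads);
    let mut step: isize = 1;
    for _ in 0..20 {
        let candidate = ((threads as isize) + step).max(1) as usize;
        let throughput = measure_throughput(candidate);
        if throughput > best {
            // Improvement: keep the new thread count and direction.
            best = throughput;
            threads = candidate;
        } else {
            // No improvement: try moving the other way.
            step = -step;
        }
    }
    threads
}
```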

Comment on lines 156 to 158
the order of micro seconds to tens of milliseconds at most. In this case, any
stutttering problem from a heuristic based scheduler will result in far greater
latency variations.
Contributor:

Suggested change
the order of micro seconds to tens of milliseconds at most. In this case, any
stutttering problem from a heuristic based scheduler will result in far greater
latency variations.
the order of microseconds to tens of milliseconds at most. In this case, any
stuttering problem from a heuristic based scheduler will result in far greater
latency variations.

<div style="text-align:right">&mdash;Carl Lerche</div>


[0.2.14]: #
Contributor:

Remember to update this once it has been released.

Member Author:

Thanks for reminding me, I had already forgotten... I'll probably forget again anyway 😆

variance.

Currently, the answer to this problem is that the user of Tokio is responsible
for adding yield points every so often. In practice, very few actually do this
Sponsor Contributor:

Should we link to yield_now, and maybe also rust-lang/futures-rs#2047?
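
For reference, this is roughly what a manually inserted yield point looks like using `tokio::task::yield_now`; `Item`, `handle`, and the interval of 64 iterations are placeholder names chosen for the sketch.

```rust
use tokio::task;

// Placeholders for whatever per-element work a task does.
struct Item;
fn handle(_item: &Item) { /* CPU-bound work, never awaits */ }

async fn process_all(items: Vec<Item>) {
    for (i, item) in items.iter().enumerate() {
        handle(item);
        // Manually inserted yield point: without something like this, a long
        // run of ready work never returns control to the scheduler.
        if (i + 1) % 64 == 0 {
            task::yield_now().await;
        }
    }
}
```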

for adding yield points every so often. In practice, very few actually do this
and end up being vulnerable to this sort of problem.

A common solution to this problem is preemption. OS threads will interrupt
Sponsor Contributor:

"With normal OS threads, the kernel will interrupt..."

task. Each Tokio resource (socket, timer, channel, ...) is now aware of this
budget. As long as the task as budget remaining, the resource operates as it did
previously. Each asynchronous operation (actions that users must `.await` on)
decrement the task's budget. Once the task is out of budget, all resources will
Sponsor Contributor:

This isn't really true though. It's only true if they await a tokio resource (the sentence says "Each asynchronous operation"). And I guess we also don't want to get into the details of how it's really every poll call, not every .await.

Member Author:

How would you update it... I guess I can say "all tokio resources".
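
A conceptual sketch of the budget mechanism under discussion is below; the `BUDGET` thread-local and the `poll_with_budget` helper are illustrative names only, not Tokio's internal or public API.

```rust
use std::cell::Cell;
use std::task::{Context, Poll};

thread_local! {
    // Stand-in for the per-task budget; conceptually reset (e.g. to 128)
    // each time the scheduler polls a task.
    static BUDGET: Cell<u8> = Cell::new(128);
}

// What a budget-aware resource does, conceptually, on every poll.
fn poll_with_budget<T>(
    cx: &mut Context<'_>,
    mut poll_inner: impl FnMut(&mut Context<'_>) -> Poll<T>,
) -> Poll<T> {
    let remaining = BUDGET.with(|b| b.get());
    if remaining == 0 {
        // Out of budget: report "not ready" and wake the task so the
        // scheduler puts it back at the end of the run queue.
        cx.waker().wake_by_ref();
        return Poll::Pending;
    }
    BUDGET.with(|b| b.set(remaining - 1));
    poll_inner(cx)
}
```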

@jonhoo (Sponsor Contributor) commented Apr 1, 2020:

This looks good overall! I think it'd be good to include a paragraph on "next steps", which would include:

  • We'd like for third-party resources to be able to participate.
  • We'd like to provide "sub-budgets" for sub-executors/manual poll impls.
  • It'd be cool if there was a way to extend this mechanism so that all futures could take advantage of it, even with custom executors.

Could even mention that the docs for the first two have already been written, and that they're just not exposed yet, out of caution, in case experience makes us want to change them.

@LucioFranco (Member) commented:

I took a read; overall it reads well. I agree with some of Jon's points, but overall +1 from my end!

@hawkw (Member) left a comment:

This looks really good! I gave it a copyediting pass and left suggestions on some minor grammar nits and typos.

Also, since this post has a lot of discussion of prior art & comparisons with other approaches, it would be nice if there were more references for statements about other schedulers. If it's not a lot of effort, I would love to see more links.

Otherwise, looking good!

Comment on lines 26 to 27
Tokio's scheduler requires that the generated task state machine yields control
back to the scheduler in order to multiplex tasks. Each `.await` call is an
Member:

TIOLI: I might rephrase this like

Suggested change
Tokio's scheduler requires that the generated task state machine yields control
back to the scheduler in order to multiplex tasks. Each `.await` call is an
In order to multiplex tasks, Tokio's scheduler requires that the generated task
state machine yields control back to the scheduler. Each `.await` call is an

Member:

Might also consider reframing this as a requirement of Rust's futures model, rather than of Tokio's scheduler in particular?

Contributor:

Note overlap with this.
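
To illustrate the futures-model framing: a task can only yield where a `poll` returns `Pending`, which for an `async fn` corresponds to an `.await` point. A minimal hand-written future that yields exactly once, in the spirit of `yield_now`, might look like this (a sketch, not Tokio's implementation):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Yields control back to the executor exactly once, then completes.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Ask to be polled again, then hand control back to the scheduler.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```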

(several resolved copyediting suggestions on content/blog/2020-04-preemption.md)
Comment on lines 165 to 166
system calls. This is roughly equivalent to the Tokio APIs
[`spawn_blocking`][spawn_blocking] and [`block_in_place`][block_in_place].
Member:

The difference is that Go does this in the standard library, right?

Member Author:

Tokio does as well... for example tokio::fs. The difference being that Tokio provides access to these fns as it doesn't preempt.
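
For readers who haven't used the two APIs, a small sketch of both follows; the config file path and the `expensive_compress` helper are made up for illustration.

```rust
use tokio::task;

// Runs the blocking closure on Tokio's dedicated blocking thread pool.
async fn read_config() -> std::io::Result<String> {
    task::spawn_blocking(|| std::fs::read_to_string("config.toml"))
        .await
        .expect("blocking task panicked")
}

// Tells the (multi-threaded) runtime that this worker is about to block, so
// its other scheduled work can be moved to another worker thread.
fn compress(data: &[u8]) -> Vec<u8> {
    task::block_in_place(|| expensive_compress(data))
}

// Placeholder for some CPU-heavy, blocking computation.
fn expensive_compress(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}
```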

scheduler automatically detect blocked tasks?". The short answer is: no. Doing
so would result in the same stuttering problems as mentioned above. Also, Go has
no need to have generalized blocked task detection because Go is able to
preempt. What the Go scheduler **does** do is annotate potentially blocking
Member:

Is there something we can link to for more information on how Go annotates potentially blocking calls?

Sponsor Contributor:

Also, doesn't Go inject yield points as well? Good references here are golang/go#10958 and golang/go#24543.

Member Author:

I mostly got this by reading the source...

Member Author:

I'm not sure what I can ref.

carllerche and others added 14 commits April 1, 2020 12:18
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
Co-Authored-By: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
Co-Authored-By: Jon Gjengset <jon@thesquareplanet.com>
Co-Authored-By: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Jon Gjengset <jon@thesquareplanet.com>
Co-Authored-By: Alice Ryhl <alice@ryhl.io>
@hawkw (Member) left a comment:

This looks good to me! I had some last notes that may be useful.

(several resolved suggestions on content/blog/2020-04-preemption.md)
carllerche and others added 2 commits April 1, 2020 14:14
Co-Authored-By: Eliza Weisman <eliza@buoyant.io>
@carllerche carllerche merged commit fff01c5 into master Apr 1, 2020
resources will again function normally.

Let's go back to the echo server example from above. When the task is scheduled, it
is assigned a budget of 128 operations pr "tick". The number 128 was picked
Comment:

per, not pr
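
For context, the loop in question is roughly the echo server described in the post (variable names and buffer size are illustrative). With the new behavior, each socket operation consumes one unit of the 128-operation budget, so even a socket that is always ready forces the task back to the scheduler after at most 128 operations per tick.

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

async fn echo(mut socket: TcpStream) -> std::io::Result<()> {
    let mut buf = vec![0u8; 1024];
    loop {
        // Each of these awaits decrements the task's budget; once it hits
        // zero, the socket reports "not ready" and the task yields.
        let n = socket.read(&mut buf).await?;
        if n == 0 {
            return Ok(());
        }
        socket.write_all(&buf[..n]).await?;
    }
}
```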

@carllerche carllerche deleted the preemption-blog branch July 21, 2020 20:01