
Add three-stage enqueuer/dispatcher scheme to SocketAsyncEngine, ThreadPoolWorkQueue and ThreadPoolTypedWorkItemQueue #100506

Merged · 21 commits · Jun 5, 2024

Conversation

@eduardo-vp (Member) commented Apr 2, 2024:

Fixes #93028.

In these scenarios, we currently handle two states: Scheduled and NotScheduled.

This change introduces a third state, "Determining", which avoids prematurely requesting another resource (a thread or a thread pool work item) when more items are enqueued while in that state; instead, the parallelizer takes care of requesting it. This ensures that only one thread can be parallelizing at any given time.

The transitions are as follows:

  • From NotScheduled, the only transition is to Scheduled if new items are enqueued (a resource to process them is requested).
  • From Scheduled, the only transition is to Determining right before trying to dequeue an item.
  • From Determining, it'll transition to NotScheduled if it's determined that there are no more items to process. Otherwise, it'll transition to Scheduled if either new items were enqueued or it's determined that another resource is needed to guarantee all items are processed (in both cases a new resource is requested).

@kouvel (Member) commented May 15, 2024:

CC @dotnet/ncl

@kouvel added this to the 9.0.0 milestone May 15, 2024
@wfurt (Member) commented May 15, 2024:

What do we expect here? Do we see improvements in any of the existing micro-benchmarks?
cc @tmds as well.

@kouvel (Member) commented May 16, 2024:

> What do we expect here? Do we see improvements in any of the existing micro-benchmarks? cc @tmds as well.

The relevant issue is #93028, I believe there were some CPU time improvements in a few scenarios. @eduardo-vp, could you please share some of the data? Also please add "Fixes #93028" to the description so that the issue is linked.

@eduardo-vp (Member, Author) commented:

In the customer-reported scenario described in #72153, these changes produced a significant improvement in CPU usage. The numbers below were collected on a local WSL2 instance.

|                                  | Main branch | Changed branch |
| -------------------------------- | ----------- | -------------- |
| Throughput (MB/s)                | 1028.5      | 1064.3         |
| CPU usage reported by `top` (%)  | 260         | 175            |

Additional data was collected in the ASP.NET perf lab for the Plaintext, JSON, and Fortunes scenarios with varying connection counts. Plaintext and JSON showed no relevant difference, but Fortunes showed somewhat lower CPU usage, as shown below.

(Attached charts: Fortunes results at 256, 128, and 64 connections.)

@kouvel (Member) left a review comment:


Just a small suggestion, otherwise LGTM, thanks!

@eduardo-vp (Member, Author) commented:

The CI errors are known.

Successfully merging this pull request may close these issues:

  • Reduce the number of thread requests further in the thread pool and from async IO completion dispatchers