
Improve the throughput of SocketsHttpHandler's HTTP/1.1 connection pool #99364

Merged (5 commits) on Mar 22, 2024

Conversation

@MihaZupan (Member) commented Mar 6, 2024

Closes #70098

The connection pool currently manages the list of available connections and the request queue under a single lock.
As the number of cores and the RPS rise, the speed at which the pool can manage connections becomes a bottleneck.

This PR brings the fast path (when there are enough connections available to process all requests) down to a ConcurrentStack.Push + ConcurrentStack.TryPop.
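
Conceptually, the new fast path looks something like the sketch below. The type and member names (`PooledConnection`, `_idleConnections`, `SimplifiedPool`) are illustrative only, not the actual `HttpConnectionPool` internals: renting an idle connection becomes a single `TryPop` and returning it a single `Push`, with the heavier coordination left to the slow path.

```csharp
// Minimal sketch of the lock-free fast path; names are illustrative,
// not SocketsHttpHandler internals.
using System.Collections.Concurrent;

public sealed class PooledConnection { /* wraps an idle HTTP/1.1 connection */ }

public sealed class SimplifiedPool
{
    private readonly ConcurrentStack<PooledConnection> _idleConnections = new();

    // Fast path for a new request: if an idle connection exists, a single
    // TryPop hands it out without taking any lock.
    public bool TryGetIdleConnection(out PooledConnection? connection) =>
        _idleConnections.TryPop(out connection);

    // Fast path for a completed request: returning the connection is a single
    // Push. Only when TryPop fails does a slower path run to queue the
    // request or open a new connection.
    public void ReturnConnection(PooledConnection connection) =>
        _idleConnections.Push(connection);
}
```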


Numbers for ConcurrentQueue

Numbers from #70098 (comment):

```
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --server.framework net9.0 --client.framework net9.0 --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --server.framework net9.0 --client.framework net9.0 --json 8x32.json
crank compare .\1x256.json .\8x32.json
```
| client      | 1x256   | 8x32    |         |
|-------------|---------|---------|---------|
| RPS         | 693,873 | 875,814 | +26.22% |
| Patched RPS | 873,571 | 876,394 | +0.32%  |

This shows that before this PR, manually splitting load between multiple HttpClient instances could have a significant impact.
After the change there is no longer any benefit to doing so, as a single pool can efficiently handle the higher load.
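
As a rough illustration of the workaround that used to pay off (a hypothetical sketch mirroring the `numberOfHttpClients=8` benchmark variable above, not code from this repo): shard requests across several `HttpMessageInvoker` instances so each `SocketsHttpHandler` gets its own connection pool. With this change, a single invoker performs about as well, so the sharding is unnecessary.

```csharp
// Hypothetical sketch of the pre-change workaround: round-robin requests
// across several invokers, each with its own SocketsHttpHandler and pool.
// .NET 8 top-level program; the URL below is a placeholder.
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

var shards = Enumerable.Range(0, 8)
    .Select(_ => new HttpMessageInvoker(new SocketsHttpHandler()))
    .ToArray();

int counter = -1;

var response = await SendShardedAsync(
    new HttpRequestMessage(HttpMethod.Get, "http://localhost:5000/"), CancellationToken.None);

async Task<HttpResponseMessage> SendShardedAsync(HttpRequestMessage request, CancellationToken ct)
{
    // Pick the next shard; unsigned math keeps the index valid if the counter overflows.
    var client = shards[(uint)Interlocked.Increment(ref counter) % (uint)shards.Length];
    return await client.SendAsync(request, ct);
}
```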

YARP's http-http 100 byte scenario:

| load              | yarp-base | yarp-patched |         |
|-------------------|-----------|--------------|---------|
| Latency 50th (ms) | 0.73      | 0.68         | -6.97%  |
| Latency 75th (ms) | 0.82      | 0.74         | -9.82%  |
| Latency 90th (ms) | 1.03      | 0.89         | -13.39% |
| Latency 95th (ms) | 1.41      | 1.18         | -16.41% |
| Latency 99th (ms) | 2.87      | 2.68         | -6.63%  |
| Mean latency (ms) | 0.83      | 0.76         | -8.74%  |
| Requests/sec      | 306,699   | 335,921      | +9.53%  |

In-memory loopback benchmark that stresses the connection pool contention: https://gist.github.com/MihaZupan/27f01d78c71da7b9024b321e743e3d88

Rough RPS numbers with 1-6 threads:

| RPS (1000s) | 1    | 2    | 3    | 4    | 5    | 6    |
|-------------|------|------|------|------|------|------|
| main        | 2060 | 1900 | 1760 | 1670 | 1570 | 1500 |
| patched     | 2150 | 2600 | 3400 | 3700 | 4100 | 4260 |
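
The linked gist drives real requests through SocketsHttpHandler; as a much smaller standalone sketch of the same contention effect (not the linked benchmark), the loop below hammers a lock-protected list versus a ConcurrentStack from a growing number of threads, which is roughly the difference between the old and new fast paths:

```csharp
// Rough standalone contention sketch (.NET 8 console app, implicit usings):
// each thread repeatedly "returns" and "rents" an item, either under a single
// lock (approximating the old pool) or via ConcurrentStack (the new fast path).
using System.Collections.Concurrent;
using System.Diagnostics;

static long Run(int threads, Action op)
{
    long totalOps = 0;
    var sw = Stopwatch.StartNew();
    Parallel.For(0, threads, _ =>
    {
        long local = 0;
        while (sw.ElapsedMilliseconds < 1000) { op(); local++; }
        Interlocked.Add(ref totalOps, local);
    });
    return totalOps; // roughly ops per second, since each run lasts ~1 second
}

var gate = new object();
var list = new List<object>();
var stack = new ConcurrentStack<object>();
var item = new object();

foreach (int threads in new[] { 1, 2, 4, 8 })
{
    long withLock = Run(threads, () => { lock (gate) { list.Add(item); list.RemoveAt(list.Count - 1); } });
    long lockFree = Run(threads, () => { stack.Push(item); stack.TryPop(out _); });
    Console.WriteLine($"{threads} threads: lock ~{withLock:N0} ops/s, stack ~{lockFree:N0} ops/s");
}
```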

Breaking change consideration - This is no longer relevant after switching to ConcurrentStack

While I was careful to keep the observable behavior of the pool as close as possible to what we have today, there is one important change I made intentionally:

  • The order in which we dequeue idle connections is changed from LIFO to FIFO (from a stack to a queue). This is because the backing store for available connections is now a ConcurrentQueue.
  • Where this distinction may be important is if load drops for a longer period such that we no longer need as many connections. We would previously keep the excess connections completely idle and eventually remove them via the idle timeout. With this change, we would keep cycling through all connections, potentially keeping more of them alive (a small simulation follows this list).
  • A slight benefit of that behavior may be that it makes it less likely to run into the idle close race condition (server closing an idle connection after we've started using it again).

See #99364 (comment) for ConcurrentStack results (current PR).
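
A tiny simulation of the second bullet above (hypothetical, not pool code): once load drops so that only 2 of 10 pooled connections are needed at a time, LIFO leaves the unused 8 aging past the idle timeout so they can be evicted, while FIFO keeps refreshing all 10 and keeps them alive.

```csharp
// Illustrative simulation of LIFO vs FIFO idle-connection reuse; a .NET 8
// top-level program with implicit usings, not actual SocketsHttpHandler code.
const int connections = 10, perTick = 2, ticks = 200, idleTimeoutTicks = 60;

Console.WriteLine($"Evictable after load drop: LIFO={Simulate(lifo: true)}, FIFO={Simulate(lifo: false)}");
// Expected output: LIFO=8 (excess connections age out), FIFO=0 (all stay warm).

int Simulate(bool lifo)
{
    var pool = new LinkedList<Conn>(Enumerable.Range(0, connections).Select(_ => new Conn()));

    for (int tick = 1; tick <= ticks; tick++)
    {
        // Rent 'perTick' connections: LIFO takes from the same end it returns to,
        // FIFO takes from the opposite end.
        var rented = new List<LinkedListNode<Conn>>();
        for (int i = 0; i < perTick; i++)
        {
            var node = lifo ? pool.Last! : pool.First!;
            pool.Remove(node);
            node.Value.LastUsedTick = tick;
            rented.Add(node);
        }
        foreach (var node in rented) pool.AddLast(node);
    }

    // Connections idle longer than the timeout would be closed by the idle timer.
    return pool.Count(c => ticks - c.LastUsedTick > idleTimeoutTicks);
}

sealed class Conn { public int LastUsedTick; }
```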

@MihaZupan MihaZupan added this to the 9.0.0 milestone Mar 6, 2024
@MihaZupan MihaZupan requested a review from a team March 6, 2024 16:45
@MihaZupan MihaZupan self-assigned this Mar 6, 2024
@ghost commented Mar 6, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@MihaZupan (Member, Author) commented:

/azp run runtime-libraries-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

Using a stack here is close enough (in benchmarks the collection is going to be close to empty all the time, so contention between the stack and queue is similar). I'll switch the PR to use that to avoid the behavioral change.
We can revisit it in the future with more idle eviction heuristics to get the last few % with a queue if needed.

It does mean an extra 32-byte allocation for each enqueue op, sadly (+1 for #31911). A rough sketch of where that comes from follows the tables below.

| load              | yarp-main | yarp-stack |         | yarp-queue |         |
|-------------------|-----------|------------|---------|------------|---------|
| Latency 50th (ms) | 0.73      | 0.69       | -4.95%  | 0.68       | -5.91%  |
| Latency 75th (ms) | 0.81      | 0.75       | -7.49%  | 0.74       | -8.97%  |
| Latency 90th (ms) | 0.99      | 0.88       | -10.98% | 0.85       | -14.40% |
| Latency 95th (ms) | 1.31      | 1.13       | -13.99% | 1.05       | -20.00% |
| Latency 99th (ms) | 2.83      | 2.60       | -7.95%  | 2.45       | -13.29% |
| Mean latency (ms) | 0.82      | 0.76       | -6.78%  | 0.75       | -8.59%  |
| Requests/sec      | 312,857   | 335,444    | +7.22%  | 342,141    | +9.36%  |

| client   | client-main | client-stack |         | client-queue |         |
|----------|-------------|--------------|---------|--------------|---------|
| Requests | 80,028,791  | 107,128,778  | +33.86% | 107,868,124  | +34.79% |
| Mean RPS | 666,886     | 892,749      | +33.87% | 898,902      | +34.79% |

| Method    | Toolchain | Mean     | Error   | Ratio | Allocated | Alloc Ratio |
|-----------|-----------|----------|---------|-------|-----------|-------------|
| SendAsync | main      | 517.0 ns | 4.27 ns | 1.00  | 552 B     | 1.00        |
| SendAsync | stack     | 482.0 ns | 2.87 ns | 0.93  | 584 B     | 1.06        |
| SendAsync | queue     | 471.1 ns | 1.37 ns | 0.91  | 552 B     | 1.00        |
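
The 584 B vs 552 B difference in the last table lines up with the per-push node that ConcurrentStack<T> allocates, whereas ConcurrentQueue<T> reuses slots within its segments. A rough way to see this outside the pool (a sketch, not the PR's benchmark):

```csharp
// Rough allocation sketch (.NET 8 console, implicit usings): ConcurrentStack
// allocates a ~32-byte node per Push, while a ConcurrentQueue that stays small
// keeps reusing slots in its existing segment and allocates almost nothing per op.
using System.Collections.Concurrent;

var stack = new ConcurrentStack<object>();
var queue = new ConcurrentQueue<object>();
var item = new object();
const int N = 1_000_000;

long before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < N; i++) { stack.Push(item); stack.TryPop(out _); }
Console.WriteLine($"stack: ~{(GC.GetAllocatedBytesForCurrentThread() - before) / N} B per push/pop");

before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < N; i++) { queue.Enqueue(item); queue.TryDequeue(out _); }
Console.WriteLine($"queue: ~{(GC.GetAllocatedBytesForCurrentThread() - before) / N} B per enqueue/dequeue");
```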

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).
