
Improve the throughput of SocketsHttpHandler's HTTP/1.1 connection pool #99364

Merged (5 commits) on Mar 22, 2024

Conversation

@MihaZupan (Member) commented Mar 6, 2024

Closes #70098

The connection pool currently manages the list of available connections and the request queue under a single lock.
As the number of cores and the RPS rise, the speed at which the pool can manage connections becomes a bottleneck.

This PR brings the fast path (when there are enough connections available to process all requests) down to a ConcurrentStack.Push + ConcurrentStack.TryPop.
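
Conceptually, the new fast path looks something like the sketch below. The type and member names (`PooledConnection`, `_idleConnections`, `SimplifiedPool`) are illustrative only, not the actual `HttpConnectionPool` internals: renting an idle connection becomes a single `TryPop` and returning it a single `Push`, with the heavier coordination left to the slow path.

```csharp
// Minimal sketch of the lock-free fast path; names are illustrative,
// not SocketsHttpHandler internals.
using System.Collections.Concurrent;

public sealed class PooledConnection { /* wraps an idle HTTP/1.1 connection */ }

public sealed class SimplifiedPool
{
    private readonly ConcurrentStack<PooledConnection> _idleConnections = new();

    // Fast path for a new request: if an idle connection exists, a single
    // TryPop hands it out without taking any lock.
    public bool TryGetIdleConnection(out PooledConnection? connection) =>
        _idleConnections.TryPop(out connection);

    // Fast path for a completed request: returning the connection is a single
    // Push. Only when TryPop fails does a slower path run to queue the
    // request or open a new connection.
    public void ReturnConnection(PooledConnection connection) =>
        _idleConnections.Push(connection);
}
```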


Numbers for ConcurrentQueue

Numbers from #70098 (comment):

```
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=256 --variable numberOfHttpClients=1 --server.framework net9.0 --client.framework net9.0 --json 1x256.json
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/httpclient.benchmarks.yml --scenario httpclient-kestrel-get --profile aspnet-citrine-lin --variable useHttpMessageInvoker=true --variable concurrencyPerHttpClient=32 --variable numberOfHttpClients=8 --server.framework net9.0 --client.framework net9.0 --json 8x32.json
crank compare .\1x256.json .\8x32.json
```
| client      | 1x256   | 8x32    |         |
|-------------|---------|---------|---------|
| RPS         | 693,873 | 875,814 | +26.22% |
| Patched RPS | 873,571 | 876,394 | +0.32%  |

This shows that before this PR, manually splitting load between multiple HttpClient instances could have a significant impact.
After the change there is no longer any benefit to doing so, as a single pool can efficiently handle the higher load.
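
As a rough illustration of the workaround that used to pay off (a hypothetical sketch mirroring the `numberOfHttpClients=8` benchmark variable above, not code from this repo): shard requests across several `HttpMessageInvoker` instances so each `SocketsHttpHandler` gets its own connection pool. With this change, a single invoker performs about as well, so the sharding is unnecessary.

```csharp
// Hypothetical sketch of the pre-change workaround: round-robin requests
// across several invokers, each with its own SocketsHttpHandler and pool.
// .NET 8 top-level program; the URL below is a placeholder.
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

var shards = Enumerable.Range(0, 8)
    .Select(_ => new HttpMessageInvoker(new SocketsHttpHandler()))
    .ToArray();

int counter = -1;

var response = await SendShardedAsync(
    new HttpRequestMessage(HttpMethod.Get, "http://localhost:5000/"), CancellationToken.None);

async Task<HttpResponseMessage> SendShardedAsync(HttpRequestMessage request, CancellationToken ct)
{
    // Pick the next shard; unsigned math keeps the index valid if the counter overflows.
    var client = shards[(uint)Interlocked.Increment(ref counter) % (uint)shards.Length];
    return await client.SendAsync(request, ct);
}
```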

YARP's http-http 100 byte scenario:

| load              | yarp-base | yarp-patched |         |
|-------------------|-----------|--------------|---------|
| Latency 50th (ms) | 0.73      | 0.68         | -6.97%  |
| Latency 75th (ms) | 0.82      | 0.74         | -9.82%  |
| Latency 90th (ms) | 1.03      | 0.89         | -13.39% |
| Latency 95th (ms) | 1.41      | 1.18         | -16.41% |
| Latency 99th (ms) | 2.87      | 2.68         | -6.63%  |
| Mean latency (ms) | 0.83      | 0.76         | -8.74%  |
| Requests/sec      | 306,699   | 335,921      | +9.53%  |

In-memory loopback benchmark that stresses the connection pool contention: https://gist.github.com/MihaZupan/27f01d78c71da7b9024b321e743e3d88

Rough RPS numbers with 1-6 threads:

| RPS (1000s) | 1    | 2    | 3    | 4    | 5    | 6    |
|-------------|------|------|------|------|------|------|
| main        | 2060 | 1900 | 1760 | 1670 | 1570 | 1500 |
| patched     | 2150 | 2600 | 3400 | 3700 | 4100 | 4260 |
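
The linked gist drives real requests through SocketsHttpHandler; as a much smaller standalone sketch of the same contention effect (not the linked benchmark), the loop below hammers a lock-protected list versus a ConcurrentStack from a growing number of threads, which is roughly the difference between the old and new fast paths:

```csharp
// Rough standalone contention sketch (.NET 8 console app, implicit usings):
// each thread repeatedly "returns" and "rents" an item, either under a single
// lock (approximating the old pool) or via ConcurrentStack (the new fast path).
using System.Collections.Concurrent;
using System.Diagnostics;

static long Run(int threads, Action op)
{
    long totalOps = 0;
    var sw = Stopwatch.StartNew();
    Parallel.For(0, threads, _ =>
    {
        long local = 0;
        while (sw.ElapsedMilliseconds < 1000) { op(); local++; }
        Interlocked.Add(ref totalOps, local);
    });
    return totalOps; // roughly ops per second, since each run lasts ~1 second
}

var gate = new object();
var list = new List<object>();
var stack = new ConcurrentStack<object>();
var item = new object();

foreach (int threads in new[] { 1, 2, 4, 8 })
{
    long withLock = Run(threads, () => { lock (gate) { list.Add(item); list.RemoveAt(list.Count - 1); } });
    long lockFree = Run(threads, () => { stack.Push(item); stack.TryPop(out _); });
    Console.WriteLine($"{threads} threads: lock ~{withLock:N0} ops/s, stack ~{lockFree:N0} ops/s");
}
```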

Breaking change consideration - This is no longer relevant after switching to ConcurrentStack

While I was careful to keep the observable behavior of the pool as close as possible to what we have today, there is one important change I made intentionally:

  • The order in which we dequeue idle connections is changed from LIFO to FIFO (from a stack to a queue). This is because the backing store for available connections is now a ConcurrentQueue.
  • Where this distinction may be important is if load drops for a longer period such that we no longer need as many connections. We would previously keep the excess connections completely idle and eventually remove them via the idle timeout. With this change, we would keep cycling through all connections, potentially keeping more of them alive (a small simulation follows this list).
  • A slight benefit of that behavior may be that it makes it less likely to run into the idle close race condition (server closing an idle connection after we've started using it again).

See #99364 (comment) for ConcurrentStack results (current PR).
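
A tiny simulation of the second bullet above (hypothetical, not pool code): once load drops so that only 2 of 10 pooled connections are needed at a time, LIFO leaves the unused 8 aging past the idle timeout so they can be evicted, while FIFO keeps refreshing all 10 and keeps them alive.

```csharp
// Illustrative simulation of LIFO vs FIFO idle-connection reuse; a .NET 8
// top-level program with implicit usings, not actual SocketsHttpHandler code.
const int connections = 10, perTick = 2, ticks = 200, idleTimeoutTicks = 60;

Console.WriteLine($"Evictable after load drop: LIFO={Simulate(lifo: true)}, FIFO={Simulate(lifo: false)}");
// Expected output: LIFO=8 (excess connections age out), FIFO=0 (all stay warm).

int Simulate(bool lifo)
{
    var pool = new LinkedList<Conn>(Enumerable.Range(0, connections).Select(_ => new Conn()));

    for (int tick = 1; tick <= ticks; tick++)
    {
        // Rent 'perTick' connections: LIFO takes from the same end it returns to,
        // FIFO takes from the opposite end.
        var rented = new List<LinkedListNode<Conn>>();
        for (int i = 0; i < perTick; i++)
        {
            var node = lifo ? pool.Last! : pool.First!;
            pool.Remove(node);
            node.Value.LastUsedTick = tick;
            rented.Add(node);
        }
        foreach (var node in rented) pool.AddLast(node);
    }

    // Connections idle longer than the timeout would be closed by the idle timer.
    return pool.Count(c => ticks - c.LastUsedTick > idleTimeoutTicks);
}

sealed class Conn { public int LastUsedTick; }
```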

@MihaZupan MihaZupan added this to the 9.0.0 milestone Mar 6, 2024
@MihaZupan MihaZupan requested a review from a team March 6, 2024 16:45
@MihaZupan MihaZupan self-assigned this Mar 6, 2024
@ghost commented Mar 6, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


@MihaZupan (Member, Author) commented:

/azp run runtime-libraries-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

Using a stack here is close enough (in benchmarks the collection is going to be close to empty all the time, so contention between the stack and queue is similar). I'll switch the PR to use that to avoid the behavioral change.
We can revisit it in the future with more idle eviction heuristics to get the last few % with a queue if needed.

It does mean an extra 32-byte allocation for each enqueue op, sadly (+1 for #31911). A rough sketch of where that comes from follows the tables below.

| load              | yarp-main | yarp-stack |         | yarp-queue |         |
|-------------------|-----------|------------|---------|------------|---------|
| Latency 50th (ms) | 0.73      | 0.69       | -4.95%  | 0.68       | -5.91%  |
| Latency 75th (ms) | 0.81      | 0.75       | -7.49%  | 0.74       | -8.97%  |
| Latency 90th (ms) | 0.99      | 0.88       | -10.98% | 0.85       | -14.40% |
| Latency 95th (ms) | 1.31      | 1.13       | -13.99% | 1.05       | -20.00% |
| Latency 99th (ms) | 2.83      | 2.60       | -7.95%  | 2.45       | -13.29% |
| Mean latency (ms) | 0.82      | 0.76       | -6.78%  | 0.75       | -8.59%  |
| Requests/sec      | 312,857   | 335,444    | +7.22%  | 342,141    | +9.36%  |

| client   | client-main | client-stack |         | client-queue |         |
|----------|-------------|--------------|---------|--------------|---------|
| Requests | 80,028,791  | 107,128,778  | +33.86% | 107,868,124  | +34.79% |
| Mean RPS | 666,886     | 892,749      | +33.87% | 898,902      | +34.79% |

| Method    | Toolchain | Mean     | Error   | Ratio | Allocated | Alloc Ratio |
|-----------|-----------|----------|---------|-------|-----------|-------------|
| SendAsync | main      | 517.0 ns | 4.27 ns | 1.00  | 552 B     | 1.00        |
| SendAsync | stack     | 482.0 ns | 2.87 ns | 0.93  | 584 B     | 1.06        |
| SendAsync | queue     | 471.1 ns | 1.37 ns | 0.91  | 552 B     | 1.00        |
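
The 584 B vs 552 B difference in the last table lines up with the per-push node that ConcurrentStack<T> allocates, whereas ConcurrentQueue<T> reuses slots within its segments. A rough way to see this outside the pool (a sketch, not the PR's benchmark):

```csharp
// Rough allocation sketch (.NET 8 console, implicit usings): ConcurrentStack
// allocates a ~32-byte node per Push, while a ConcurrentQueue that stays small
// keeps reusing slots in its existing segment and allocates almost nothing per op.
using System.Collections.Concurrent;

var stack = new ConcurrentStack<object>();
var queue = new ConcurrentQueue<object>();
var item = new object();
const int N = 1_000_000;

long before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < N; i++) { stack.Push(item); stack.TryPop(out _); }
Console.WriteLine($"stack: ~{(GC.GetAllocatedBytesForCurrentThread() - before) / N} B per push/pop");

before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < N; i++) { queue.Enqueue(item); queue.TryDequeue(out _); }
Console.WriteLine($"queue: ~{(GC.GetAllocatedBytesForCurrentThread() - before) / N} B per enqueue/dequeue");
```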

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@MihaZupan (Member, Author) commented:

/azp run runtime-libraries stress-http

Azure Pipelines successfully started running 1 pipeline(s).
