HttpConnectionPool.RequestQueue leaking items #66673
Tagging subscribers to this area: @dotnet/ncl
@CarnaViire could it be related to your recent investigations in this space?
I don't think so. This issue is on Windows, and it seems to be for HTTP/2, while mine were timeouts on Linux for HTTP/1.1.

@davidni what is the typical behavior of your app (in terms of HTTP requests)? Do you only use HTTP/2, or HTTP/1.1 as well? What timeouts do you have set, if any? Are any exceptions happening, or do all requests succeed while the queue keeps growing?
We use YARP to proxy requests to multiple (1k+) destinations.
In this case, HTTP/2 only. While we use both, looking at the dump, this connection pool was only handling HTTP/2 requests, and the queue that grew was the HTTP/2 one.
Triage: We need to understand the root cause. Fix in 7.0 at minimum; most likely also service 6.0.x.
Before 6.0, cancelling a request would result in it being removed from the request queue immediately. From looking at the dump, the requests sitting in the queue are canceled and the connection attempt never completes. Since there is no connection, nothing cleans up canceled queue entries, resulting in the leak. The main mitigation for this is to set the connect timeout.

There is also a bug here in the connection pool that makes such leaks more common: when injecting a connection into the pool, we only check whether a request exists in the queue, not whether it has already been canceled. This means we will issue a connection attempt even for already-canceled requests. As a result, you can end up in a situation like this (I hope I'm misreading something in the code):
Not only are we keeping the requests and everything associated with them rooted for a very long time, we are also wasting connection attempts on dead requests.
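A hypothetical, highly simplified model of the failure mode described here (illustrative C#, not the actual HttpConnectionPool code): waiters whose cancellation tokens fire do observe cancellation, but nothing unlinks them from the queue, so every canceled request stays rooted until a connection attempt completes and drains it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Toy stand-in for the pool's request queue.
sealed class ToyRequestQueue
{
    private readonly Queue<TaskCompletionSource<bool>> _waiters = new();

    public Task EnqueueAsync(CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<bool>();
        // The callback only faults the caller's task; it does NOT
        // remove the entry, so the queue still roots the request.
        ct.Register(() => tcs.TrySetCanceled(ct));
        _waiters.Enqueue(tcs);
        return tcs.Task;
    }

    public int Count => _waiters.Count;
}

class Demo
{
    static void Main()
    {
        var queue = new ToyRequestQueue();
        using var cts = new CancellationTokenSource();
        for (int i = 0; i < 10_000; i++)
            _ = queue.EnqueueAsync(cts.Token);

        cts.Cancel(); // every caller observes cancellation...
        Console.WriteLine(queue.Count); // ...but all 10000 entries stay queued
    }
}
```

If the single pending connection attempt never completes (for example a very slow connect failure), nothing ever dequeues these entries, which matches the dump described above.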
Given that the default socket timeout on Windows is 21s, I'm afraid just setting a connect timeout won't be enough on its own.

On our side, we need to clean up the queue not only on successful attempts, but on failed ones too, and maybe even more aggressively.
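For reference, SocketsHttpHandler exposes a ConnectTimeout property (infinite by default) that bounds how long a single connection attempt may take; the value below is only illustrative:

```csharp
using System;
using System.Net.Http;

var handler = new SocketsHttpHandler
{
    // Fail connection attempts that don't complete within 5 seconds,
    // instead of relying on OS-level timeouts (about 21s on Windows).
    ConnectTimeout = TimeSpan.FromSeconds(5),
};
using var client = new HttpClient(handler);
```

As noted in this comment, shortening connection attempts reduces how long canceled entries stay rooted, but it does not by itself remove them from the queue.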
My suspicion is that, for whatever reason, the connection took longer than 21 seconds to fail, since I saw the queue had never moved at all in the dump. As in, the first connection attempt was still pending while 7k requests were added and canceled on its queue.
Mitigation might be to remove and throw away any cancelled requests when checking the queue.

Thinking more generally: do we need to keep request references directly in the queue? Do we even need them there? What are they used for after being picked from the queue?
Opened #66944 to track aggressively opening more connections for HTTP/2.
Yes, this wouldn't entirely solve the problem of keeping the requests rooted, but it would be a solution to the issue of issuing connection attempts for pre-canceled requests.
They are currently used to determine whether to fail the first request in the queue on connection failures.
The initial fixes that help with this issue have been merged (#66990, #66607). We could address this completely by removing queue entries as soon as the request is canceled (the .NET 5 behavior).
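A sketch of what that immediate removal could look like (hypothetical code, not the actual fix): register a cancellation callback that unlinks the waiter from the queue right away, so a canceled request stops being rooted the moment its token fires. A linked list makes the removal O(1).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

sealed class EagerCleanupQueue
{
    private readonly LinkedList<TaskCompletionSource<bool>> _waiters = new();
    private readonly object _sync = new();

    public Task EnqueueAsync(CancellationToken ct)
    {
        var tcs = new TaskCompletionSource<bool>();
        LinkedListNode<TaskCompletionSource<bool>> node;
        lock (_sync) node = _waiters.AddLast(tcs);

        ct.Register(() =>
        {
            lock (_sync)
            {
                // node.List is null once the node has been removed,
                // so this is safe even if a connection dequeued it first.
                if (node.List is not null) _waiters.Remove(node);
            }
            tcs.TrySetCanceled(ct);
        });
        return tcs.Task;
    }

    public int Count { get { lock (_sync) return _waiters.Count; } }
}
```

With this shape, the queue length tracks live waiters only, at the cost of one cancellation registration per enqueue and the locking shown here.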
It has been mitigated, but the underlying issue still exists: we're not removing entries immediately when they're canceled.
Triage: Mitigated in two ways. The remaining part could still be done to help reduce memory usage a bit, but it won't cause a memory leak anymore.
Looks like #66990 is on track to ship in .NET 6.0.5; do we know when it will be released? Thanks all for your work on this and for providing a viable short-term mitigation!
Description
While investigating a large increase in memory usage in one of our services running on .NET 6.0.2, I investigated a memory dump and found an HttpConnectionPool containing a RequestQueue object with a very large number of items (>10k). This was problematic because those items were hanging on to HttpRequestMessage objects which in turn had large header payloads, and ultimately led to large memory consumption for this process.
Interestingly, looking at telemetry, I see no evidence of ever having more than ~500 concurrent outstanding requests in total in the entire process, across all of our connection pools. So it seems odd, to say the least, that a single connection pool would be keeping track of this many queued requests.
I haven't figured out why these requests are in the queue and/or why they aren't draining. We can share the memory dump internally if that would help.
Reproduction Steps
Unclear
Expected behavior
HttpConnectionPool shouldn't be holding on to queued requests that aren't in flight.
Actual behavior
HttpConnectionPool seems to hold on to requests that aren't in flight, leading to excessive memory usage.
Regression?
We did not notice this specifically with .NET Core 3.1, but it is possible it had been there too.
Known Workarounds
N/A
Configuration
Other information
No response