Net 7 Kestrel windows service hangs after a period of time #82207
Just curious does it repro on 6.0? |
Not sure, we skipped over 6.0 and went straight to 7.0. We might be able to backtrack if that will help in diagnosing. |
There was another issue like this with similar symptoms (hanging after a day or something). Can you look at parallel stacks? |
Yeah, it was this one dotnet/aspnetcore#45215. I reviewed that one and I don't think a solution was found, just not enough info to diagnose. |
@sccrgoalie1 Can you run through all of the same steps? |
@davidfowl Absolutely, we ran through that issue extensively before we submitted our own. Was there something specific that would be helpful? I have the memory dump. All the requests are http not https. So far, it seems that the issue occurs after the API sits idle for a period of time. Our users work banker's hours, so it works all day long, but when they come back the next morning Kestrel is no longer responding to any requests until we restart the windows service it's running in. |
@sccrgoalie1 Is it possible for you to share the dump with us (we'd understand if you cannot)? |
Can you share more info about what exactly happens on the client and server when this happens? For instance, does the client get a response back? Do the server logs show anything interesting? Perhaps you may get some interesting output if you enable more detailed logging on the server. See here for more info on configuring logging: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/logging/?view=aspnetcore-7.0#configure-logging |
@adityamandaleeka Sure, here is the memory dump. |
The client does not get a response back, the requests just hang. Unfortunately, we haven't found anything in the logs or event viewer yet. We'll try increasing the detailed logging to see if we can get anything better. The windows service on the server is still running but just not responding. |
@sccrgoalie1 Thanks for sharing the dump. I looked through it and didn't spot any immediate issues that jumped out. Just curious, did taking the dump cause the process to "unhang" or was it still in the stuck/unresponsive state? |
As a side note (maybe not related to the problem you're hitting), I see a bunch of threads doing things related to performance counters, which are running code from aspnet_perf.dll. Do you know what's causing that? AFAICT aspnet_perf is an old ASP.NET Framework component, so I'm wondering what it's doing in the context of an ASP.NET Core app. |
No, taking the dump did not unstick it. At one of the sites having to restart daily we converted them to use Microsoft.Azure.Relay.AspNetCore and since then they have been running for days without any issues. I'm not sure at what level that package intertwines with Kestrel, but it seems to prevent the hang we are experiencing. |
The only thing I can think of that might be doing that is Application Insights but we are using the NetCore version (Microsoft.ApplicationInsights.AspNetCore) |
We have several .NET 7.0.2 applications running on Windows Server 2019. We run the apps as HashiCorp Nomad jobs. We noticed that HTTP requests to an application start to fail if we open the Windows "Resource Monitor" application and select the dotnet process running our app from the list (toggle the checkbox on). You can also select more than one app, and every selected app starts to fail. Apps that are not selected keep working without problems. The problem reproduces consistently. |
@heikkilamarko Thank you for reporting that. I was able to see the behavior you described even on a Windows 11 machine with an empty ASP.NET Core app. Nothing obvious popped out under the debugger but cc @noahfalk @davmason in case this rings any bells. Presumably checking the box in Resource Monitor kicks off perfmon or something under the hood right? |
Hmm, AFAICT the app is still running and not hung, it's just not getting the incoming requests. @BrennanConroy pointed out that the browser is also not timing out so perhaps something is accepting and intercepting the communication... |
It doesn't ring any bells for me. I am not sure exactly what resource monitor does under the hood |
Glad that you were able to reproduce the problem. A small clarification. Some of our services are Node.js applications. They don't have this problem. The problem occurs only for dotnet apps. |
@adityamandaleeka - No bells for me either, I'm afraid. I suspect the right mental model for Resource Monitor and perfmon is that they are two independent apps that both make use of perf counter APIs. |
It seems possible that this is also what is causing the issues for our customers. We noticed they are all running software called Acronis. It must be doing something to our ASP.NET Core app similar to what Windows Resource Monitor does. |
We believe we narrowed down the issue to this change #64834 This was verified by setting the environment variable @kouvel do you have any ideas how the Windows Threadpool change could cause this issue? Issue summary: |
We gave it a try. Setting |
@heikkilamarko Glad you were able to confirm that. It should be a reasonable workaround for now while we investigate and fix the issue. Thank you again for the simple repro steps! |
It looks like for some reason, when Resource Monitor is attached, the IO for listening for connections is being associated with the thread it is issued on, even though the handle is attached to an IOCP. AFAIK that's not supposed to happen when the handle is attached to an IOCP. The IO happens to be issued on a thread pool worker thread, and when the thread times out and exits, the IO is cancelled with |
Thanks for the quick investigation @kouvel! |
Can you open an issue on runtime for this and link it from here? |
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @mangod9 |
I've transferred the issue over to the runtime repo now |
Another thing to look into perhaps is why the IO completion with |
- When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions.
- Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO
Fixes dotnet#82207
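The behavior described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the runtime's code: `Worker`, `pending_io`, and `on_idle_timeout` are hypothetical names, and a plain list stands in for whatever mechanism the thread pool actually uses to learn that a thread has pending IO. It shows why an idle worker that exits takes its thread-bound IO (such as the listening socket's accept) down with it, and how a pending-IO check avoids that.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for a thread pool worker thread."""
    name: str
    # IO operations that the OS has tied to this thread (assumption: a simple list).
    pending_io: list = field(default_factory=list)
    alive: bool = True


def on_idle_timeout(worker: Worker, check_pending_io: bool) -> list:
    """Handle a worker's idle timeout.

    Returns the IO operations that were cancelled because the worker exited.
    With check_pending_io=True the worker refuses to exit while it still owns IO.
    """
    if check_pending_io and worker.pending_io:
        # The fix, as modelled here: keep the thread alive so its IO is not aborted.
        return []
    # Without the check the thread exits and its thread-bound IO is cancelled.
    worker.alive = False
    cancelled = list(worker.pending_io)
    worker.pending_io.clear()
    return cancelled


if __name__ == "__main__":
    before_fix = Worker("worker-1", ["accept"])
    print(on_idle_timeout(before_fix, check_pending_io=False))  # ['accept']

    after_fix = Worker("worker-2", ["accept"])
    print(on_idle_timeout(after_fix, check_pending_io=True))    # []
```

In this model, the first call cancels the accept, which matches the symptom in this thread: the process stays up but no longer receives incoming requests. The second call leaves the worker alive, so the accept stays pending.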
…2245) * Check for pending IO in the portable thread pool's worker threads - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Fixes #82207
- Port of dotnet#82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for dotnet#82207
…ds (#82248) * [6.0] Check for pending IO in the portable thread pool's worker threads - Port of #82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for #82207 * Refactor Windows-specific code
…ds (#82246) * [7.0] Check for pending IO in the portable thread pool's worker threads - Port of #82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for #82207 * Refactor Windows-specific code
Is there an existing issue for this?
Describe the bug
We just updated our projects from .NET 5 to .NET 7 and are experiencing hangs where Kestrel stops responding. It works for a day or so, and then the Windows service needs to be restarted. This was not an issue on .NET 5. I gathered a memory dump to see if I can figure out what is causing it to hang. Here are some screenshots. This was gathered using ProcDump.
This API is installed at many unique locations and several have reported the same issue that they need to restart the service daily. For some locations, we use Azure relay. So far, the locations that go through the relay have not had any issues.
Expected Behavior
The API should respond while running
Steps To Reproduce
Let our project run for a day
Exceptions (if any)
No response
.NET Version
7.0.100
Anything else?
No response