WCF Client Hang in Linux Containers #5600
Comments
Explicitly open your channel before your first call and that should fix this issue. Also, you should be using the WCF client asynchronously if possible. The service and client side don't need to match in their async/sync operation definitions; they just need to be equivalent, so you don't need to modify your service-side code. E.g.
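The "equivalent, not identical" point above can be illustrated with a sketch. The contract and operation names here are hypothetical, not from the thread; the pattern shown (a synchronous service operation paired with a Task-based client operation of the same name) is the standard WCF convention:

```csharp
using System.ServiceModel;
using System.Threading.Tasks;

// Hypothetical service-side contract: a synchronous operation.
[ServiceContract(Name = "ICalculator")]
public interface ICalculatorService
{
    [OperationContract]
    int Add(int a, int b);
}

// Client-side contract for the SAME service. WCF treats the Task-based
// form as equivalent to the synchronous one (matched by operation name,
// with the "Async" suffix), so the service code needs no changes.
[ServiceContract(Name = "ICalculator")]
public interface ICalculatorClient
{
    [OperationContract]
    Task<int> AddAsync(int a, int b);
}
```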
@mconnew I don't understand how opening the channel explicitly solves the issue. As far as I know, the blocking happens in the channel-open code. Per the code reference below, the Open method opens the channel asynchronously and waits indefinitely for the open task's result, so opening the channel explicitly or implicitly shouldn't make any difference. https://source.dot.net/#System.ServiceModel.Primitives/System/ServiceModel/Channels/CommunicationObject.cs,f4a3b3f9f38225b6,references And the second part: why do we need to convert the client to asynchronous? It is stuck in the channel open and never reaches the method-execution level.
Did you try explicitly opening the channel like I said? You need to cast to IChannel first, then open it. If you wanted an explanation of why, it's several paragraphs of explanation, and my response would have taken longer as I would need to set aside the time to write it all out. I'll simply say that your understanding of what's going on is incorrect.
Ok, let me try it. Is the below sample correct?

```csharp
NetTcpBinding netTcpBinding = new NetTcpBinding(SecurityMode.None);
```
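A fuller sketch of the explicit-open pattern being discussed, with the cast to IChannel as suggested. The endpoint address and the `ICalculatorClient` contract are hypothetical placeholders, not from the thread:

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

NetTcpBinding netTcpBinding = new NetTcpBinding(SecurityMode.None);

// Hypothetical endpoint and contract for illustration only.
var endpoint = new EndpointAddress("net.tcp://example-host:808/calc");
var factory = new ChannelFactory<ICalculatorClient>(netTcpBinding, endpoint);

ICalculatorClient client = factory.CreateChannel();

// Explicitly open the channel before the first call by casting the
// proxy to IChannel first, then calling Open().
((IChannel)client).Open();

// Subsequent calls no longer go through the implicit-open path.
```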
@mconnew I tried opening the channel explicitly; the issue is still there. Could you please share more details on the root cause of the blocking?
What's the call stack now? It won't be the same.
This is the call stack:

```
00007F4970FF7470 00007fc9597e1e96 [HelperMethodFrame_1OBJ: 00007f4970ff7470] System.Threading.Monitor.ObjWait(Int32, System.Object)
```
@mconnew If possible, could you please share the root cause in short form, just so we understand what is going on? That would be very helpful.
When you don't explicitly open a channel, the first call needs to open it. This implicit open puts the channel in a special mode which has performance consequences: it places the actual request on a queue and then does the open. Once the channel is open, it pulls a request off the head of the queue, sends it, waits for the reply (which gets returned to the caller), then takes the next request on the queue. Only once the queue is empty does it switch out of this mode. With the right usage pattern you can get stuck in this mode; it effectively causes requests to be sent to the server serially.

"Normal" behavior is that only one message can be sent at a time and only one can be received at a time, but the reply to an outstanding request doesn't need to be received before the next request can be sent, and responses can arrive out of order. If you are reusing the channel from multiple threads, and your rate of requests per second multiplied by the average call completion time is greater than 1, you can fall further and further behind until requests stop completing.

Your original call stack showed that you were implicitly opening, and the call was waiting for its request to be picked up from the queue and the reply received. I've never seen an open just hang like that; it usually just fails. So now we need to work out why the open is failing. We have end-to-end scenario tests, run with every change, on multiple Linux distros, so it's not a basic functional bug where it just doesn't work on Linux.

There are two separate diagnostic steps to take. The easiest is to set the
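Given the queueing behavior described above, a usage sketch of the recommended pattern: open once up front, then share the opened channel across concurrent callers so replies can be pipelined instead of serialized. The `ICalculatorClient` contract and the call values are hypothetical:

```csharp
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.Threading.Tasks;

// Hypothetical Task-based client contract shared across threads.
[ServiceContract]
public interface ICalculatorClient
{
    [OperationContract]
    Task<int> AddAsync(int a, int b);
}

public static class Caller
{
    public static async Task<int[]> RunAsync(ChannelFactory<ICalculatorClient> factory)
    {
        ICalculatorClient client = factory.CreateChannel();

        // Open once, up front, so no call ever triggers the
        // implicit-open queue mode described above.
        ((IChannel)client).Open();

        // Concurrent calls on the opened channel: requests are pipelined
        // and replies may complete out of order.
        return await Task.WhenAll(
            client.AddAsync(1, 2),
            client.AddAsync(3, 4),
            client.AddAsync(5, 6));
    }
}
```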
@mconnew We are using SecurityMode.None; we don't have any security enabled. I believe the default OpenTimeout is 1 minute.
Are you using Buffered or Streamed transfer mode? There's a known bug with streamed transfer mode and no security, but in that case the channel opens successfully and then times out immediately, so it's likely not the same issue, though it's possible they're related.
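For reference, the transfer mode being asked about is a property on the binding. A minimal sketch of setting it explicitly:

```csharp
using System.ServiceModel;

var binding = new NetTcpBinding(SecurityMode.None);

// Buffered is the NetTcpBinding default; Streamed (and the one-way
// StreamedRequest/StreamedResponse variants) are the alternatives.
binding.TransferMode = TransferMode.Buffered;
```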
@mconnew We are using Buffered mode. The container image is mcr.microsoft.com/dotnet/aspnet:8.0.6. I can try with security enabled, but that takes time, and we cannot change our services to be security-enabled.
Trying with security is a diagnostic step, not a suggested solution. |
Describe the bug
I have an application which pushes data to a WCF service using the NetTCP protocol.
The application runs in a Linux container under K8s.
We use the mcr.microsoft.com/dotnet/aspnet:8.0.6 base image to run our application.
The application continuously pushes data to the WCF service and runs fine for a couple of days.
After that it hangs while trying to send data, and none of the WCF calls work; all calls get blocked, even those to a different WCF service.
Using the K8s pod console, I ran a test program in the same pod where the application is running. That test program can access the endpoint and get a result, which means there is no network issue from the pod.
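The pod-level connectivity check described above can be sketched as a raw TCP probe, independent of WCF. The host name and port here are hypothetical placeholders for the service endpoint:

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical host/port; probes raw TCP reachability only.
using var tcp = new TcpClient();
Task connect = tcp.ConnectAsync("example-host", 808);

if (await Task.WhenAny(connect, Task.Delay(TimeSpan.FromSeconds(5))) == connect
    && connect.IsCompletedSuccessfully)
{
    Console.WriteLine("TCP connect succeeded: the network path from the pod is fine.");
}
else
{
    Console.WriteLine("TCP connect failed or timed out.");
}
```

Note that a successful raw connect only rules out basic network reachability; it says nothing about the WCF framing or session handshake above TCP.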
Expected behavior
If it were a network issue, or any other issue, the call should fail with an error after the open timeout; instead it waits indefinitely.
Additional context
Call stack from dotnet dump:

```
Child SP         IP               Call Site
00007EA044AB56A0 00007f0fc35d4e96 [HelperMethodFrame_1OBJ: 00007ea044ab56a0] System.Threading.Monitor.ObjWait(Int32, System.Object)
00007EA044AB57D0 00007F0F4BA955CF System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
00007EA044AB5880 00007F0F4C0BF817 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
00007EA044AB58E0 00007F0F4C0BF5A7 System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken)
00007EA044AB5930 00007F0F4C0BF4A8 System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task, System.Threading.Tasks.ConfigureAwaitOptions)
00007EA044AB5950 00007F0F4C7296C2 System.ServiceModel.Channels.ServiceChannel+CallOpenOnce.System.ServiceModel.Channels.ServiceChannel.ICallOnce.Call(System.ServiceModel.Channels.ServiceChannel, System.TimeSpan)
00007EA044AB5960 00007F0F4C11C6CD System.ServiceModel.Channels.ServiceChannel+CallOnceManager.CallOnce(System.TimeSpan, CallOnceManager)
00007EA044AB59C0 00007F0F4C725899 System.ServiceModel.Channels.ServiceChannel.Call(System.String, Boolean, System.ServiceModel.Dispatcher.ProxyOperationRuntime, System.Object[], System.Object[], System.TimeSpan)
00007EA044AB5B60 00007F0F4C723BA1 System.ServiceModel.Channels.ServiceChannelProxy.Invoke(System.Reflection.MethodInfo, System.Object[])
```