
Occasional timeouts in HttpConnectionPool.ConnectAsync #63706

Closed

fstugren opened this issue Jan 12, 2022 · 20 comments

@fstugren

fstugren commented Jan 12, 2022

Description

We run a pretty busy service, let's call it the "gateway", which accepts inbound https calls, and based on some rules, issues outbound https requests to other services (let's call those "internal" services). The gateway is implemented in Asp.Net 5 and runs on Linux, deployed on Azure Kubernetes (AKS). The internal services run on various technologies, most but not all on .Net/Linux/AKS.

The gateway uses HttpClient for outbound requests. The HttpClientHandler's lifetime is managed with the IHttpClientFactory pattern. There is one shared client handler for all outbound requests to all internal services. The handler uses a client certificate for authentication. This client handler is recycled by the factory every 10 minutes.

The puzzling behavior is that occasionally we see failures to connect when the service makes outbound calls, which end in a timeout. I call them puzzling because:

  • Failures don't seem related to traffic volume, so we can't correlate them to too many concurrent requests.
  • Failures are rare - a few dozen to a few hundreds of those every day on each node running the gateway service
  • Failures usually don't come in clusters - it's one failure, then another one 10 minutes later, etc.
  • Failures don't seem to be caused by issues in the "gateway" code - otherwise I'd expect that the internal service that gets most of the outbound calls would see most of the connection failures as well. Instead, some services with a large amount of traffic see the fewest connection errors.

Upon failure, either a TaskCanceledException is thrown or, most often, an OperationCanceledException.
This is the most common call stack:

System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Net.Security.SslStream.<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|182_0[TIOAdapter](TIOAdapter adap, ValueTask`1 task, Int32 minSize)
   at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter)
   at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Boolean async, Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.SendAsyncCore(HttpRequestMessage request, HttpCompletionOption completionOption, Boolean async, Boolean emitTelemetryStartStop, CancellationToken cancellationToken)

The errors can be recognized by this sequence of methods in the call stack, which seems to indicate they happen during the connection phase of an outbound request:

   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(...)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(...)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(...)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(...)

The gateway also receives other timeouts from the internal services, such as requests taking too long to process, but those cause a different exception call stack, and we can correlate them using correlation Ids in cross-service logs. Those actual request timeouts always have "SendAsync(...)" in the call stack instead of "ConnectAsync". The requests that fail with connection errors leave no trace on the internal service; no data ever makes it there.

I looked at the code of various internal services to see if there's an issue there, maybe some custom connection handler deadlock, but it's mostly just standard Kestrel stuff. I also took process dumps on one of the internal services to see if any threads are hanging in the connection stage, but found nothing so far.

How do we investigate this next?
What can be the root cause?

We could add retries, but this is problematic for several reasons:

  • There's no good way to make a distinction in code between a request timeout (server takes too long) and a connection timeout
  • Most connection timeouts happen after the request timeout has expired, which by default is 100 seconds. I could not find a way to set up a separate timeout for establishing a connection
  • We still don't know the root cause.

While this is not a huge issue, it's a constant behavior and it causes SLA issues for customers, so I'd very much like a solution.

Where do I go from here?

Reproduction Steps

No specific repro steps - issue reproduces occasionally.

Expected behavior

No connection timeouts

Actual behavior

Occasional connection timeouts

Regression?

Does not seem to be a regression. The service was recently converted to .Net 5 from .Net Core 3.1 but the errors were seen before the conversion.

Known Workarounds

No response

Configuration

  • Asp.Net 5
  • OS: Linux/Ubuntu-based
  • Platform: AKS/Containers

Other information

No response

dotnet-issue-labeler bot added the area-System.Net.Http and untriaged labels on Jan 12, 2022
@ghost

ghost commented Jan 12, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@wfurt
Member

wfurt commented Jan 13, 2022

The TLS handshake is the most CPU-intensive part. You can also check the state of the thread pool to see if things are moving.
You can also probably use ConnectCallback to log IP info and try to get a packet capture to see what is going on on the wire for the failing requests. I know that may be a lot of data, but it could provide some answers.
Since this is hosted in Azure, look for port exhaustion if you have many outbound requests. I'm not sure whether the same applies to AKS, but we've seen issues like this in the past.

@karelz
Member

karelz commented Jan 13, 2022

Triage: A packet capture will tell whether it is external or caused by the (perhaps overloaded) system.
Without that, there is not much we can do.

Things that might help:
Is it always to the same destination? (a particular server)
Is it perhaps related to IPv6? (we have seen problems with IPv6, though I would expect them to be more consistent)

@ghost

ghost commented Jan 13, 2022

This issue has been marked needs more info since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

@fstugren
Author

fstugren commented Jan 13, 2022

To clarify my original post - I realize the root cause may be hard to track down; the problem may be either at the caller or at the callee, or somewhere in between - given that the network calls in Azure go through several layers of software load-balancing, each of those could cause an issue. I don't think this is specifically a bug in the .Net HttpClient.

What I would like is to mitigate these errors effectively, and that, I think, requires two things:

  1. An effective way to detect whether a TaskCanceledException or OperationCanceledException is caused by a failure to establish a connection with the target vs. the target taking too long to respond to the http request. This would enable the caller to retry on connection failures only. The "gateway" caller can't retry when requests time out because the server takes too long to process.
  2. A way for the caller to configure an independent timeout for establishing a connection vs. the current timeout which applies to the request as a whole. The caller would use this timeout to limit the time spent waiting for a connection, rather than timing out when the http request times out, which by default is the awfully long 100 seconds.

I considered port exhaustion on the caller side, but that seems unlikely - those failures are rare, and they never come in "clusters"; they are isolated, while traffic flows just fine around them. For example, in the last 24 hours the "gateway" deployed in one busy Azure region has correctly handled 4.9 million requests (evenly split across 3 nodes) while failing to connect only 650 times (also split across nodes) - to various destinations. @karelz - This also makes it next to impossible to run any packet capture software to identify the failing connections, but I'm running against the limits of my knowledge here; there surely are tools I'm not aware of.

@wfurt - are you suggesting the caller switch from using HttpClientHandler to using SocketsHttpHandler? HttpClientHandler delegates work internally to SocketsHttpHandler, but it looks like some of the properties of SocketsHttpHandler are not exposed or used. The latter has a ConnectTimeout property, which seems to be meant for controlling how long the handler will wait for a connection to be established. Any chance this could be exposed in future versions of HttpClientHandler?

@wfurt - How would ConnectCallback help? I couldn't find any meaningful examples of how to use it to control how connections are established. Would a connection callback receive any meaningful information for such timeouts?

Thanks,
Fritz

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs more info labels Jan 13, 2022
@stephentoub
Member

A way for the caller to configure an independent timeout for establishing a connection vs. the current timeout which applies to the request as a whole. The caller would use this timeout to limit the time spent waiting for a connection, rather than timing out when the http request times out, which by default is the awfully long 100 seconds.

This exists:
https://docs.microsoft.com/en-us/dotnet/api/system.net.http.socketshttphandler.connecttimeout?view=net-6.0

@wfurt
Member

wfurt commented Jan 13, 2022

My thought was to use the connect callback to log IP info so it is easier to correlate with packet captures. If packet captures are not possible, there may not be great benefit.
You could possibly write a wrapper stream to track IO. Since the timeout seems to happen in the initial handshake, you can probably track time on the first few data chunks to see whether you are getting responses from the remote peer. The whole idea of packet capture is to determine whether the timeouts are external or caused by something inside SslStream or HttpClient.

And yes, I realized that while HttpClientHandler really is SocketsHttpHandler, not all properties are exposed. We did talk about it in the past, but HttpClientHandler remains a generic API across all platforms, while SocketsHttpHandler has some more knobs to play with. In order to use them, you would have to use SocketsHttpHandler directly.

@fstugren
Author

fstugren commented Jan 13, 2022

Thanks for your replies.

Considering the rather limited impact of this issue, probably the best way to handle it for now is to switch to using SocketsHttpHandler and configure the connection timeout as @stephentoub suggested.
What exception is thrown when the SocketsHttpHandler hits the connection timeout? Do the exception type, its properties, or an inner exception help the caller determine that it was thrown as a result of a connection timeout, so it won't be confused with a slow-server timeout? The behavior is not documented publicly.

Looks like the code that consumes SocketsHttpHandler.ConnectTimeout is in the HttpConnectionPool in the same repo. Although I'd prefer better public documentation on the behavior of those classes, that's the next best thing.

Other than mitigating with retries, how does one effectively determine the cause of such sporadic connection failures? We had a related issue yesterday where one of our internal services (.Net/Kestrel running on a Service Fabric cluster in Azure) stopped receiving calls altogether for 10 minutes in one region. The "gateway" service described here was getting the same connection timeouts, but so did other services which were calling that service; this points to some issue at the target, but investigations (after connectivity recovered without intervention) on that service's nodes found nothing to work with. Can the .Net http client handler possibly be enhanced to include diagnostics for failed connections in those situations, or is it operating at too high a level to be useful for debugging?

@fstugren
Author

fstugren commented Jan 13, 2022

Follow-up - I tested a bit with SocketsHttpHandler, triggering a connection timeout in the caller by delaying the connection completion on the Kestrel server using a sleep in the ServerCertificateSelector callback. The good news is that the ConnectTimeout value is respected. Unfortunately, however, the exception is a run-of-the-mill OperationCanceledException with no inner exception and the same call stack I posted in my original message.

It looks like the only way for the client to figure out if the exception is a connection exception is to search the exception's call stack for strings like "EstablishSslConnectionAsyncCore", "ConnectAsync", "CreateHttp11ConnectionAsync", "GetHttpConnectionAsync", but that seems extremely hacky and not very reliable.

Any other ideas? Supplying a custom ConnectCallback to replace the default does not help, as the exception is not thrown during Socket.ConnectAsync but is most often seen in ConnectHelper.EstablishSslConnectionAsyncCore, which is called after the socket connection is established.

@fstugren
Author

fstugren commented Jan 20, 2022

Is it at least possible to make a change to indicate there was a timeout during the SSL negotiation? Adding a detailed inner exception to the OperationCanceledException would help.

The fact that the SSL negotiation usually causes this failure narrows down the field. It is possible that either the client or the service has trouble finding a certificate or retrieving the certificate keys to negotiate the SSL handshake. I am able to generate this very error in the client by adding a delay in the server's ServerCertificateSelector callback (Kestrel-specific). While some of the internal services called by the "gateway" which run into this error use a ServerCertificateSelector to pick the server's certificate, not all the services do. I can't find any connection-related code that may cause a deadlock on the server side.

Several of the internal services the Gateway calls are consistently the biggest "offenders" causing this error. They are all running .Net Core+Kestrel on Linux/AKS, whereas the service with the highest traffic (orders of magnitude higher than the others) sees the fewest SSL negotiation errors and runs Java on AKS/Linux. The gateway service calls all services using the same code, so this seems to indicate that the issue lies with the services, not the caller. This is just speculation for now.

@fstugren
Author

fstugren commented Jan 24, 2022

Another update - I deployed a fix along the lines of what was described earlier in this thread, using a SocketsHttpHandler connection timeout of 10 seconds in our pre-production environment. I'm happy to see that since Friday morning 100% of SSL connection failures caused by timeouts succeeded on retry. We'll be deploying this in production sometime later this week.

What's the optimal maximum time to wait for an SSL connection to complete anyway? 10 seconds was just an arbitrary warm and fuzzy number I picked, but it seems too high.

However, two things still remain:

  1. We still don't know the root cause of those timeouts to negotiate SSL connections
  2. I'd greatly appreciate a more reliable solution, because I don't want to write hacks like this:
public static bool IsConnectionTimeout(this Exception ex)
{
    return ((ex is TaskCanceledException || ex is OperationCanceledException) &&
        ex.StackTrace != null &&
        ex.StackTrace.Contains("HttpConnectionPool.GetHttpConnectionAsync", StringComparison.Ordinal) &&
        ex.StackTrace.Contains("HttpConnectionPool.ConnectAsync", StringComparison.Ordinal) &&
        ex.StackTrace.Contains("ConnectHelper.EstablishSslConnectionAsyncCore", StringComparison.Ordinal) &&
        ex.StackTrace.Contains("SslStream.ForceAuthenticationAsync", StringComparison.Ordinal)) ||
        (ex.InnerException != null && ex.InnerException.IsConnectionTimeout());
}

A better solution can be achieved by either one of those:

  • A dedicated connection timeout exception or inner exception for the OperationCanceledException (preferred)
    or
  • Allow the connection callback to handle any connection errors resulting from the SSL negotiation, in which case a dedicated exception won't be necessary, since the caller would know the error context.

@wfurt - does the issue you linked above cover this latter proposal?

@karelz
Member

karelz commented Jan 25, 2022

Triage: We believe a similar issue already exists (to provide an enum describing the error that caused the request failure) -- we should close this as a duplicate of it.

@ManickaP
Member

I found the issue I had in mind: #47484
Before .NET 6.0, we'd throw a generic TaskCanceledException without any inner exception in case of a connection timeout due to ConnectTimeout. That got fixed in 6.0 with #53851.
From the description, this is ASP.NET 5.

A dedicated connection timeout exception or inner exception for the OperationCanceledException (preferred)

@fstugren I believe that what was done in the aforementioned PR is exactly what you're asking for. Could you please confirm that?

Also, I believe that @wfurt's PR #63851, allowing you to create the SslStream in ConnectCallback, will give you complete control over connection creation.

So, is there anything else you'd like to see in HttpClient or is everything covered by those 2 PRs?

@fstugren
Author

@ManickaP, thanks for following up. I am not yet able to upgrade our services from .Net 5 to .Net 6 to validate this; I may be able to do that in the next couple of months. Also, we notice both TaskCanceledException and OperationCanceledException in these situations; the call stacks look slightly different. Both of those errors have a nearly 100% rate of success on retry.

Otherwise, going beyond the client-side handling of those errors, I'd be very interested to find the root cause of those hangs. Based on observations there seems to be a correlation between this client-side error and having Kestrel at the other end, but that's still just speculation at the moment.

@ManickaP
Member

@fstugren you could turn on networking telemetry https://devblogs.microsoft.com/dotnet/net-5-new-networking-improvements/#telemetry; it should show you how long each part of establishing a connection takes. And obviously turn on server-side logging if you can; that might shed some light as well.

Let me know if you need more help. Otherwise, I don't think there's anything actionable for us at the moment.

@ManickaP
Member

ManickaP commented Mar 8, 2022

Triage: closing now; this should all be addressed, partially in 6.0 and partially in main (7.0). Please feel free to re-open this issue or open a new one if you find any more problems in your investigations.

ManickaP closed this as completed on Mar 8, 2022
ManickaP removed the untriaged and needs-further-triage labels on Mar 8, 2022
ManickaP added this to the 7.0.0 milestone on Mar 8, 2022
@fstugren
Author

fstugren commented Apr 5, 2022

@ManickaP - can you clarify how the change you outlined above solves the problem described here? I looked at the GitHub PR you sent, but I am not familiar enough with the ASP.NET internals to understand how I would apply it.
So far, all I've done is to replace the HttpClientHandler with SocketsHttpHandler so I can control the connection timeout.

Ideally, the caller would be able to tell that a timeout was caused by a failure to establish a connection rather than by the server being too slow to process the request. This is currently not possible, except through hacks like the one I outlined above.

Do you have an example for how to handle connection errors?

@ManickaP
Member

ManickaP commented Apr 6, 2022

This method https://source.dot.net/#System.Net.Http/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs,2db3901400a1bccd,references was added, which adds an inner TimeoutException with this particular message: https://source.dot.net/#System.Net.Http/System.SR.cs,252b047fa960992e,references to distinguish a connection timeout from a generic OperationCanceledException.

So this should give you enough info to discern cancellation from timeout and the type of the timeout, without checking the exception call stack. Does this answer your question?

@fstugren
Author

fstugren commented Apr 9, 2022

@ManickaP, @karelz, @stephentoub - I've given the latest .Net 6.0 a try and here's what I saw when the caller catches a connect timeout exception triggered by a call to HttpClient.SendAsync:

exception.GetType().Name = TaskCanceledException
exception.Message = "The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing."

This message is inaccurate - yes, the client default timeout was 100 seconds, but the connection timeout set on the SocketsHttpHandler was 10 seconds. The timeout did not happen because 100 seconds elapsed. It looks like the code that throws makes an assumption without being aware of the connect timeout.

exception.InnerException.GetType().Name = TimeoutException
exception.InnerException.Message = "The operation was canceled." --> generic timeout message, not very useful

exception.InnerException.InnerException.GetType().Name = TaskCanceledException
exception.InnerException.InnerException.Message = "The operation was canceled."

exception.InnerException.InnerException.InnerException.GetType().Name = TimeoutException
exception.InnerException.InnerException.InnerException.Message = "A connection could not be established within the configured ConnectTimeout." --> finally the connection-specific timeout message!

The exception that indicates a connect timeout is nested three levels deep, at least in the case I tested. Is this the intended behavior? It's a bit awkward. The caller must recursively dig into inner exceptions until they find one with the message "A connection could not be established..." A dedicated "ConnectTimeoutException" would have offered the caller an unambiguous indication of what happened. Anyway, it's still better than searching through the stack trace.

Thanks
Fritz

@MihaZupan
Member

No, the way a ConnectTimeout bubbles up is not intentional.
Tracked by #67505

@ghost ghost locked as resolved and limited conversation to collaborators May 11, 2022