
Dns.GetHostEntry(Dns.GetHostName()) throws ExtendedSocketException when ifconfig succeeds #37766

Closed
rjmholt opened this issue Jun 11, 2020 · 16 comments

Comments

@rjmholt

rjmholt commented Jun 11, 2020

Description

From PowerShell/PowerShell#12935.

I've looked through #29780, and this looks at least related to it (but that issue is closed).

PowerShell's (Azure DevOps) CI on macOS has been failing intermittently this week in a test setup code block that looks like this:

$hostName = [System.Net.Dns]::GetHostName() 
...
$hostEntry = [System.Net.Dns]::GetHostEntry($hostName)

The actual code is here

Trying to mitigate this in a PR, we retry several times and then fall back to ifconfig, which successfully resolves the address.
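
To make the retry approach concrete, here is a minimal sketch in C# (not the actual PR code; the helper name, attempt count, and delay are illustrative assumptions):

using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

class HostEntryRetry // hypothetical helper, not part of the PR
{
    static IPHostEntry ResolveWithRetry(string hostName, int attempts = 5)
    {
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                // The call that intermittently throws ExtendedSocketException on macOS.
                return Dns.GetHostEntry(hostName);
            }
            catch (SocketException) when (i < attempts - 1)
            {
                Thread.Sleep(1000); // brief back-off before retrying; the delay is an assumption
            }
        }

        // Unreachable: the filtered catch lets the final attempt's exception propagate.
        throw new InvalidOperationException($"Could not resolve '{hostName}'.");
    }

    static void Main()
    {
        // Mirrors the test setup above: resolve our own hostname.
        string hostName = Dns.GetHostName();
        IPHostEntry entry = ResolveWithRetry(hostName);
        foreach (IPAddress address in entry.AddressList)
        {
            Console.WriteLine(address);
        }
    }
}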

However, after that, our tests start failing due to failures in Ping.SendPingAsync here, with the error message Testing connection to computer 'Mac-1467.local' failed: Cannot resolve the target name. (I'd like to get a full stack trace, but I have yet to work out how with our testing infrastructure).

This suggests that .NET's DNS resolution is failing where macOS's built-in tools succeed.

Configuration

  • .NET Version: .NET 5 (SDK 5.0.100-preview.5.20279.10)
  • OS: macOS 10.14
  • Arch: x64

This issue only appears on macOS.

Regression?

Per the discussion in our issue, we've had problems on and off with System.Net.NetworkInformation.Ping functionality in the past, but currently it's failing most of the time in our CI.

Other information

After reading through #29780, I saw from @wfurt's comment that the OS sometimes fails the DNS resolution request and that a retry is necessary. However, in our case we retry 5 times and fall back to ifconfig. Our CI is showing that the retries all fail and ifconfig succeeds.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Jun 11, 2020
@ghost

ghost commented Jun 11, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@wfurt
Member

wfurt commented Jun 11, 2020

What do you mean by 'ifconfig'?
When I looked at it a while back, I could not reproduce it locally.
That makes the CI environment highly suspect IMHO.
Did you try to look through system logs or do you not have access to them?

@wfurt
Member

wfurt commented Jun 11, 2020

BTW There is some work @antonfirsov was planning for #36849

Do you see failures other than hostname, @rjmholt?

@rjmholt
Author

rjmholt commented Jun 11, 2020

What do you mean by 'ifconfig'?

The ifconfig binary utility, which is easy to call since it's just a command from PowerShell.

That's how I'm using it in our CI environment.

That makes the CI environment highly suspect IMHO.

I agree with you that the environment is fairly suspect here, but I'm conscious that a different tool is giving us a successful result, although perhaps it isn't obtaining it the same way.

Do you see failures other than hostname, @rjmholt?

Yeah, in our call to SendPingAsync. I'm working on getting you a proper stack trace. Hopefully it won't take too long -- just waiting on CI to run.

@rjmholt
Author

rjmholt commented Jun 11, 2020

Did you try to look through system logs or do you not have access to them?

Haven't tried that yet (didn't realise it might contain more info) -- but will investigate now.

@rjmholt
Author

rjmholt commented Jun 11, 2020

BTW There is some work @antonfirsov was planning for #36849

Ah, yeah, we are trying to resolve our own hostname. We correctly get the hostname back, but then fall over when we try to resolve an address from it.

@wfurt
Member

wfurt commented Jun 11, 2020

So you call ifconfig and parse the text to get the address of the interface(s)?
You can get the equivalent via

NetworkInterface[] nics = NetworkInterface.GetAllNetworkInterfaces();

and drilling into the IPv4 properties.
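
For example, a minimal sketch along those lines (the filters for operational, non-loopback, IPv4-only addresses are illustrative assumptions, not part of the suggestion above):

using System;
using System.Net.NetworkInformation;
using System.Net.Sockets;

class InterfaceAddresses
{
    static void Main()
    {
        foreach (NetworkInterface nic in NetworkInterface.GetAllNetworkInterfaces())
        {
            // Skip interfaces that are down or loopback (adjust the filter as needed).
            if (nic.OperationalStatus != OperationalStatus.Up ||
                nic.NetworkInterfaceType == NetworkInterfaceType.Loopback)
            {
                continue;
            }

            foreach (UnicastIPAddressInformation addr in nic.GetIPProperties().UnicastAddresses)
            {
                // Keep only IPv4 addresses, roughly what parsing ifconfig output would yield.
                if (addr.Address.AddressFamily == AddressFamily.InterNetwork)
                {
                    Console.WriteLine($"{nic.Name}: {addr.Address}");
                }
            }
        }
    }
}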

The logs may or may not have info. Look for anything about mDNS or discoveryd.
There is a fair amount of chatter about DNS stability on OS X (across various OS X versions).
You can try running scutil --dns to get some system settings info.
To do resolution, we call POSIX getaddrinfo().

If you primarily care about your own hostname, #36849 will fix it for you.

@wfurt wfurt added the os-mac-os-x macOS aka OSX label Jun 11, 2020
@rjmholt
Author

rjmholt commented Jun 11, 2020

Gotcha -- thanks for the help. I'll work on digging more information out of the CI environment to better characterise what's going on.

As you say, it's entirely likely that our CI environment is at fault. In any event, the information you've given me has been very helpful.

@rjmholt
Author

rjmholt commented Jun 11, 2020

Here's the full stack trace:

Exception             : 
    Type           : System.Net.NetworkInformation.PingException
    Message        : Testing connection to computer 'Mac-1467.local' failed: Cannot resolve the target name.
    InnerException : 
        Type            : System.Net.Internals.SocketExceptionFactory+ExtendedSocketException
        Message         : nodename nor servname provided, or not known
        SocketErrorCode : HostNotFound
        ErrorCode       : -131073
        NativeErrorCode : -131073
        TargetSite      : 
            Name          : GetHostEntryOrAddressesCore
            DeclaringType : System.Net.Dns
            MemberType    : Method
            Module        : System.Net.NameResolution.dll
        StackTrace      : 
   at System.Net.Dns.GetHostEntryOrAddressesCore(String hostName, Boolean justAddresses)
   at System.Net.Dns.<>c.<GetHostEntryOrAddressesCoreAsync>b__27_3(Object s)
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at Microsoft.PowerShell.Commands.TestConnectionCommand.GetCancellableHostEntry(String targetNameOrAddress) in /Users/runner/runners/2.170.1/work/1/s/src/Microsoft.PowerShell.Commands.Management/commands/management/TestConnectionCommand.cs:line 777
   at Microsoft.PowerShell.Commands.TestConnectionCommand.TryResolveNameOrAddress(String targetNameOrAddress, String& resolvedTargetName, IPAddress& targetAddress) in /Users/runner/runners/2.170.1/work/1/s/src/Microsoft.PowerShell.Commands.Management/commands/management/TestConnectionCommand.cs:line 705
        Source          : System.Net.NameResolution
        HResult         : 5
    HResult        : -2146233079
TargetObject          : Mac-1467.local
CategoryInfo          : ResourceUnavailable: (Mac-1467.local:String) [Test-Connection], PingException
FullyQualifiedErrorId : TestConnectionException,Microsoft.PowerShell.Commands.TestConnectionCommand
InvocationInfo        : 
    MyCommand        : Test-Connection
    ScriptLineNumber : 279
    OffsetInLine     : 23
    HistoryId        : 6
    ScriptName       : /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1
    Line             : $result = Test-Connection $hostName -MtuSize
                       
    PositionMessage  : At /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1:279 char:23
                       +             $result = Test-Connection $hostName -MtuSize
                       +                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PSScriptRoot     : /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management
    PSCommandPath    : /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1
    InvocationName   : Test-Connection
    CommandOrigin    : Internal
ScriptStackTrace      : at <ScriptBlock>, /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1: line 279
                        at Invoke-Test, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/It.ps1: line 289
                        at ItImpl, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/It.ps1: line 241
                        at It, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/It.ps1: line 132
                        at <ScriptBlock>, /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1: line 278
                        at DescribeImpl, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/Describe.ps1: line 213
                        at Context, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/Context.ps1: line 92
                        at <ScriptBlock>, /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1: line 275
                        at DescribeImpl, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/Describe.ps1: line 213
                        at Describe, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Functions/Describe.ps1: line 105
                        at <ScriptBlock>, /Users/runner/runners/2.170.1/work/1/s/test/powershell/Modules/Microsoft.PowerShell.Management/Test-Connection.Tests.ps1: line 56
                        at <ScriptBlock>, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Pester.psm1: line 1111
                        at Invoke-Pester<End>, /Users/runner/runners/2.170.1/work/1/a/bins/publish/Modules/Pester/4.10.1/Pester.psm1: line 1137
                        at <ScriptBlock>, <No file>: line 1
PipelineIterationInfo : 

@rjmholt
Author

rjmholt commented Jun 11, 2020

So it does look like all our failures are attributable to the same issue in resolving our own hostname.

I'll give #36849 a read, and I imagine we can mark this issue as a duplicate.

@antonfirsov
Member

@rjmholt does hostname | nslookup succeed in bash?

@rjmholt
Author

rjmholt commented Jun 12, 2020

@rjmholt does hostname | nslookup succeed in bash?

Ah, that's a good idea! I just tested it and it doesn't resolve anything. I'll close this issue then.

Thanks all for your help!

@rjmholt rjmholt closed this as completed Jun 12, 2020
@wfurt
Member

wfurt commented Jun 12, 2020

It would still be nice IMHO to investigate and improve the CI. So far we have not been able to get a grasp on what is causing this (as some systems hit it and some don't).

@rjmholt
Author

rjmholt commented Jun 13, 2020

It would still be nice IMHO to investigate and improve the CI

Yes, I'm continuing to follow up in actions/runner-images#1042. I'm also converting our tests to not depend on DNS resolution, but I'm still hoping to work out why this is failing on macOS.

@antonfirsov
Member

@rjmholt on macOS it depends on the version: in dotnet/runtime's CI, 10.13 works while 10.14 doesn't. If you find out what's causing the difference, let us know!

Reproduction on Ubuntu is quite simple, see #36849.

@rjmholt
Author

rjmholt commented Jun 15, 2020

From actions/runner-images#1042 (comment):

We have disabled the multicast advertisement service (/usr/bin/defaults write /Library/Preferences/com.apple.mDNSResponder.plist NoMulticastAdvertisements -bool true) to prevent the "the name of your computer is already in use on this network" pop-up window on the Mac, which is the root cause of this issue.

@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022