Linux musl x64 build break apparently due to some libc version mismatch (high priority as it is blocking clean CI) #84949

Closed
trylek opened this issue Apr 17, 2023 · 17 comments · Fixed by #84952
Assignees
Labels
area-Infrastructure-coreclr blocking-outerloop Blocking the 'runtime-coreclr outerloop' and 'runtime-libraries-coreclr outerloop' runs

Comments

@trylek
Member

trylek commented Apr 17, 2023

Platform: Linux musl x64
Example run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=241777&view=logs&j=92e17f34-a5d5-5d2c-e1e5-826eb9786835&t=8b99f496-81fe-5fa6-0e3e-125347621b1c

Diagnostics:

  249 / 264 (94%, 249 failed): failed in 667 msecs, exit code 134, expected 0: dotnet /__w/1/s/artifacts/bin/coreclr/linux.x64.Checked/crossgen2/crossgen2.dll @/__w/1/s/artifacts/tests/coreclr/obj/linux.x64.Checked/crossgen.out/System.Runtime.Loader.dll.rsp
  !! Unhandled exception. System.DllNotFoundException: Unable to load shared library 'jitinterface_x64' or one of its dependencies. In order to help diagnose loading problems, consider using a tool like strace. If you're using glibc, consider setting the LD_DEBUG environment variable: 
  !! /__w/1/s/.dotnet/shared/Microsoft.NETCore.App/8.0.0-preview.3.23174.8/jitinterface_x64.so: cannot open shared object file: No such file or directory
  !! /__w/1/s/artifacts/bin/coreclr/linux.x64.Checked/crossgen2/jitinterface_x64.so: cannot open shared object file: No such file or directory
  !! /__w/1/s/.dotnet/shared/Microsoft.NETCore.App/8.0.0-preview.3.23174.8/libjitinterface_x64.so: cannot open shared object file: No such file or directory
  !! libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory
  !! /__w/1/s/.dotnet/shared/Microsoft.NETCore.App/8.0.0-preview.3.23174.8/jitinterface_x64: cannot open shared object file: No such file or directory
  !! /__w/1/s/artifacts/bin/coreclr/linux.x64.Checked/crossgen2/jitinterface_x64: cannot open shared object file: No such file or directory
  !! /__w/1/s/.dotnet/shared/Microsoft.NETCore.App/8.0.0-preview.3.23174.8/libjitinterface_x64: cannot open shared object file: No such file or directory
  !! /__w/1/s/artifacts/bin/coreclr/linux.x64.Checked/crossgen2/libjitinterface_x64: cannot open shared object file: No such file or directory
  !!    at System.Runtime.InteropServices.NativeLibrary.LoadByName(String libraryName, QCallAssembly callingAssembly, Boolean hasDllImportSearchPathFlag, UInt32 dllImportSearchPathFlag, Boolean throwOnError)
  !!    at System.Runtime.InteropServices.NativeLibrary.LoadLibraryByName(String libraryName, Assembly assembly, Nullable`1 searchPath, Boolean throwOnError)
  !!    at Internal.JitInterface.JitConfigProvider.<>c__DisplayClass5_0.b__0(String libName, Assembly assembly, Nullable`1 searchPath) in /_/src/coreclr/tools/Common/JitInterface/JitConfigProvider.cs:line 59
  !!    at System.Runtime.InteropServices.NativeLibrary.LoadLibraryCallbackStub(String libraryName, Assembly assembly, Boolean hasDllImportSearchPathFlags, UInt32 dllImportSearchPathFlags)
  !!    at Internal.JitInterface.CorInfoImpl.GetJitHost(IntPtr configProvider)
  !!    at Internal.JitInterface.CorInfoImpl.Startup(CORINFO_OS os) in /_/src/coreclr/tools/Common/JitInterface/CorInfoImpl.cs:line 179
  !!    at Internal.JitInterface.JitConfigProvider.Initialize(TargetDetails target, IEnumerable`1 jitFlags, IEnumerable`1 parameters, String jitPath) in /_/src/coreclr/tools/Common/JitInterface/JitConfigProvider.cs:line 64
  !!    at ILCompiler.ReadyToRunCodegenCompilationBuilder.ToCompilation() in /_/src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/ReadyToRunCodegenCompilationBuilder.cs:line 296
  !!    at ILCompiler.Program.RunSingleCompilation(Dictionary`2 inFilePaths, InstructionSetSupport instructionSetSupport, String compositeRootPath, Dictionary`2 unrootedInputFilePaths, HashSet`1 versionBubbleModulesHash, ReadyToRunCompilerContext typeSystemContext) in /_/src/coreclr/tools/aot/crossgen2/Program.cs:line 613
  !!    at ILCompiler.Program.Run() in /_/src/coreclr/tools/aot/crossgen2/Program.cs:line 289
  !!    at ILCompiler.Crossgen2RootCommand.<>c__DisplayClass187_0.<.ctor>b__0(InvocationContext context) in /_/src/coreclr/tools/aot/crossgen2/Crossgen2RootCommand.cs:line 293
  !!    at System.CommandLine.Invocation.AnonymousCommandHandler.Invoke(InvocationContext context)
  !!    at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<b__0>d.MoveNext()
  !! --- End of stack trace from previous location ---
  !!    at System.CommandLine.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<b__0>d.MoveNext()
  !! --- End of stack trace from previous location ---
  !!    at System.CommandLine.CommandLineBuilderExtensions.<>c__DisplayClass11_0.<b__0>d.MoveNext()
  !! --- End of stack trace from previous location ---
  !!    at System.CommandLine.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<b__0>d.MoveNext()
  !! --- End of stack trace from previous location ---
  !!    at System.CommandLine.Invocation.InvocationPipeline.g__FullInvocationChain|3_0(InvocationContext context)
  !!    at System.CommandLine.Invocation.InvocationPipeline.Invoke(IConsole console)
  !!    at System.CommandLine.Parsing.ParseResultExtensions.Invoke(ParseResult parseResult, IConsole console)
  !!    at System.CommandLine.Parsing.ParserExtensions.Invoke(Parser parser, String[] args, IConsole console)
  !!    at ILCompiler.Program.Main(String[] args) in /_/src/coreclr/tools/aot/crossgen2/Program.cs:line 890

/cc @janvorli @sbomer @dotnet/runtime-infrastructure @dotnet/crossgen-contrib

Known Issue Error Message

Fill the error message using known issues guidance.

{
  "ErrorMessage": "Unable to load shared library 'jitinterface_x64'",
  "BuildRetry": false
}
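
As an illustration of how the `ErrorMessage` field above might be used, the following sketch checks whether a log line contains that text. It assumes plain substring matching, which is an assumption about the known-issues tooling rather than something stated in this issue, and `matches_known_issue` is a hypothetical helper name.

```python
import json


def matches_known_issue(known_issue_json: str, log_line: str) -> bool:
    """Return True if the log line contains the known issue's ErrorMessage.

    Assumption: the message is matched as a plain, case-sensitive substring.
    """
    issue = json.loads(known_issue_json)
    return issue["ErrorMessage"] in log_line
```

Under that assumption, the `System.DllNotFoundException` line in the diagnostics above would match, while unrelated log lines would not.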

Report

| Build | Definition | Step Name | Console log | Pull Request |
|---|---|---|---|---|
| 241777 | dotnet/runtime | Generate CORE_ROOT | Log | |
| 241282 | dotnet/runtime | Generate CORE_ROOT | Log | |
| 241126 | dotnet/runtime | Generate CORE_ROOT | Log | |
| 241070 | dotnet/runtime | Generate CORE_ROOT | Log | #84904 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 3 | 4 | 4 |
trylek added the area-Infrastructure-coreclr, blocking-clean-ci, and untriaged labels on Apr 17, 2023
@janvorli
Member

The error sounds as if we attempted to load the musl version of libjitinterface_x64.so on a glibc distro.
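
One rough way to check this diagnosis is to look for the libc soname that a native library references. The sketch below is a heuristic and not part of the runtime repo: the function names are hypothetical, and it only scans raw bytes for `libc.musl-` (as in `libc.musl-x86_64.so.1` from the log) or `libc.so.6` (the usual glibc soname, which is an assumption not stated in this thread) instead of parsing the ELF dynamic section.

```python
def guess_libc_flavor(library_bytes: bytes) -> str:
    """Heuristically guess which libc a shared library was linked against.

    Only searches the raw bytes for well-known libc sonames, so treat the
    result as a hint rather than a definitive answer.
    """
    if b"libc.musl-" in library_bytes:
        return "musl"
    if b"libc.so.6" in library_bytes:
        return "glibc"
    return "unknown"


def guess_libc_flavor_of_file(path: str) -> str:
    """Read a library from disk and apply the same heuristic."""
    with open(path, "rb") as f:
        return guess_libc_flavor(f.read())
```

If the jitinterface library next to crossgen2 reported "musl" on a glibc build machine, that would be consistent with the `libc.musl-x86_64.so.1: cannot open shared object file` line in the diagnostics.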

@janvorli
Member

@sbomer It looks like the cross build is attempting to use crossgen2 with the musl version of the JIT native libraries, while it should use the glibc ones.

@trylek
Member Author

trylek commented Apr 17, 2023

@janvorli - I see. So am I right to understand that in this particular case HostArchitecture=Linux whereas TargetArchitecture=Linux_musl, and some code publishing the JIT native libraries uses the TargetArchitecture version instead of the HostArchitecture one, which should be used because that's the OS on which the build is running?

@trylek
Member Author

trylek commented Apr 17, 2023

/cc @dotnet/jit-contrib

@sbomer sbomer self-assigned this Apr 17, 2023
@sbomer
Member

sbomer commented Apr 17, 2023

I see this in the logs:

--crossgen2-path "/__w/1/s/artifacts/bin/coreclr/linux.x64.Checked/crossgen2/crossgen2.dll"

The musl x64 build changed to become a cross-build (build on x64 glibc, targeting x64 musl), so I think this needs to be changed to use the x64 glibc hosted crossgen2 that gets built as part of the runtime cross components build. I think we need to address this TODO:

<CrossgenDir>$([System.IO.Path]::Combine($(_CoreclrPkgDir),"tools"))</CrossgenDir>
<!-- TODO override with rid specific tools path for x-arch -->
<Crossgen>$([System.IO.Path]::Combine($(CrossgenDir),"crossgen"))</Crossgen>

@sbomer
Member

sbomer commented Apr 17, 2023

For example here's how we handle this when we crossgen corelib:

<CrossDir Condition="'$(CrossBuild)' == 'true' or '$(BuildArchitecture)' != '$(TargetArchitecture)'">$(BuildArchitecture)</CrossDir>

<CrossGenDllCmd>$(DotNetCli) $([MSBuild]::NormalizePath('$(BinDir)', '$(CrossDir)', 'crossgen2', 'crossgen2.dll'))</CrossGenDllCmd>
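
To make the selection logic in these two properties easier to follow, here is a minimal Python sketch of the same decision. It is an illustration only: `crossgen2_dll_path` and its parameter names are hypothetical, and it assumes that an empty `CrossDir` simply drops out of the combined path.

```python
import posixpath


def crossgen2_dll_path(bin_dir: str, build_arch: str, target_arch: str,
                       cross_build: bool) -> str:
    """Pick the crossgen2.dll that can run on the build machine.

    Mirrors the CrossDir condition: use the build-architecture subdirectory
    when CrossBuild is true or when build and target architectures differ.
    """
    use_cross_dir = cross_build or build_arch != target_arch
    parts = [bin_dir]
    if use_cross_dir:
        parts.append(build_arch)
    parts += ["crossgen2", "crossgen2.dll"]
    return posixpath.normpath(posixpath.join(*parts))
```

Note that the `cross_build` flag matters when build and target architectures are identical, because comparing the architectures alone would not select the subdirectory in that case.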

@sbomer
Member

sbomer commented Apr 17, 2023

I will try to reproduce this locally, but if someone is already set up to run the R2R outerloop tests locally, it might help us to fix and validate this faster.

@sbomer
Member

sbomer commented Apr 17, 2023

Actually, I think the place that needs to be fixed up is

<CrossgenDir>$(__BinDir)</CrossgenDir>
<CrossgenDir Condition="'$(TargetArchitecture)' == 'arm'">$(CrossgenDir)\x64</CrossgenDir>
<CrossgenDir Condition="'$(TargetArchitecture)' == 'arm64'">$(CrossgenDir)\x64</CrossgenDir>
<CrossgenDir Condition="'$(TargetArchitecture)' == 'x86'">$(CrossgenDir)\x64</CrossgenDir>
(not the Crossgen2Tasks targets)
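
The conditions quoted above key only on `TargetArchitecture`, which may be why a build that targets x64 never gets the `x64` subdirectory. A small Python sketch of that behavior, using the hypothetical helper name `legacy_crossgen_dir` and POSIX-style separators for readability:

```python
import posixpath

# Target architectures for which the quoted conditions append the x64 directory.
CROSS_TARGETS = {"arm", "arm64", "x86"}


def legacy_crossgen_dir(bin_dir: str, target_arch: str) -> str:
    """Model of the quoted CrossgenDir conditions (illustrative only)."""
    if target_arch in CROSS_TARGETS:
        return posixpath.join(bin_dir, "x64")
    # Any other target, including x64, falls through to the plain bin directory.
    return bin_dir
```

With `target_arch="x64"` the function returns `bin_dir` unchanged, which corresponds to the `--crossgen2-path` without an `x64` component seen in the failing logs.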

@trylek
Member Author

trylek commented Apr 17, 2023

Overall I think this is the right place to fix this, as it constructs the path to Crossgen2 for compiling the framework, but I'm still struggling to understand how this pre-existing cross-architecture scheme translates into a cross-OS-variant build. If I understand JanV's comments correctly, at some point we're likely producing separate Linux-targeting vs. Linux-musl-targeting jitinterface libraries and we're loading / using the wrong one. I'll need to remind myself how we're producing Crossgen2 versions targeting these OS variants before I can start thinking about a fix.

@trylek
Member Author

trylek commented Apr 17, 2023

I have asked @davidwrighton offline for help, as I believe he wrote most of these scripts. I'll continue working on this tomorrow if we don't nail it down before EOD today. One thing I'm still missing: why has this blown up right now considering we've been running Linux-musl tests for years as far as I recall?

@sbomer
Member

sbomer commented Apr 17, 2023

How we produce crossgen2 targeting those OS variants changed in #84148. Before that change, the x64 musl build was done on a musl build machine, but now it is done on an x64 glibc build machine, with an x64 musl rootfs.

For musl x64, the TargetArchitecture and BuildArchitecture are both x64. I believe the fix will look something like #84952, but I haven't validated it yet.

> why has this blown up right now considering we've been running Linux-musl tests for years as far as I recall?

It was caused recently by #84148. I think we just didn't catch it in PR because these jobs don't run there.

@janvorli
Member

> why has this blown up right now

Because we've never cross-built x64 musl on x64 glibc before.

@trylek
Member Author

trylek commented Apr 17, 2023

I see, thanks for explaining.

@sbomer
Member

sbomer commented Apr 17, 2023

Ok, I was able to see the failure locally, and #84952 fixes it. After the fix, I see output like:

  Compiling framework using Crossgen2: "/runtime/dotnet.sh" "/runtime/artifacts/tests/coreclr/linux.x64.Checked/Tests/Core_Root/R2RTest/R2RTest.dll" compile-framework -cr "/runtime/artifacts/tests/coreclr/linux.x64.Checked/Tests/Core_Root" --output-directory "/runtime/artifacts/tests/coreclr/obj/linux.x64.Checked/crossgen.out" --release --nocleanup --target-arch x64 -dop 8 -m "/runtime/artifacts/tests/coreclr/linux.x64.Checked/Tests/Core_Root/StandardOptimizationData.mibc" --crossgen2-parallelism 1 --verify-type-and-field-layout --crossgen2-path "/runtime/artifacts/bin/coreclr/linux.x64.Checked/x64/crossgen2/crossgen2.dll"
  Deleted /runtime/artifacts/tests/coreclr/obj/linux.x64.Checked/crossgen.out in 10 msecs
  Using dotnet: /runtime/.dotnet/dotnet
  1 / 264 (0%, 0 failed): launching: /runtime/.dotnet/dotnet /runtime/artifacts/bin/coreclr/linux.x64.Checked/x64/crossgen2/crossgen2.dll @/runtime/artifacts/tests/coreclr/obj/linux.x64.Checked/crossgen.out/System.Private.CoreLib.dll.rsp
  2 / 264 (0%, 0 failed): launching: /runtime/.dotnet/dotnet /runtime/artifacts/bin/coreclr/linux.x64.Checked/x64/crossgen2/crossgen2.dll @/runtime/artifacts/tests/coreclr/obj/linux.x64.Checked/crossgen.out/FSharp.Core.dll.rsp

@trylek
Member Author

trylek commented Apr 17, 2023

By the way, do we know why the PR run on #84148 didn't catch this while obviously subsequent PR / CI runs have been hitting this? Three years ago, Santi wrote this filtering logic so that we decide what tests to run based on the extent of a PR. Is there a mapping we're missing, in the vein of "if we're touching this bit of code, we need to rerun these tests"?

@sbomer
Member

sbomer commented Apr 17, 2023

It looks like it's because these jobs only run as part of the outerloop pipeline, which doesn't get triggered by default in PRs. Maybe the PR jobs should do a pri0 R2R test build as well? It would still not have caught the bug, though, unless we included musl x64 in the matrix. Hard to say what exactly the right balance is.

@trylek
Member Author

trylek commented Apr 17, 2023

I see, thanks for explaining. At some point I was convinced I was seeing these failures in the CI runs, but you're right, it doesn't make sense: we're only running the R2R legs in outerloop. It would be nice to add at least minimal coverage to PR runs, but the dnceng team used to be strongly opposed to that because PR costs are already too high. Perhaps if we manage to save some time elsewhere, we might be able to do that.

trylek added the blocking-outerloop label and removed the blocking-clean-ci label on Apr 17, 2023
ghost removed the untriaged label on Apr 18, 2023
ghost locked as resolved and limited conversation to collaborators on May 18, 2023