Failed to create CoreCLR, HRESULT: 0x8007000E on linux-musl-x86 #83509

Open
am11 opened this issue Mar 16, 2023 · 15 comments
Labels
arch-x86 area-PAL-coreclr os-linux Linux OS (any supported distro)
Milestone

Comments

@am11
Member

am11 commented Mar 16, 2023

From @ayakael #77667 (comment)

I'm getting an execution failure when trying to run the resulting SDK. Running dotnet --info with the following SDK in linux-musl-x86 chroot yields a freeze, and then eventually a Failed to create CoreCLR, HRESULT: 0x8007000E. It seems like CoreCLR tries to allocate all of the available memory for some reason. @am11 I see that you got the build-rootfs.sh script working for linux-musl-x86, have you managed a successful crossbuild to linux-musl-x86?

I tried stracing it, but could only attach the process after execution.
strace.txt

Edit: The following patchset is used to build the SDK: https://lab.ilot.io/ayakael/dotnet-stage0
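
For anyone reading the error code: 0x8007000E is the standard E_OUTOFMEMORY HRESULT (the Win32 facility wrapping error 14, ERROR_OUTOFMEMORY), which fits the observation that the process appears to exhaust memory. A minimal sketch of the decoding follows; the helper name `decode_hresult` is made up for illustration and is not part of the runtime.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its standard bit fields."""
    hresult &= 0xFFFFFFFF
    return {
        "failure": bool(hresult >> 31),       # severity bit: set means failure
        "facility": (hresult >> 16) & 0x7FF,  # 11-bit facility; 7 is FACILITY_WIN32
        "code": hresult & 0xFFFF,             # low 16 bits; 14 is ERROR_OUTOFMEMORY
    }


print(decode_hresult(0x8007000E))
```

The code only says that an allocation or mapping failed; it does not say which one.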

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 16, 2023
@am11
Member Author

am11 commented Mar 16, 2023

@ayakael, I have only tried building runtime, didn't test the binaries.

> Running dotnet --info with the following SDK

I don't have permission to download it. :)

> Failed to create CoreCLR, HRESULT: 0x8007000E

BTW, were you testing under VM / baremetal install or QEMU? CoreCLR-PAL has some issues with QEMU (even on 64-bit archs).

@am11
Member Author

am11 commented Mar 16, 2023

From @ayakael #77667 (comment)

Woops, here is a functional link: dotnet-sdk-8.0.100-preview.2.23157.25-r1-linux-musl-x86.tar.xz

I tested first within a chroot of alpine-x86 that was in LXC, and then within a chroot in a qemu based env. Both behave the same.

@am11
Member Author

am11 commented Mar 16, 2023

Thanks @ayakael. I created a HyperV VM on Windows x64 and installed alpine 32 bit to rule out the emulation factor:

dotnet --info with your SDK build is failing in this VM as well.

@janvorli, this seems like a memory mapping issue on 32-bit Alpine, where DOTNET_EnableWriteXorExecute=0 and/or DOTNET_GCHeapHardLimit=... do not help.

strace -f dotnet --info -> http://sprunge.us/S9saN5

$ cat /proc/meminfo
MemTotal:         308524 kB
MemFree:          218152 kB
MemAvailable:     108400 kB
Buffers:            1020 kB
Cached:            29348 kB
SwapCached:          180 kB
Active:             5804 kB
Inactive:          21904 kB
Active(anon):         68 kB
Inactive(anon):      344 kB
Active(file):       5736 kB
Inactive(file):    21560 kB
Unevictable:        3072 kB
Mlocked:               0 kB
HighTotal:        151072 kB
HighFree:          94044 kB
LowTotal:         157452 kB
LowFree:          124108 kB
SwapTotal:       4194300 kB
SwapFree:        4191476 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:           344 kB
Mapped:             2344 kB
Shmem:              3072 kB
KReclaimable:       7992 kB
Slab:              19800 kB
SReclaimable:       7992 kB
SUnreclaim:        11808 kB
KernelStack:         640 kB
PageTables:          284 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4348560 kB
Committed_AS:       7956 kB
VmallocTotal:     122880 kB
VmallocUsed:       26060 kB
VmallocChunk:          0 kB
Percpu:              328 kB
DirectMap4k:       20472 kB
DirectMap4M:      868352 kB

$ ulimit -a
core file size (blocks)         (-c) 0
data seg size (kb)              (-d) unlimited
scheduling priority             (-e) 0
file size (blocks)              (-f) unlimited
pending signals                 (-i) 31256
max locked memory (kb)          (-l) 64
max memory size (kb)            (-m) unlimited
open files                      (-n) 1024
POSIX message queues (bytes)    (-q) 819200
real-time priority              (-r) 0
stack size (kb)                 (-s) 8192
cpu time (seconds)              (-t) unlimited
max user processes              (-u) 31256
virtual memory (kb)             (-v) unlimited
file locks                      (-x) unlimited

@ayakael
Contributor

ayakael commented Mar 16, 2023

Awesome, thank you for creating the issue. Note that this is the product of a crossbuild from Alpine x86_64 to Alpine x86 via a rootfs dir. I use this method to build the s390x and ppc64le bootstraps, so it is a tested methodology. But since crossbuilding to x86 is usually done via -m32 (which for some reason didn't work when I attempted it), maybe some bits are being built in 64-bit mode?

@ayakael
Contributor

ayakael commented Mar 16, 2023

Turns out setting -m32 works, but it makes no difference. Possibly relevant: without setting add_linker_flag(-Wl,-z,notext), I get the following error:

 ld.lld: error: relocation R_386_PC32 cannot be used against symbol 'CONTEXT_CaptureContext'; recompile with -fPIC
                     >>> defined in ../../../pal/src/libcoreclrpal.a(context2.S.o)
                     >>> referenced by context2.S:92 (/var/build/dotnet-stage0/src/dotnet-v8.0.100-preview.2/src/runtime/src/coreclr/pal/src/arch/i386/context2.S:92)
                     >>>               context2.S.o:(RtlCaptureContext) in archive ../../../pal/src/libcoreclrpal.a
                     clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
                     make[2]: *** [tools/superpmi/superpmi-shim-simple/CMakeFiles/superpmi-shim-simple.dir/build.make:425: tools/superpmi/superpmi-shim-simple/libsuperpmi-shim-simple.so] Error 1
                     make[1]: *** [CMakeFiles/Makefile2:4511: tools/superpmi/superpmi-shim-simple/CMakeFiles/superpmi-shim-simple.dir/all] Error 2
                     make[1]: *** Waiting for unfinished jobs....

@am11
Member Author

am11 commented Mar 16, 2023

> Turns out setting -m32 works

Nice, can we remove conditions around -m32 from #83464 since it doesn't break the build?

Another difference is the linkage of the libucontext library on linux-musl-x86 for libunwind, which we don't need on linux-musl-x64. But that too seems unrelated to the memory map issue.

I think the issue is that PAL doesn't handle some case and we run into infinite recursion (the strace posted above contains more than 10 MB of mremap / munmap lines, which suggests a lack of error handling somewhere in PAL). It could be that we can make PAL resilient against this system state as a general improvement. Similar issues have been reported previously (#79612, #13027, etc.), but their resolution was mostly about lifting the virtual memory limit. In this case the limit is already unlimited, so it could be a bug in a 32-bit-specific code path somewhere. I haven't investigated with a debug build.
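
To put a number on how much of a trace like this is mremap / munmap, something along these lines could tally syscalls per name. This is only a rough sketch: the function name `count_syscalls` and the sample lines are invented for illustration (they are not taken from the trace linked above), and it assumes the usual `strace -f` line shape with an optional `[pid N]` prefix.

```python
import re
from collections import Counter

# Optional "[pid N] " or "N " prefix, then the syscall name before "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_0-9]+)\(")


def count_syscalls(lines):
    """Count how often each syscall name starts a line of strace output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Synthetic lines, written by hand only to exercise the parser.
sample = [
    "mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f00000",
    "[pid  1234] mremap(0xf7f00000, 4096, 8192, MREMAP_MAYMOVE) = 0xf7e00000",
    "[pid  1234] munmap(0xf7e00000, 8192) = 0",
    "[pid  1234] mremap(0xf7d00000, 4096, 8192, MREMAP_MAYMOVE) = -1 ENOMEM (Out of memory)",
]
print(count_syscalls(sample).most_common())
```

Lines that strace splits into "unfinished" and "resumed" halves are ignored by this pattern, so the counts are a lower bound.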

@ayakael
Contributor

ayakael commented Mar 16, 2023

> Nice, can we remove conditions around -m32 from #83464 since it doesn't break the build?

Yeah, we could. The following patch after toolchain changes works:

diff --git a/src/runtime/eng/common/cross/toolchain.cmake b/src/runtime/eng/common/cross/toolchain.cmake
index 2bb1e0845..fed3b0fad 100644
--- a/src/runtime/eng/common/cross/toolchain.cmake
+++ b/src/runtime/eng/common/cross/toolchain.cmake
@@ -284,8 +284,7 @@ elseif(TARGET_ARCH_NAME STREQUAL "x86")
     add_toolchain_linker_flag("--target=${TOOLCHAIN}")
     add_toolchain_linker_flag("-Wl,--rpath-link=${CROSS_ROOTFS}/usr/lib/gcc/${TOOLCHAIN}")
-  else()
-    add_toolchain_linker_flag(-m32)
   endif()
+  add_toolchain_linker_flag(-m32)
   if(TIZEN)
     add_toolchain_linker_flag("-B${CROSS_ROOTFS}/usr/lib/gcc/${TIZEN_TOOLCHAIN}")
     add_toolchain_linker_flag("-L${CROSS_ROOTFS}/lib")
@@ -324,9 +323,8 @@ if(TARGET_ARCH_NAME MATCHES "^(arm|armel)$")
 elseif(TARGET_ARCH_NAME STREQUAL "x86")
   if(EXISTS ${CROSS_ROOTFS}/usr/lib/gcc/i586-alpine-linux-musl)
     add_compile_options(--target=${TOOLCHAIN})
-  else()
-    add_compile_options(-m32)
   endif()
+  add_compile_options(-m32)
   add_compile_options(-Wno-error=unused-command-line-argument)
 endif()

Edit: Updated both PRs with the above change.

@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Mar 17, 2023
@mangod9 mangod9 added this to the Future milestone Mar 17, 2023
@ayakael
Contributor

ayakael commented Aug 24, 2023

Circling back on this, I now get the following error on 8.0.0-preview.7:

$ ./dotnet --info
 Fatal error. SSE is not supported on the processor.
 Aborted

@tannergooding
Member

tannergooding commented Aug 24, 2023

.NET requires SSE/SSE2 and cmov (on x86/x64 based CPUs), which have been around for well over 20 years now and are guaranteed available on any 64-bit capable CPU (regardless of whether the OS is running as 32-bit or 64-bit).

@ayakael
Contributor

ayakael commented Aug 24, 2023

> .NET requires SSE/SSE2 and cmov (on x86/x64 based CPUs), which have been around for well over 20 years now and are guaranteed available on any 64-bit capable CPU (regardless of whether the OS is running as 32-bit or 64-bit).

Right, so why would running an x86 SDK in an x86 Alpine chroot on x86_64 hardware cause this error, when the CPU does have it?

@tannergooding
Member

If you're running in an emulator, it's possible it has it disabled. You could always manually check CPUID, or run cat /proc/cpuinfo or gcc -dumpmachine.

@ayakael
Contributor

ayakael commented Aug 24, 2023

It's very much available:

$ cat /proc/cpuinfo
[...]
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr *sse* sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cr8_legacy abm sse4a misalignsse 3dnowprefetch bpext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat umip rdpid
[...]

I can't find the piece of code that detects sse, but something doesn't seem right.
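
For reference, the /proc/cpuinfo check can be scripted roughly as below. This is a sketch of what the kernel reports, not the runtime's own detection code (which this thread does not locate); the names `REQUIRED_FLAGS` and `missing_cpu_flags` are hypothetical, and the required set is taken from the SSE/SSE2/cmov requirement stated above.

```python
REQUIRED_FLAGS = {"sse", "sse2", "cmov"}


def missing_cpu_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(required) - set(value.split())
    # No flags line at all: treat every required flag as missing.
    return set(required)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(sorted(missing_cpu_flags(handle.read())))
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```

An empty result means the kernel advertises all three flags, in which case the failure would have to come from how the check itself was built or run.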

@bfren

bfren commented Oct 23, 2023

I had this error too - with the same HRESULT. After many (many) hours of searching and testing, I discovered it was nothing to do with memory or CPUs and everything to do with permissions on /tmp.

Using COMPlus_EnableDiagnostics=0 I managed to get the application to start up, and saw a load of these followed by permissions errors:

No XML encryptor configured. Key {daa53741-8295-4c9b-ae9c-e69b003f16fa} may be persisted to storage in unencrypted form.

Turns out my /tmp permissions were 1744 for some reason - changed them to 1777 and boom, it all works beautifully.

I don't know if this will help you but it's lifted my mood that's been rather poor for the last two days trying to track this one down!
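
A quick way to check for the same /tmp problem is sketched below, assuming the expected mode is the conventional 1777 (world-writable plus the sticky bit). The helper name `tmp_mode_ok` is made up for illustration.

```python
import os
import stat


def tmp_mode_ok(mode: int) -> bool:
    """True if the mode grants rwx to others and has the sticky bit set."""
    perms = stat.S_IMODE(mode)  # keep only the permission and special bits
    others_rwx = (perms & stat.S_IRWXO) == stat.S_IRWXO
    sticky = bool(perms & stat.S_ISVTX)
    return others_rwx and sticky


if __name__ == "__main__":
    if os.path.isdir("/tmp"):
        mode = os.stat("/tmp").st_mode
        print(oct(stat.S_IMODE(mode)), tmp_mode_ok(mode))
```

With the modes mentioned above, 1744 fails this check (others only have read access) and 1777 passes.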
