Update llvm-libunwind from v9.0.0 to v14.0.6 #72442

Merged: 8 commits merged into dotnet:main from feature/external/llvm-libunwind on Jul 20, 2022

am11
Member

@am11 am11 commented Jul 19, 2022

Closes #72344

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jul 19, 2022
@am11 am11 requested review from VSadov and janvorli July 19, 2022 10:53
@@ -117,23 +83,12 @@ namespace libunwind {
// __eh_frame_hdr_start = SIZEOF(.eh_frame_hdr) > 0 ? ADDR(.eh_frame_hdr) : 0;
// __eh_frame_hdr_end = SIZEOF(.eh_frame_hdr) > 0 ? . : 0;

#ifndef _LIBUNWIND_USE_ONLY_DWARF_INDEX
Member Author

_LIBUNWIND_USE_ONLY_DWARF_INDEX, _LIBUNWIND_BAREMETAL_DWARF_INDEX_SEC_START and _LIBUNWIND_BAREMETAL_DWARF_INDEX_SEC_END were added in dotnet/corert#8271, but they always use their default values and are not defined anywhere else. I have removed them as part of resolving merge conflicts (and to avoid future conflicts).

@am11
Member Author

am11 commented Jul 19, 2022

One test is failing on Linux x64:

Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test

Thread 2 "DynamicGenerics" received signal SIG34, Real-time event 34.
[Switching to Thread 0x7f0b60b3a700 (LWP 186231)]
futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5628c407bd4c) at ../sysdeps/nptl/futex-internal.h:183
183     ../sysdeps/nptl/futex-internal.h: No such file or directory.

(gdb) thread apply all bt

Thread 3 (Thread 0x7f0b5a23f700 (LWP 186232)):
#0  __libc_read (nbytes=1, buf=0x7f0b5a23ed7f, fd=5) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=5, buf=0x7f0b5a23ed7f, nbytes=1) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00005628c2c0dddf in ?? ()
#3  0x00007f0b80f03609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#4  0x00007f0b80e28163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f0b60b3a700 (LWP 186231)):
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5628c407bd4c) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5628c407bd50, cond=0x5628c407bd20) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x5628c407bd20, mutex=0x5628c407bd50) at pthread_cond_wait.c:638
#3  0x00005628c2be9813 in GCEvent::Impl::Wait(unsigned int, bool) ()
#4  0x00005628c2be92a3 in GCEvent::Wait(unsigned int, bool) ()
#5  0x00005628c2b6ef87 in WKS::GCHeap::WaitUntilGCComplete(bool) ()
#6  0x00005628c2b5bcf7 in RedhawkGCInterface::WaitForGCCompletion() ()
#7  0x00005628c2b679c6 in Thread::WaitForGC(PInvokeTransitionFrame*) ()
#8  0x00005628c2b69710 in RhpWaitForGC2 ()
#9  0x00005628c2dcccfa in S_P_CoreLib_System_Runtime_InternalCalls__RhpSignalFinalizationComplete ()
#10 0x00005628c2dcb094 in S_P_CoreLib_System_Runtime___Finalizer__ProcessFinalizers ()
#11 0x00005628c2b58fce in FinalizerStart(void*) ()
#12 0x00007f0b80f03609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f0b80e28163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f0b80ce9880 (LWP 186227)):
#0  0x00005628c2ba64b7 in WKS::gc_heap::mark_object_simple1(unsigned char*, unsigned char*) ()
#1  0x00005628c2ba80f2 in WKS::gc_heap::mark_object_simple(unsigned char**) ()
#2  0x00005628c2bacc2c in WKS::GCHeap::Promote(Object**, ScanContext*, unsigned int) ()
#3  0x00005628c2be249f in PromoteObject(Object**, unsigned long*, unsigned long, unsigned long) ()
#4  0x00005628c2be0338 in ScanConsecutiveHandlesWithoutUserData(Object**, Object**, ScanCallbackInfo*, unsigned long*) ()
#5  0x00005628c2be054f in BlockScanBlocksWithoutUserData(TableSegment*, unsigned int, unsigned int, ScanCallbackInfo*) ()
#6  0x00005628c2be1764 in SegmentScanByTypeChain(TableSegment*, unsigned int, void (*)(TableSegment*, unsigned int, unsigned int, ScanCallbackInfo*), ScanCallbackInfo*) ()
#7  0x00005628c2be195f in TableScanHandles(HandleTable*, unsigned int const*, unsigned int, TableSegment* (*)(HandleTable*, TableSegment*, CrstHolderWithState*), void (*)(TableSegment*, unsigned int, unsigned int, ScanCallbackInfo*), ScanCallbackInfo*, CrstHolderWithState*) ()
#8  0x00005628c2bdbe47 in HndScanHandlesForGC(HandleTable*, void (*)(Object**, unsigned long*, unsigned long, unsigned long), unsigned long, unsigned long, unsigned int const*, unsigned int, unsigned int, unsigned int, unsigned int) ()
#9  0x00005628c2be3523 in Ref_TraceNormalRoots(unsigned int, unsigned int, ScanContext*, void (*)(Object**, ScanContext*, unsigned int)) ()
#10 0x00005628c2bdaea6 in GCScan::GcScanHandles(void (*)(Object**, ScanContext*, unsigned int), int, int, ScanContext*) ()
#11 0x00005628c2b98452 in WKS::gc_heap::mark_phase(int, int) ()
#12 0x00005628c2b93e06 in WKS::gc_heap::gc1() ()
#13 0x00005628c2ba3ebd in WKS::gc_heap::garbage_collect(int) ()
#14 0x00005628c2b88a31 in WKS::GCHeap::GarbageCollectGeneration(unsigned int, gc_reason) ()
#15 0x00005628c2bd1e55 in WKS::GCHeap::GarbageCollectTry(int, int, int) ()
#16 0x00005628c2bd1cb2 in WKS::GCHeap::GarbageCollect(int, bool, int) ()
#17 0x00005628c2b594d4 in RhpCollect ()
#18 0x00005628c2dcca39 in S_P_CoreLib_System_Runtime_InternalCalls__RhpCollect ()
#19 0x00005628c2dcc9ec in S_P_CoreLib_System_Runtime_InternalCalls__RhCollect ()
#20 0x00005628c2c5bf94 in S_P_CoreLib_System_GC__Collect_0 ()
#21 0x00005628c2c375e0 in DynamicGenerics_ThreadLocalStatics_TLSTesting__ThreadLocalStatics_Test ()
#22 0x00005628c2c4b843 in DynamicGenerics_EntryPointMain___c___Main_b__0_62 ()
#23 0x00005628c2c4713d in DynamicGenerics_CoreFXTestLibrary_Internal_Runner__RunTestMethod ()
#24 0x00005628c2c46c3f in DynamicGenerics_CoreFXTestLibrary_Internal_Runner__RunTest ()
#25 0x00005628c2c469b4 in DynamicGenerics_CoreFXTestLibrary_Internal_Runner__RunTests ()
#26 0x00005628c2c260fd in DynamicGenerics_EntryPointMain__Main ()
#27 0x00005628c305c107 in DynamicGenerics__Module___MainMethodWrapper ()
#28 0x00005628c305c1a3 in __managed__Main ()
#29 0x00005628c2b56fdf in main ()

@VSadov
Member

VSadov commented Jul 19, 2022

SIG34 is how GC suspension interrupts threads asynchronously on Linux.
These stacks look ok. The real problem is likely something that happens later.

When debugging in GDB you may want to just pass through SIG34: handle SIG34 nostop noprint

@am11
Member Author

am11 commented Jul 19, 2022

Bypassing SIG34, we get:

Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
[New Thread 0x7f91b3fff700 (LWP 186654)]
[New Thread 0x7f91b869f700 (LWP 186655)]
[New Thread 0x7f91b37fe700 (LWP 186656)]
[New Thread 0x7f91b2ffd700 (LWP 186657)]
[New Thread 0x7f91b27fc700 (LWP 186658)]
[New Thread 0x7f91b1ffb700 (LWP 186659)]
[New Thread 0x7f91b17fa700 (LWP 186660)]
[New Thread 0x7f91b0ff9700 (LWP 186661)]
[New Thread 0x7f918ffff700 (LWP 186662)]
[New Thread 0x7f918f7fe700 (LWP 186663)]
[New Thread 0x7f918effd700 (LWP 186664)]
[New Thread 0x7f918e7fc700 (LWP 186665)]
[New Thread 0x7f918dffb700 (LWP 186666)]
[New Thread 0x7f918d7fa700 (LWP 186667)]
[New Thread 0x7f918cff9700 (LWP 186668)]
[New Thread 0x7f916bfff700 (LWP 186669)]
[New Thread 0x7f916b7fe700 (LWP 186670)]

Thread 8 "DynamicGenerics" received signal SIGABRT, Aborted.
[Switching to Thread 0x7f91b27fc700 (LWP 186658)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f91db893859 in __GI_abort () at abort.c:79
#2  0x000055f8e39dd9bc in Assert(char const*, char const*, unsigned int, char const*) ()
#3  0x000055f8e3943231 in Thread::HijackReturnAddressWorker(StackFrameIterator*, void**) ()
#4  0x000055f8e3942dbd in Thread::HijackReturnAddress(UNIX_CONTEXT*, void**) ()
#5  0x000055f8e3942c4c in Thread::HijackCallback(UNIX_CONTEXT*, void*) ()
#6  0x000055f8e39bfe48 in ?? ()
#7  <signal handler called>
#8  0x000055f8e3b67631 in S_P_CoreLib_System_Diagnostics_Debug__Assert ()
#9  0x000055f8e3d924e4 in S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierWKeyed_2_Container<S_P_CoreLib_System_Reflection_Runtime_TypeInfos_RuntimeConstructedGenericTypeInfo_UnificationKey__System___Canon>__VerifyUnifierConsistency ()
#10 0x000055f8e3d913fe in S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierWKeyed_2<S_P_CoreLib_System_Reflection_Runtime_TypeInfos_RuntimeConstructedGenericTypeInfo_UnificationKey__System___Canon>__GetOrAdd ()
#11 0x000055f8e3bcb0d6 in S_P_CoreLib_System_Reflection_Runtime_TypeInfos_RuntimeConstructedGenericTypeInfo__GetRuntimeConstructedGenericTypeInfo_0 ()
#12 0x000055f8e3bcb039 in S_P_CoreLib_System_Reflection_Runtime_TypeInfos_RuntimeConstructedGenericTypeInfo__GetRuntimeConstructedGenericTypeInfo ()
#13 0x000055f8e3be60ff in S_P_CoreLib_System_Reflection_Runtime_General_TypeUnifier__GetConstructedGenericTypeWithTypeHandle ()
#14 0x000055f8e3bc60a8 in S_P_CoreLib_System_Reflection_Runtime_TypeInfos_RuntimeTypeInfo__MakeGenericType ()
#15 0x000055f8e3a0eb32 in DynamicGenerics_ThreadLocalStatics_TLSTesting__MakeType1 ()
#16 0x000055f8e3a26e7f in DynamicGenerics_ThreadLocalStatics_TLSTesting___c__DisplayClass3_0___MultiThreaded_Test_b__0 ()
#17 0x000055f8e3b1fe63 in S_P_CoreLib_System_Threading_Tasks_Task__InnerInvoke ()
#18 0x000055f8e3c35c37 in S_P_CoreLib_System_Threading_Tasks_Task___c____cctor_b__273_0 ()
#19 0x000055f8e3b13641 in S_P_CoreLib_System_Threading_ExecutionContext__RunFromThreadPoolDispatchLoop ()
#20 0x000055f8e3b1fc19 in S_P_CoreLib_System_Threading_Tasks_Task__ExecuteWithThreadLocal ()
#21 0x000055f8e3b1f88d in S_P_CoreLib_System_Threading_Tasks_Task__ExecuteEntryUnsafe ()
#22 0x000055f8e3b1f7ff in S_P_CoreLib_System_Threading_Tasks_Task__ExecuteFromThreadPool ()
#23 0x000055f8e3b19211 in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__DispatchWorkItem ()
#24 0x000055f8e3b18f5a in S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch ()
#25 0x000055f8e3c30fb2 in S_P_CoreLib_System_Threading_PortableThreadPool_WorkerThread__WorkerThreadStart ()
#26 0x000055f8e3e352ea in S_P_CoreLib_System_Threading_ThreadStart__InvokeOpenStaticThunk ()
#27 0x000055f8e3c28e1b in S_P_CoreLib_System_Threading_Thread_StartHelper__RunWorker ()
#28 0x000055f8e3c28d92 in S_P_CoreLib_System_Threading_Thread_StartHelper__Run ()
#29 0x000055f8e3b0f2b2 in S_P_CoreLib_System_Threading_Thread__StartThread ()
#30 0x000055f8e3b0f9c1 in S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint ()
#31 0x00007f91dba6b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#32 0x00007f91db990163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@VSadov
Member

VSadov commented Jul 19, 2022

It is hard to tell which assert is actually failing. Is it the one in S_P_CoreLib_System_Collections_Concurrent_ConcurrentUnifierWKeyed_2_Container?

@VSadov
Member

VSadov commented Jul 19, 2022

Another possibility is that the new libunwind has problems unwinding this particular stack, and that causes an assert in HijackReturnAddressWorker because it gets a bogus location for the return address on the stack.
(It checks whether the return address location contains the address of a managed method; anything else is unexpected.)

An issue like this would typically not be specific to just one test; there would be other failures.
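
To illustrate the kind of check described above, here is a minimal sketch, not the runtime's actual code. `CodeRange` and `returnAddressLooksManaged` are hypothetical names invented for this example; the real HijackReturnAddressWorker is not shown in this thread and may validate the slot differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of the address range occupied by managed code.
struct CodeRange {
    uintptr_t start; // inclusive
    uintptr_t end;   // exclusive
};

// Returns true when the value stored in the return-address slot reported by
// the unwinder points into managed code. If the unwinder computed the wrong
// slot, the value read from it is unlikely to pass this test.
bool returnAddressLooksManaged(const uintptr_t* returnAddressSlot,
                               const CodeRange& managed) {
    if (returnAddressSlot == nullptr) {
        return false;
    }
    const uintptr_t target = *returnAddressSlot;
    return target >= managed.start && target < managed.end;
}
```

A bad unwind in the prologue would hand such a check a slot holding a saved register or a local instead of a return address, which is consistent with the assert seen in the backtrace.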

@VSadov
Member

VSadov commented Jul 19, 2022

If only this one scenario is affected, I think we may want to disable the scenario, and I can look at it later.

We just need to make sure it is only this scenario and not a random crash on certain stacks.

@am11 am11 force-pushed the feature/external/llvm-libunwind branch from 18e634b to ef0164f Compare July 20, 2022 00:19
@am11
Member Author

am11 commented Jul 20, 2022

Disabling DynamicGenerics.csproj passed all the (other 15) tests.

Do we have more NativeAOT tests in one of the outerloop pipelines? Can we azp run it?

@am11 am11 marked this pull request as ready for review July 20, 2022 02:26
@jkotas
Member

jkotas commented Jul 20, 2022

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Member

> Disabling DynamicGenerics.csproj passed all the (other 15) tests.

DynamicGenerics is the most targeted test of the runtime type loader. We cannot have it disabled. We could comment out this line instead but that is pretty worrying too:

new CoreFXTestLibrary.Internal.TestInfo("ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test", () => global::ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test(), null),

The only thing special about that test case is that it stresses the GC a lot.

I've seen it SIGABRT in this run yesterday: #72236

This might be a recent regression.

@am11
Member Author

am11 commented Jul 20, 2022

@VSadov another option is that we block this PR and try to find the solution, then push the required changes here rather than disabling the test.

Although it is a non-stripped test binary, it doesn't have the line number / file info. I built everything as debug:

$ ./build.sh -s clr+clr.aot+libs
$ src/tests/build.sh nativeaot 'tree nativeaot'  /p:LibrariesConfiguration=Debug

$ cd artifacts/tests/coreclr/Linux.x64.Debug/nativeaot/SmokeTests/DynamicGenerics/DynamicGenerics/native
$ file DynamicGenerics 
DynamicGenerics: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=2087cb20056da1ec0c47368caae0ab6a1c72e100, for GNU/Linux 3.2.0, with debug_info, not stripped

$ gdb ./DynamicGenerics
...
Reading symbols from ./DynamicGenerics...
Dwarf Error: DW_FORM_strx1 found in non-DWO CU [in module /runtime/artifacts/tests/coreclr/Linux.x64.Debug/nativeaot/SmokeTests/DynamicGenerics/DynamicGenerics/native/DynamicGenerics]
(No debugging symbols found in ./DynamicGenerics)
...

so it is a bit tricky to debug without debug symbols. :)

@VSadov
Member

VSadov commented Jul 20, 2022

I am able to reproduce the failure.
I see nativeaot/SmokeTests/DynamicGenerics/DynamicGenerics failing in nearly every run and nativeaot/SmokeTests/UnitTests/UnitTests occasionally

@VSadov
Member

VSadov commented Jul 20, 2022

We are unable to unwind in the prologue, although we should be able to. It could be something in the new libunwind.

An example of the failure is when we try to hijack at a point like the following (marked with ==>):

    0x55555586f131 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+1>      sub    $0x10,%rsp
==> 0x55555586f135 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+5>      lea    0x10(%rsp),%rbp
    0x55555586f13a <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+10>     mov    %rdi,-0x8(%rbp)
    0x55555586f13e <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+14>     nop
    0x55555586f13f <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+15>     mov    -0x8(%rbp),%rax
    0x55555586f143 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+19>     movzwl 0x2(%rax),%eax
    0x55555586f147 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+23>     and    $0x100,%eax
    0x55555586f14c <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+28>     test   %eax,%eax
    0x55555586f14e <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+30>     setne  %al
    0x55555586f151 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+33>     movzbl %al,%eax
    0x55555586f154 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+36>     mov    %eax,-0xc(%rbp)
    0x55555586f157 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+39>     nop
    0x55555586f158 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+40>     mov    -0xc(%rbp),%eax
    0x55555586f15b <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+43>     add    $0x10,%rsp
    0x55555586f15f <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+47>     pop    %rbp
    0x55555586f160 <S_P_CoreLib_Internal_Runtime_MethodTable__get_HasOptionalFields+48>     retq

I would expect the library tests to fail a lot with this.
Hijacking in the prologue is relatively common when a GC happens.

@VSadov
Member

VSadov commented Jul 20, 2022

It looks like maybe the fix in inline bool validRegister(int regNum) const is missing

re: #71187 (comment)

Actually, no, this fix is present. I was looking at the wrong source. It must be something else.

static_cast<uint64_t>(instructionsEnd));

// see DWARF Spec, section 6.4.2 for details on unwind opcodes
while ((p < instructionsEnd) && (codeOffset < pcoffset)) {
Member

I think this should be codeOffset <= pcoffset.

The opcode that updates the position comes before the opcodes that describe the effects of the instruction, so the while loop must be end-inclusive, or it will miss the effects of the last instruction in the range.

With this change, DynamicGenerics passes.

I think the upstream should have this fix, but maybe not in the version that we are getting.
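
The off-by-one described in this comment can be reproduced with a toy model. The sketch below is a simplification, not libunwind's parser: `Op`, `Instr`, `cfaOffsetAt` and `examplePrologue` are hypothetical names, only two opcode kinds are modeled, and the CFI program is an assumed encoding for a `push %rbp` (1 byte) followed by `sub $0x10,%rsp` (4 bytes) prologue like the one in the disassembly earlier in the thread. The actual CFI emitted for that method is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a CFI program: only the two opcode kinds needed here.
enum class Op { AdvanceLoc, DefCfaOffset };

struct Instr {
    Op op;
    uint32_t operand; // code delta for AdvanceLoc, new offset for DefCfaOffset
};

// Returns the CFA offset (relative to RSP) computed for the given pcOffset.
// endInclusive == false models the loop condition `codeOffset < pcoffset`;
// endInclusive == true models `codeOffset <= pcoffset`.
uint32_t cfaOffsetAt(const std::vector<Instr>& program, uint32_t pcOffset,
                     bool endInclusive) {
    uint32_t codeOffset = 0;
    uint32_t cfaOffset = 8; // at entry only the return address is on the stack
    std::size_t i = 0;
    while (i < program.size() &&
           (endInclusive ? codeOffset <= pcOffset : codeOffset < pcOffset)) {
        const Instr& ins = program[i++];
        switch (ins.op) {
        case Op::AdvanceLoc:   codeOffset += ins.operand; break;
        case Op::DefCfaOffset: cfaOffset = ins.operand;   break;
        }
    }
    return cfaOffset;
}

// Assumed CFI for the prologue: each AdvanceLoc precedes the opcode that
// describes the effect of the machine instruction ending at that offset.
std::vector<Instr> examplePrologue() {
    return {
        {Op::AdvanceLoc, 1}, {Op::DefCfaOffset, 16}, // after push %rbp
        {Op::AdvanceLoc, 4}, {Op::DefCfaOffset, 32}, // after sub $0x10,%rsp
    };
}
```

For a hijack at offset 5 (the `lea` marked in the disassembly), `cfaOffsetAt(examplePrologue(), 5, false)` stops as soon as the second AdvanceLoc brings `codeOffset` to 5 and still reports 16, while the end-inclusive variant also applies the following DefCfaOffset and reports 32, matching the stack after `sub $0x10,%rsp` has executed.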

Member

@am11 - I have pushed a fix, but feel free to re-apply it if this is not the right way to do it from a change-tracking point of view.

Member Author

@VSadov, thank you! :)

This one is still < in 14.0.6: https://github.com/llvm/llvm-project/blob/llvmorg-14.0.6/libunwind/src/DwarfParser.hpp#L449 as well as in their main branch. I will add it to our tracking list once NativeAOT_Libs_Passing legs' runs are completed.

@VSadov
Member

VSadov commented Jul 20, 2022

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@am11
Member Author

am11 commented Jul 20, 2022

The Windows leg (Build windows arm64 Release NativeAOT_Libs_Passing) timed out; the linux and osx ones passed. I will check in the version changes now.
Thanks for the fix @VSadov. 👍

@VSadov
Member

VSadov commented Jul 20, 2022

Thanks @am11 !
I really appreciate you working on this.

@VSadov VSadov merged commit 17b3c56 into dotnet:main Jul 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 19, 2022
Labels
area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Development

Successfully merging this pull request may close these issues.

[NativeAOT] update libunwind
4 participants