System.Net.* and System.Security.* crash on NativeAOT arm64 #74710
Tagging subscribers to this area: @dotnet/ncl, @vcsjones
Frequency as of 8/27, last 30 days (with core dump):
Not quite blocking CI yet, but close ...
Stacktrace:
A function pointer in libcrypto's data structures is bad. A use-after-free bug? 84350000 is the corrupted function pointer; the 8490... values around it are the good function pointers.
Triage: We should treat this as high priority for 7.0 until we have more details about the root cause.
cc @LakshanF, as it seems to be specific to arm64 NativeAOT. I think you mentioned there were problems with arm64 NativeAOT recently...
I have taken a look at the two most recent core dumps. It is still the same pattern: a function pointer overwritten in libcrypto. My guess is that this crash is another manifestation of the problem behind other mysterious crashes in networking tests: #74795 and #72830. There seems to be a stray memory write in networking that corrupts random memory.
I was looking at the System.Security.Cryptography.Tests crash in #75306, which also looks similar (we end up at a location that is not valid code): https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-75306-merge-171bd00681b64e6184/System.Security.Cryptography.Tests/1/console.108589f8.log?helixlogtype=result. That one is not a networking test, but libcrypto is the common theme. Maybe it's a crypto bug rather than a networking bug?
I'm not aware of any active ARM64 issues with NativeAOT. The one we had was fixed two weeks ago.
The stack for the System.Security.Cryptography test failure is as follows (notice that the offset of the last frame is very far from anything known):
I added System.Security.Cryptography to the table in the top post. Two PRs hit it yesterday.
The interesting detail in this dump is that there is
@rzikm @wfurt @karelz Is it expected that
There was a change in msquic unloading yesterday (#75163) that could explain why we have started seeing this crash in a new test (due to changes in the timing of the corruption, etc.).
I can think of no reason why X509_get_pubkey_parameters would call (shadow_)DES_check_key. https://github.com/openssl/openssl/blob/6e6aad333f26694ff39aba1e59b358e3f25a9a1d/crypto/x509/x509_vfy.c#L1948-L1981 DES would have nothing to do with a chain build, so some sort of memory corruption is happening somewhere. (I don't even see anything there that would be calling an object-stored fnptr, so something has corrupted executable pages.)
Unless check_key is more than 0x73ee640 bytes long, we're not actually in the check_key method body. It was just the symbol the debugger picked up.
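The reasoning here is that, with only export symbols, a debugger labels an address with the nearest exported symbol at or below it plus an offset, whether or not the address is really inside that function. An offset of 0x73ee640 is 121,562,688 bytes (about 116 MiB), far larger than any plausible function body. A minimal C sketch of that lookup, assuming a hypothetical symbol table sorted by start address with known sizes (the struct and function names are illustrative, not any debugger's API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exported-symbol table entry: start address and size in bytes. */
typedef struct {
    const char *name;
    uintptr_t start;
    size_t size;
} symbol;

typedef struct {
    const symbol *sym; /* nearest symbol at or below the address, or NULL */
    uintptr_t offset;  /* address - sym->start */
    int inside;        /* 1 if the offset is within the symbol's size */
} lookup_result;

/* Binary search for the last symbol whose start is <= addr, which is how
   a "symbol+offset" label is produced when only export symbols exist. */
static lookup_result nearest_symbol(const symbol *table, size_t count, uintptr_t addr) {
    assert(table != NULL || count == 0);
    lookup_result r = {NULL, 0, 0};
    size_t lo = 0, hi = count; /* invariant: answer index is lo - 1 */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return r; /* address precedes every known symbol */
    r.sym = &table[lo - 1];
    r.offset = addr - r.sym->start;
    r.inside = r.offset < r.sym->size;
    return r;
}
```

For example, with a hypothetical symbol starting at 0x2000 that is 0x100 bytes long, the address 0x2000 + 0x73ee640 still resolves to that symbol with offset 0x73ee640, but `inside` is 0: the label is only "the symbol the debugger picked up".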
Right, the above stacktrace is with export symbols only so it is not accurate. For reference, the stacktrace with resolved private symbols is:
The corrupted function pointer is
The function pointers next to it look fine. It is just the low 4 bytes of
We do hook that callback into managed code (lines 644 to 677 in e55c908):
The delegate is kept alive for the call via GC.KeepAlive. If NativeAOT/arm64 doesn't respect that, maybe we've jumped to where a delegate used to point while we still needed it?
Yes, it looks like a NativeAOT problem then.
* Flush instruction cache after thunk pool allocation (Fixes #74710)
* More precise ifdef
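The fix title says the instruction cache was not flushed after the thunk pool was allocated. As a generic illustration of why that matters on arm64 (not the runtime's actual code), the C sketch below copies a tiny hand-encoded "return 42" function into a fresh page, makes it executable, and calls the GCC/Clang builtin `__builtin___clear_cache` before jumping to it. On arm64 the instruction cache is not guaranteed to be coherent with data writes, so without the flush the CPU may fetch stale bytes at that address; on x86-64 the builtin compiles to nothing. It assumes a POSIX system with `mmap`/`mprotect` and an x86-64 or arm64 target; the helper name `run_generated_code` is hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std=c99/c11 */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

/* Hand-encoded "return 42" for the two architectures this sketch covers. */
#if defined(__x86_64__)
static const unsigned char return_42_code[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00, /* mov eax, 42 */
    0xC3                          /* ret */
};
#elif defined(__aarch64__)
static const unsigned char return_42_code[] = {
    0x40, 0x05, 0x80, 0x52, /* mov w0, #42  (0x52800540) */
    0xC0, 0x03, 0x5F, 0xD6  /* ret          (0xd65f03c0) */
};
#else
#error "This sketch only covers x86-64 and arm64."
#endif

/* Hypothetical helper: writes the code into a fresh page, flushes the
   instruction cache for that range, runs it, and returns its result
   (or -1 if the page could not be set up). */
static int run_generated_code(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(page >= sizeof return_42_code);

    unsigned char *mem = mmap(NULL, page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, return_42_code, sizeof return_42_code); /* data-side write */

    /* W^X: drop write permission before making the page executable. */
    if (mprotect(mem, page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, page);
        return -1;
    }

    /* Make the freshly written bytes visible to instruction fetch.
       Required on arm64; a no-op on x86-64. */
    __builtin___clear_cache((char *)mem, (char *)mem + sizeof return_42_code);

    int_fn fn = (int_fn)(void *)mem;
    int result = fn();
    munmap(mem, page);
    return result;
}
```

On x86-64 or arm64 Linux, `run_generated_code()` should return 42. The write-then-flush-then-execute order is the general pattern the fix title points to; the actual runtime change lives in dotnet/runtime and may differ in detail.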
Reopening to track the 7.0 backport...
Backport has been merged.
Frequency as of:

| Date | Test | Build | Configuration | Result |
|------|------|-------|---------------|--------|
| 8/28 | System.Net.Http.Functional.Tests | Rolling run 1972750 | net7.0-windows-Release-x64-NativeAOT_Release-Windows.81.Amd64.Open | Core Dump - Unhandled Exception: System.NullReferenceException |
| 8/25 | System.Net.Http.Functional.Tests | Merged PR 1966082 | net7.0-windows-Release-x64-NativeAOT_Release-Windows.81.Amd64.Open | Core Dump - [FAIL] System.Net.Http.Functional.Tests.SyncHttpHandlerTest_AutoRedirect.AllowAutoRedirect_True_ValidateNewMethodUsedOnRedirection |

Not quite blocking CI yet, but close ...