superpmi: Linux arm collection for libraries.pmi #56156

Closed
kunalspathak opened this issue Jul 22, 2021 · 3 comments

@kunalspathak
Member

We have noticed several issues during the libraries.pmi collection for Linux/arm:

  1. We hit asserts like the one below during collection; these need to be fixed.
Assert failure(PID 589 [0x0000024d], Thread: 589 [0x024d]): Assertion failed 'doubleAlignMask == 0x3 || doubleAlignMask == 0xC' in 'Microsoft.FSharp.Core.ValueOption:GetValue(Microsoft.FSharp.Core.FSharpValueOption`1[Double]):double' during 'Pre-import' (IL size 51)    File: /__w/1/s/src/coreclr/jit/lclvars.cpp Line: 1152    Image: /root/helix/work/correlation/superpmi/corerun
  2. Such errors are not reported back; we silently filter them out of the collection. We need to exit with an error code so that the collection step knows there was a failure.

  3. There are several failures of the kind below during collection, and we need to address them soon:

MISSING: Method context 24385 failed to replay: SuperPMI assertion failed (missing key "key" in map IsValueClass): key 00000000F3907478
MISSING: Method context 26493 failed to replay: SuperPMI assertion failed (missing key "key" in map AsCorInfoType): key 00000000E89EA6C4
MISSING: Method context 26597 failed to replay: SuperPMI assertion failed (missing key "key" in map GetMethodSig): key ftn-00000000EAB92A20 prt-0000000000000000
MISSING: Method context 27037 failed to replay: SuperPMI assertion failed (missing map GetStaticFieldCurrentClass): key 00000000EA196268
MISSING: Method context 29893 failed to replay: SuperPMI assertion failed (missing key "key" in map GetStaticFieldCurrentClass): key 00000000EA757AF8
  4. Lastly, there are some failures that we hit during collection and try to filter out, but we still hit some failures during the validation phase. We need to investigate why we see such non-determinism.
Running SuperPMI replay of /root/helix/work/workitem/uploads/libraries.pmi.1.Linux.arm.checked.mch
Invoking: /root/helix/work/correlation/superpmi/superpmi -v ewmi -r /tmp/tmpjjdpxvt3/repro -p -f /tmp/tmpjjdpxvt3/libraries.pmi.1.Linux.arm.checked.mch_fail.mcl /root/helix/work/correlation/superpmi/libclrjit.so /root/helix/work/workitem/uploads/libraries.pmi.1.Linux.arm.checked.mch
MISSING: Method context 35873 failed to replay: SuperPMI assertion failed (missing key "key" in map AsCorInfoType): key 00000000E7281FD8
MISSING: Method context 58349 failed to replay: SuperPMI assertion failed (missing key "key" in map IsValueClass): key 00000000F3FF4240
MISSING: Method context 30306 failed to replay: SuperPMI assertion failed (missing key "key" in map ResolveToken): token 4000098
MISSING: Method context 34570 failed to replay: SuperPMI assertion failed (missing key "key" in map AsCorInfoType): key 00000000F136E580
MISSING: Method context 58194 failed to replay: SuperPMI assertion failed (missing key "key" in map IsValueClass): key 00000000F3FF4240
MISSING: Method context 35251 failed to replay: SuperPMI assertion failed (missing key "key" in map AsCorInfoType): key 00000000E85DA6C0
MISSING: Method context 53235 failed to replay: SuperPMI assertion failed (missing key "key" in map ResolveToken): token 60004ad
Compilation failures
Method numbers with compilation failures:
30306
34570
35251
35873
53235
58194
58349
Replay summary:
  Replay failures in 1 MCH files:
    /root/helix/work/workitem/uploads/libraries.pmi.1.Linux.arm.checked.mch
Error, unclean replay.
Finish time: 00:02:48
Elapsed time: 0:14:29.279955
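
As an illustration of how the "Method numbers with compilation failures" list relates to the MISSING lines in the replay log above, here is a minimal C++ sketch. It is not SuperPMI code: the helper names `missingMethodContext` and `failedMethodContexts` are hypothetical, and the only assumption about the log format is the `MISSING: Method context <N> failed to replay` prefix seen in this issue.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of SuperPMI): extract the method context
// number from a line of the form
//   "MISSING: Method context <N> failed to replay: ..."
// Returns -1 for any line that does not match that shape.
inline long missingMethodContext(const std::string& line)
{
    const std::string prefix = "MISSING: Method context ";
    if (line.compare(0, prefix.size(), prefix) != 0)
        return -1;

    std::istringstream in(line.substr(prefix.size()));
    long n = -1;
    in >> n;
    return in.fail() ? -1 : n;
}

// Collect the distinct method context numbers that failed to replay,
// in ascending order (std::set keeps them sorted and de-duplicated).
inline std::vector<long> failedMethodContexts(const std::vector<std::string>& log)
{
    std::set<long> unique;
    for (const std::string& line : log)
    {
        const long n = missingMethodContext(line);
        if (n >= 0)
            unique.insert(n);
    }
    return std::vector<long>(unique.begin(), unique.end());
}
```

Applied to the seven MISSING lines of the replay log above, this would produce the same seven method numbers that the log lists under "Method numbers with compilation failures".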
dotnet-issue-labeler bot added the area-CodeGen-coreclr and untriaged labels Jul 22, 2021
@kunalspathak
Member Author

@dotnet/jit-contrib, @BruceForstall

BruceForstall self-assigned this Jul 22, 2021
BruceForstall added this to the 6.0.0 milestone Jul 22, 2021
BruceForstall removed the untriaged label Jul 22, 2021
BruceForstall added a commit to BruceForstall/runtime that referenced this issue Jul 29, 2021
impImportStaticReadOnlyField reads the data at the address returned by
getFieldAddress. SuperPMI saves and replays this data, but it doesn't
store it at a naturally aligned address. For int64/double (at least),
this is a problem on Linux arm32, which raises a SIGBUS exception on
such unaligned access.

This works around the problem by copying the potentially unaligned data
to a known aligned spot before reading it. This is only done under DEBUG
and when we know we are replaying under SuperPMI.

A proper fix would be to teach SuperPMI about alignment, but that would be
a large and risky change, compared to this small and isolated workaround.

This fixes the non-determinism of dotnet#56156.
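
The workaround described in the commit message above (copy the possibly unaligned bytes to a naturally aligned location, then read from there) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the JIT's actual change: the helper names `readInt64Unaligned` and `readDoubleUnaligned` are hypothetical, and the sketch omits the DEBUG-only and replaying-under-SuperPMI conditions that the commit message mentions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the JIT's code): read a 64-bit integer from an
// address that may not be 8-byte aligned. Dereferencing such an address
// directly as an int64 can fault on targets that require natural alignment.
inline std::int64_t readInt64Unaligned(const void* addr)
{
    assert(addr != nullptr);
    std::int64_t aligned;                          // local has natural alignment
    std::memcpy(&aligned, addr, sizeof(aligned));  // byte-wise copy, no alignment requirement
    return aligned;
}

// Same idea for a double.
inline double readDoubleUnaligned(const void* addr)
{
    assert(addr != nullptr);
    double aligned;
    std::memcpy(&aligned, addr, sizeof(aligned));
    return aligned;
}
```

`std::memcpy` is used here because it places no alignment requirement on its source pointer, so the only typed access happens on the aligned local.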
@BruceForstall
Member

  1. We hit asserts like the one below during collection; these need to be fixed.

This is fixed by #56375.

  2. Such errors are not reported back; we silently filter them out of the collection. We need to exit with an error code so that the collection step knows there was a failure.

This was actually "by design". The question is which kinds of errors would be worthwhile to report. We still expect to see "missing data" errors (there are other GitHub issues covering that). So maybe we could report back non-missing-data errors?

  3. There are several failures of the kind below during collection, and we need to address them soon:

#47546 and #47540 cover this.

  4. Lastly, there are some failures that we hit during collection and try to filter out, but we still hit some failures during the validation phase. We need to investigate why we see such non-determinism.

This is fixed by #56517.
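
One way to picture the "report only non-missing-data errors" idea from the reply above is the following C++ sketch. It is purely illustrative and not how SuperPMI or its collection scripts behave: the names `ReplayIssue`, `classifyLine`, and `collectionExitCode` are hypothetical, and the classification relies only on the `MISSING:` and `Assert failure` line prefixes that appear in the logs in this issue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical categories for a single log line.
enum class ReplayIssue { None, MissingData, Other };

// Classify a log line by its prefix. rfind(prefix, 0) == 0 is a
// "starts with" check that works before C++20.
inline ReplayIssue classifyLine(const std::string& line)
{
    if (line.rfind("MISSING:", 0) == 0)
        return ReplayIssue::MissingData;   // expected, tracked by other issues
    if (line.rfind("Assert failure", 0) == 0)
        return ReplayIssue::Other;         // a JIT assert, worth surfacing
    return ReplayIssue::None;
}

// Hypothetical policy: return 1 if any non-missing-data error was seen,
// otherwise 0, so that missing-data lines alone do not fail the step.
inline int collectionExitCode(const std::vector<std::string>& log)
{
    for (const std::string& line : log)
    {
        if (classifyLine(line) == ReplayIssue::Other)
            return 1;
    }
    return 0;
}
```

Under this assumed policy, a log containing only MISSING lines would still exit with 0, while a log containing an assert such as the `doubleAlignMask` one would exit with 1.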

BruceForstall added a commit that referenced this issue Jul 30, 2021
* Work around a Linux arm32 unaligned data issue in SuperPMI

impImportStaticReadOnlyField reads the data at the address returned by
getFieldAddress. SuperPMI saves and replays this data, but it doesn't
store it at a naturally aligned address. For int64/double (at least),
this is a problem on Linux arm32, which raises a SIGBUS exception on
such unaligned access.

This works around the problem by copying the potentially unaligned data
to a known aligned spot before reading it. This is only done under DEBUG
and when we know we are replaying under SuperPMI.

A proper fix would be to teach SuperPMI about alignment, but that would be
a large and risky change, compared to this small and isolated workaround.

This fixes the non-determinism of #56156.

* Only add alignment adjustment when required
@BruceForstall
Member

This is basically fixed at this point, with other GitHub issues covering remaining issues.

ghost locked as resolved and limited conversation to collaborators Aug 30, 2021