-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arm64] Implement MultiplyHigh #47362
Changes from all commits
a3357da
8e84e0a
ea5504b
b3b4dfa
1567a18
59945ef
3391d12
16cd135
d8455c2
9e94fba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -151,6 +151,11 @@ public static unsafe ulong BigMul(ulong a, ulong b, out ulong low) | |
low = tmp; | ||
return high; | ||
} | ||
else if (ArmBase.Arm64.IsSupported) | ||
{ | ||
low = a * b; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is probably a code pattern that the JIT should handle and generate This has been previously blocked due to codegen issues, but I think the work Carol did around multi-reg returns unblocked that, in which case these should forward to an intrinsic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tannergooding I am not sure I understand you comment about handling and generating There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My mistake, I misread and thought It looks like it's just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It actually looks like you might be able to achieve it using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't believe But there is a definitely room for improving 32-bit integer multiplication (see #47490) that would result in more efficient code for |
||
return ArmBase.Arm64.MultiplyHigh(a, b); | ||
} | ||
|
||
return SoftwareFallback(a, b, out low); | ||
|
||
|
@@ -185,6 +190,12 @@ static ulong SoftwareFallback(ulong a, ulong b, out ulong low) | |
/// <returns>The high 64-bit of the product of the specied numbers.</returns> | ||
public static long BigMul(long a, long b, out long low) | ||
{ | ||
if (ArmBase.Arm64.IsSupported) | ||
{ | ||
low = a * b; | ||
return ArmBase.Arm64.MultiplyHigh(a, b); | ||
} | ||
|
||
ulong high = BigMul((ulong)a, (ulong)b, out ulong ulow); | ||
low = (long)ulow; | ||
return (long)high - ((a >> 63) & b) - ((b >> 63) & a); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mono will need support for these. Alternatively, you'll need to make Mono return false for IsSupported. Otherwise, any use of
BigMul
on arm64 with LLVM JIT/AOT will make Mono crash with a stack overflow.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @echesakovMSFT better just put
0
toruntime/src/mono/mono/mini/simd-intrinsics-netcore.c
Line 856 in 59ba160
For reference, here is the IR we need to emit: https://godbolt.org/z/64rvjP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if mono's support could be table driven like it is in RyuJIT.
Then it would (ideally) be relatively trivial most of the time to just say
Arm64.MultiplyHigh
maps tollvm.intrinsic....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding While I agree in general, I don't think there is an intrinsic for this particular thing, as you can see from my godbolt link you will have to emit at least 5 different statements for this operation so a table won't help here 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. ARM64 actually doesn't define a C++ intrinsic for this: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=UMULH, so LLVM won't have a corresponding
llvm.intrinsic
entry, 👍There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave it a try and implemented
MultiplyHigh
in mono.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EgorBo @imhameed Are these
runtime/src/tests/issues.targets
Line 2629 in 1765d03
disabled on purpose with mono? If so, what would be an alternative way to verify that my changes are correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if they work now. Presumably they failed to pass although nothing was never recorded. They should (and will) be reenabled. That work is being tracked here: #43842
You could try reenabling this specific test and seeing what happens. You could also build Mono as an arm64 AOT cross compiler if you don't have immediate access to any arm64 hardware. Here are some instructions for doing this: https://gist.github.com/imhameed/e4b246dbf0d0b247155fc9e3326194b6
This is not ideal, but this will get better soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried your branch on an arm64 machine.
Given:
it yields:
and given:
it yields:
And the intermediate mini IR and LLVM IR all looks reasonable. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imhameed Thanks a lot for validating the change!