[Arm64] Implement MultiplyHigh #47362

echesakov · 2021-01-23T02:05:06Z

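In addition to implementing the intrinsics, I have updated the System.Math:BigMul(long,long,byref):long implementation in System.Private.CoreLib. The following is the codegen of the methods (the first listing, which uses umulh, is the unsigned overload; the second, which uses smulh, is the signed one):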

; Assembly listing for method System.Math:BigMul(long,long,byref):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  4   )    long  ->   x0        
;  V01 arg1         [V01,T01] (  4,  4   )    long  ->   x1        
;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   x2        
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M18264_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M18264_IG02:              ;; offset=0008H
        9B017C03          mul     x3, x0, x1
        F9000043          str     x3, [x2]
        9BC17C00          umulh   x0, x0, x1
						;; bbWeight=1    PerfScore 8.00
G_M18264_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 14.30, instruction count 7, allocated bytes for code 28 (MethodHash=96edb8a7) for method System.Math:BigMul(long,long,byref):long
; ============================================================

; Assembly listing for method System.Math:BigMul(long,long,byref):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  4   )    long  ->   x0        
;  V01 arg1         [V01,T01] (  4,  4   )    long  ->   x1        
;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   x2        
;* V03 loc0         [V03    ] (  0,  0   )    long  ->  zero-ref   
;* V04 loc1         [V04    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op
;# V05 OutArgs      [V05    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M18264_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M18264_IG02:              ;; offset=0008H
        9B017C03          mul     x3, x0, x1
        F9000043          str     x3, [x2]
        9B417C00          smulh   x0, x0, x1
						;; bbWeight=1    PerfScore 8.00
G_M18264_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 14.30, instruction count 7, allocated bytes for code 28 (MethodHash=96edb8a7) for method System.Math:BigMul(long,long,byref):long
; ============================================================
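Both listings lower BigMul to a mul that produces the low 64 bits of the product (stored through the byref in x2) and a umulh/smulh that produces the high 64 bits (returned in x0). The following is a minimal C sketch of the same semantics. It assumes a GCC/Clang compiler that provides the non-standard __int128 types, and the names bigmul_u64/bigmul_s64 are hypothetical helpers that only mirror the shape of Math.BigMul:

```c
#include <stdint.h>

/* Hypothetical helper mirroring Math.BigMul(ulong, ulong, out ulong):
   returns the high 64 bits (what UMULH computes) and stores the low
   64 bits (what MUL computes) through *low.
   Assumes GCC/Clang's non-standard unsigned __int128. */
static uint64_t bigmul_u64(uint64_t a, uint64_t b, uint64_t *low)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    *low = (uint64_t)product;          /* same as a * b modulo 2^64 */
    return (uint64_t)(product >> 64);  /* UMULH */
}

/* Hypothetical signed counterpart mirroring Math.BigMul(long, long, out long).
   The low half is identical for signed and unsigned operands; only the
   high half differs (SMULH vs UMULH). Relies on GCC/Clang's modular
   integer conversion and arithmetic right shift of negative values. */
static int64_t bigmul_s64(int64_t a, int64_t b, int64_t *low)
{
    __int128 product = (__int128)a * b;
    *low = (int64_t)(uint64_t)product;
    return (int64_t)(product >> 64);   /* SMULH */
}
```

This is why the JIT needs only one extra instruction per overload: the ordinary mul already yields the low half regardless of signedness.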

dotnet-issue-labeler · 2021-01-23T02:05:13Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

ghost · 2021-01-23T02:05:14Z

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #43106

Author:	echesakovMSFT
Assignees:	-
Labels:	`area-System.Runtime.Intrinsics`, `new-api-needs-documentation`
Milestone:	-

echesakov · 2021-01-26T02:43:45Z

@dotnet/jit-contrib PTAL
cc @tannergooding @TamarChristinaArm

BruceForstall

LGTM

TamarChristinaArm · 2021-01-26T11:06:43Z

@echesakovMSFT Out of curiosity, what does the JIT generate for

public static long BigMul(int a, int b)
{
    return ((long)a) * b;
}

where the cast indicates that the multiplication has to be done as long? I'm just curious whether it generates <U|S>MULL here or two extensions followed by a normal MUL.

tannergooding · 2021-01-26T16:24:54Z

src/libraries/System.Private.CoreLib/src/System/Math.cs

@@ -151,6 +151,11 @@ public static unsafe ulong BigMul(ulong a, ulong b, out ulong low)
                low = tmp;
                return high;
            }
+            else if (ArmBase.Arm64.IsSupported)
+            {
+                low = a * b;


This is probably a code pattern that the JIT should recognize and generate UMULL and SMULL for.

This was previously blocked by codegen issues, but I think the work Carol did around multi-reg returns unblocked it, in which case these should forward to an intrinsic (long, long) BigMul(long, long).

@tannergooding I am not sure I understand your comment about handling and generating umull/smull in the context of BigMul(long, long).

My mistake, I misread and thought UMULL had a variant that multiplies two 64-bit values and returns the 128-bit result in two registers, like IMUL and MUL do on x86/x64.

It looks like it's just 32x32=64 and 64x64=upper 64 (the latter via UMULH). I don't see a variant that also returns the lower 64 bits from the same computation.
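To make the "64x64=upper 64" operation concrete, here is a minimal portable C sketch of what UMULH computes, using only 64-bit arithmetic (schoolbook multiplication on 32-bit limbs), plus the standard identity for deriving the signed high half (SMULH) from the unsigned one. The names umulh_portable and smulh_from_umulh are hypothetical; this is an illustration of the instruction semantics, not how the JIT or Mono implement the intrinsic (they simply emit the instruction):

```c
#include <stdint.h>

/* Upper 64 bits of the 128-bit product a * b, computed with 32-bit limbs
   so that no partial product overflows uint64_t. */
static uint64_t umulh_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Middle column: lo_hi <= 2^64 - 2^33 + 1 and the other two terms are
       each below 2^32, so this sum cannot overflow. */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Signed high half from the unsigned one: reinterpreting a negative
   two's-complement operand as unsigned adds 2^64 to it, which raises the
   unsigned high half by the other operand (modulo 2^64). Subtract that
   back once for each negative input. */
static int64_t smulh_from_umulh(int64_t a, int64_t b)
{
    uint64_t h = umulh_portable((uint64_t)a, (uint64_t)b);
    if (a < 0) h -= (uint64_t)b;
    if (b < 0) h -= (uint64_t)a;
    return (int64_t)h;
}
```

The lower 64 bits are not produced here at all; as in the listings above, they would come from a separate ordinary multiply.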

It actually looks like you might be able to achieve it using PMULL and/or PMULL2, but it isn't clear if that's "always a win".

I don't believe PMULL{2} can be used in this context: the results of these instructions (polynomial multiplication) don't correspond to the results of integer multiplication.
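To illustrate the difference: PMULL performs polynomial (carry-less) multiplication over GF(2), where the shifted partial products are combined with XOR instead of addition, so carries never propagate. A minimal C sketch of the low 64 bits of a 64x64 carry-less product (clmul_lo is a hypothetical name; this models the arithmetic, not the exact PMULL operand layout):

```c
#include <stdint.h>

/* Low 64 bits of the carry-less (GF(2) polynomial) product of a and b.
   Each set bit i of b contributes a << i, combined with XOR, not +. */
static uint64_t clmul_lo(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1)
            result ^= a << i;
    }
    return result;
}
```

For example, clmul_lo(3, 3) is 5 (0b11 XOR 0b110 = 0b101), whereas 3 * 3 is 9. The two only coincide in special cases, such as when one operand is a power of two.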

But there is definitely room for improving 32-bit integer multiplication (see #47490), which would result in more efficient code for BigMul(int, int).

imhameed · 2021-01-26T16:32:45Z

src/libraries/System.Private.CoreLib/src/System/Math.cs

@@ -151,6 +151,11 @@ public static unsafe ulong BigMul(ulong a, ulong b, out ulong low)
                low = tmp;
                return high;
            }
+            else if (ArmBase.Arm64.IsSupported)


Mono will need support for these. Alternatively, you'll need to make Mono return false for IsSupported. Otherwise, any use of BigMul on arm64 with LLVM JIT/AOT will make Mono crash with a stack overflow.

I think @echesakovMSFT should just put 0 in

runtime/src/mono/mono/mini/simd-intrinsics-netcore.c

Line 856 in 59ba160

EMIT_NEW_ICONST (cfg, ins, supported ? 1 : 0);

and some of us will implement these for Mono later and re-enable it.

For reference, here is the IR we need to emit: https://godbolt.org/z/64rvjP

It would be great if mono's support could be table driven like it is in RyuJIT.

Then it would (ideally) be relatively trivial most of the time to just say Arm64.MultiplyHigh maps to llvm.intrinsic....

@tannergooding While I agree in general, I don't think there is an intrinsic for this particular thing. As you can see from my godbolt link, you will have to emit at least 5 different statements for this operation, so a table won't help here 🙂

Ah, I see. ARM64 actually doesn't define a C++ intrinsic for this: https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=UMULH, so LLVM won't have a corresponding llvm.intrinsic entry. 👍

Gave it a try and implemented MultiplyHigh in mono.

@EgorBo @imhameed Are these

runtime/src/tests/issues.targets

Line 2629 in 1765d03

<ExcludeList Include = "$(XunitTestBinBase)/JIT/HardwareIntrinsics/Arm/ArmBase.Arm64/ArmBase.Arm64_ro/**">

disabled on purpose for Mono? If so, what would be an alternative way to verify that my changes are correct?

I wonder if they work now. Presumably they failed, although nothing was ever recorded. They should (and will) be re-enabled. That work is being tracked here: #43842

You could try reenabling this specific test and seeing what happens. You could also build Mono as an arm64 AOT cross compiler if you don't have immediate access to any arm64 hardware. Here are some instructions for doing this: https://gist.github.com/imhameed/e4b246dbf0d0b247155fc9e3326194b6

This is not ideal, but this will get better soon.

I tried your branch on an arm64 machine.

Given:

[MethodImpl(MethodImplOptions.NoInlining)]
public static ulong test_mulhi(ulong x, ulong y)
{
    return ArmBase.Arm64.MultiplyHigh(x, y);
}

it yields:

0:   9bc17c00   umulh   x0, x0, x1
4:   d65f03c0   ret

and given:

[MethodImpl(MethodImplOptions.NoInlining)]
public static long test_mulhi(long x, long y)
{
    return ArmBase.Arm64.MultiplyHigh(x, y);
}

it yields:

0:   9b417c00   smulh   x0, x0, x1
4:   d65f03c0   ret

And the intermediate mini IR and LLVM IR all look reasonable. Thanks!


@imhameed Thanks a lot for validating the change!

echesakov · 2021-01-26T20:32:40Z

@TamarChristinaArm For BigMul(int,int), the JIT generates two sign extensions and a mul:

; Assembly listing for method System.Math:BigMul(int,int):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     int  ->   x0        
;  V01 arg1         [V01,T01] (  3,  3   )     int  ->   x1        
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M36318_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M36318_IG02:              ;; offset=0008H
        93407C00          sxtw    x0, w0
        93407C21          sxtw    x1, w1
        9B017C00          mul     x0, x0, x1
						;; bbWeight=1    PerfScore 3.00
G_M36318_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr

Opened #47490 to track this
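For reference, SMULL in its 64-bit-destination form (smull Xd, Wn, Wm) computes the full 64-bit product of two signed 32-bit operands in a single instruction, which is the same value the sxtw/sxtw/mul sequence above computes; UMULL is the unsigned analogue. A C sketch of that value, mirroring the ((long)a) * b pattern from the question (the function names are hypothetical, and whether a given compiler emits smull/umull for them depends on the compiler and optimization level):

```c
#include <stdint.h>

/* Widening 32x32 -> 64-bit signed multiply, the value Math.BigMul(int, int)
   returns. Converting only one operand is enough: the other is promoted to
   int64_t by the usual arithmetic conversions, so the product cannot overflow. */
static int64_t bigmul_s32(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Unsigned counterpart (the UMULL case). */
static uint64_t bigmul_u32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```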


dotnet-issue-labeler bot added area-System.Runtime.Intrinsics new-api-needs-documentation labels Jan 23, 2021

echesakov added arch-arm64 and removed new-api-needs-documentation labels Jan 23, 2021

echesakov self-assigned this Jan 23, 2021

echesakov force-pushed the Arm64-MultiplyHigh branch from 08168be to 5c1a848 January 25, 2021 23:59

echesakov added 8 commits January 25, 2021 18:02

Add MultiplyHigh in src/coreclr/jit/hwintrinsiclistarm64.h

a3357da

Add MultiplyHigh in ArmBase.cs ArmBase.PlatformNotSupported.cs

8e84e0a

Update System.Runtime.Intrinsics.cs

ea5504b

Use ArmBase.Arm64.MultiplyHigh in BigMul in Math.cs

b3b4dfa

Implement MultiplyHigh in src/coreclr/jit/hwintrinsic.cpp

1567a18

Add MultiplyHigh in GenerateTests.csx

59945ef

Update src/tests/JIT/HardwareIntrinsics/Arm/ArmBase.Arm64/*

3391d12

Implement MultiplyHigh in Helpers.cs Helpers.tt

16cd135

echesakov force-pushed the Arm64-MultiplyHigh branch from 5c1a848 to 16cd135 January 26, 2021 02:03

echesakov marked this pull request as ready for review January 26, 2021 02:43

BruceForstall approved these changes Jan 26, 2021


runfoapp bot mentioned this pull request Jan 26, 2021

Inability to unzip assets during build on Unix x64 #32805

Closed

tannergooding reviewed Jan 26, 2021


tannergooding approved these changes Jan 26, 2021


imhameed suggested changes Jan 26, 2021


echesakov mentioned this pull request Jan 26, 2021

[Arm64] Use smull/umull for computing 64-bit result of multiplication of two 32-bit ints/uint #47490

Closed

Support MultiplyHigh in mono

d8455c2

echesakov requested review from CoffeeFlux and lambdageek as code owners January 27, 2021 02:15

echesakov requested review from SamMonoRT and vargaz as code owners January 27, 2021 02:15

monojenkins mentioned this pull request Jan 27, 2021

[Arm64] Implement MultiplyHigh mono/mono#20792

Merged

imhameed reviewed Jan 27, 2021


src/mono/mono/mini/mini-llvm.c (outdated)

imhameed approved these changes Jan 27, 2021


Add spaces before parens in src/mono/mono/mini/mini-llvm.c

9e94fba

echesakov merged commit 39d6396 into dotnet:master Jan 28, 2021

echesakov deleted the Arm64-MultiplyHigh branch January 28, 2021 00:23

echesakov mentioned this pull request Jan 28, 2021

[Arm64] Planned JIT work in .NET 6 #43629

Closed

29 tasks

JulieLeeMSFT mentioned this pull request Feb 24, 2021

What's new in .NET 6 Preview 2 dotnet/core#5889

Closed

ghost locked as resolved and limited conversation to collaborators Feb 27, 2021
