[mono] Implement AdvSimd #49260

imhameed · 2021-03-06T09:24:11Z

This change adds AdvSimd and AdvSimd.Arm64 support to LLVM-enabled Mono.

Most aarch64 LLVM intrinsic functions are overloaded and have names determined
by an invariant base string prepended to a string representation of one or two
type parameters. Intrinsic functions used by an LLVM module must have a
declaration somewhere in memory when JITting or somewhere in the output bitcode
file when AOTing. Currently Mono maintains a hash table that maps internal
intrinsic IDs to LLVM intrinsic declarations. These IDs have been extended: a
simplified type representation is added to the key's upper bits. This
representation is not especially compact, and currently uses 9 bits to label 18
states, but it's easy to look at in a debugger. (A simple base-18 encoding
could encode three parameters in 13 bits.)
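The parenthetical base-18 alternative can be checked with a small sketch. This is not the encoding the PR uses (the PR keeps the wider, debugger-friendly representation). The names `pack_overload_tag`, `make_key`, `unpack_kind`, and the 16-bit width assumed for the base intrinsic ID are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
	N_KINDS = 18,     /* number of type states, as stated in the description */
	ID_BITS = 16,     /* ASSUMPTION: width reserved for the base intrinsic ID */
	TAG_BITS = 13     /* 18^3 = 5832 <= 2^13 = 8192 */
};

/* Pack up to three type parameters (each in 0..17) as base-18 digits.
 * kinds [0] ends up as the least significant digit. */
static uint32_t
pack_overload_tag (const unsigned *kinds, int n)
{
	assert (n >= 0 && n <= 3);
	uint32_t tag = 0;
	for (int i = n - 1; i >= 0; i--) {
		assert (kinds [i] < N_KINDS);
		tag = tag * N_KINDS + kinds [i];
	}
	return tag;
}

/* Place the tag in the upper bits of the lookup key, above the base ID. */
static uint32_t
make_key (uint32_t base_id, uint32_t tag)
{
	assert (base_id < (1u << ID_BITS));
	assert (tag < (1u << TAG_BITS));
	return (tag << ID_BITS) | base_id;
}

/* Recover the type parameter at the given position from a key. */
static unsigned
unpack_kind (uint32_t key, int index)
{
	assert (index >= 0 && index < 3);
	uint32_t tag = key >> ID_BITS;
	for (int i = 0; i < index; i++)
		tag /= N_KINDS;
	return tag % N_KINDS;
}
```

The largest possible tag, three parameters of kind 17, is 17·324 + 17·18 + 17 = 5831, which fits in 13 bits. The trade-off is that the digits are no longer readable at a glance in a debugger.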

These overload-tagged IDs can be passed to
OP_XOP_OVR{_,_SCALAR,_BYSCALAR}X_{X,X_X,X_X_X}. The return type of the
intrinsic that generates these mini ops is used to derive the overload tag to
find the corresponding LLVM intrinsic function declaration.

MonoLLVMModule::intrins_by_id is removed, because LLVM intrinsic lookup keys
are no longer small contiguous integers. It only seemed to serve as a lookup
table for data already contained in a hash table.

The corresponding instructions for some of these .NET-level intrinsics take
immediate parameters. For some of these instructions, the LLVM IR code that
selects these immediate-argument instructions can emit a fallback for
non-constant parameters, either by using an equivalent instruction with a
register operand or by using a longer and less-efficient instruction sequence.
For the rest, a branching code sequence is emitted. Helper functions
(immediate_unroll_begin etc.) are added to make this a little less
repetitious.
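As a rough C analogue of the branching fallback: when an operation only exists in immediate form, a runtime value can be dispatched to one constant case per legal immediate. The PR does this at the LLVM IR level with its `immediate_unroll_begin` helpers, whose signatures are not shown here; the macro, the function name, and the 1..8 range below are illustrative assumptions only.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an instruction whose shift amount must be a compile-time
 * immediate. HYPOTHETICAL: not a real intrinsic. */
#define SHIFT_RIGHT_IMM(x, imm) ((uint8_t) ((x) >> (imm)))

/* Accept a runtime shift amount by branching to one constant case per
 * legal immediate value (assumed here to be 1..8 for 8-bit elements). */
static uint8_t
shift_right_unrolled (uint8_t x, unsigned imm)
{
	assert (imm >= 1 && imm <= 8);
	switch (imm) {
	case 1: return SHIFT_RIGHT_IMM (x, 1);
	case 2: return SHIFT_RIGHT_IMM (x, 2);
	case 3: return SHIFT_RIGHT_IMM (x, 3);
	case 4: return SHIFT_RIGHT_IMM (x, 4);
	case 5: return SHIFT_RIGHT_IMM (x, 5);
	case 6: return SHIFT_RIGHT_IMM (x, 6);
	case 7: return SHIFT_RIGHT_IMM (x, 7);
	case 8: return SHIFT_RIGHT_IMM (x, 8);
	}
	return 0; /* unreachable when the precondition holds */
}
```

Every case body sees a constant, so each branch can be lowered to the immediate-form instruction. The cost is code size proportional to the number of legal immediates.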

Some operations take an immediate operand denoting a lane to select in a vector
before proceeding with another generic vector or scalar operation. These are
decomposed into a sequence of OP_ARM64_SELECT_SCALAR followed by the
non-lane-specific operation. LLVM can still optimize this to the lane-selecting
instruction when possible, and can generate fallback code for non-immediate
lane selection.
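The decomposition can be pictured in plain C, with arrays standing in for vector registers. This is only a sketch of the idea: `vec4u`, `select_scalar`, `multiply_by_scalar`, and `multiply_by_selected_scalar` are hypothetical names, not Mono opcodes or .NET APIs.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };

/* Plain-C stand-in for a four-lane vector of 32-bit elements. */
typedef struct { uint32_t e [LANES]; } vec4u;

/* Step 1: the lane-selection part, analogous in spirit to
 * OP_ARM64_SELECT_SCALAR. The lane index may be a runtime value. */
static uint32_t
select_scalar (vec4u v, unsigned lane)
{
	assert (lane < LANES);
	return v.e [lane];
}

/* Step 2: the generic, non-lane-specific operation. */
static vec4u
multiply_by_scalar (vec4u a, uint32_t s)
{
	vec4u r;
	for (int i = 0; i < LANES; i++)
		r.e [i] = a.e [i] * s; /* unsigned arithmetic wraps on overflow */
	return r;
}

/* The "by selected scalar" operation is the composition of the two. */
static vec4u
multiply_by_selected_scalar (vec4u a, vec4u b, unsigned lane)
{
	return multiply_by_scalar (a, select_scalar (b, lane));
}
```

Keeping the two steps separate means only one lane-agnostic implementation of the arithmetic is needed. The optimizer is left to fuse them back into a lane-indexed instruction when the lane is constant.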

The tables describing the intrinsics supported by the runtime are extended to
support intrinsics with different target instructions for signed, unsigned and
floating point parameters. Whenever possible, .NET-level intrinsics that
correspond to a single LLVM intrinsic function are stored as a single entry in
these tables. Unfortunately, many intrinsics need to be translated into a
sequence of LLVM IR operations; for these, new mini IR opcodes are added to
select the LLVM IR builder code that should run.
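A minimal sketch of what such a table entry might look like, assuming one target per argument kind. The struct layout, the names `IntrinsicEntry` and `lookup_target`, and the two sample rows are hypothetical; the strings are AArch64 mnemonics used purely for illustration, not the runtime's actual table contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ARG_SIGNED, ARG_UNSIGNED, ARG_FLOAT, ARG_KIND_COUNT } ArgKind;

/* One row per .NET-level intrinsic, with a separate target for each
 * argument kind. HYPOTHETICAL layout. */
typedef struct {
	const char *name;
	const char *target [ARG_KIND_COUNT];
} IntrinsicEntry;

static const IntrinsicEntry intrinsic_table [] = {
	{ "Max", { "smax", "umax", "fmax" } },
	{ "Min", { "smin", "umin", "fmin" } },
};

/* Return the target for a named intrinsic and argument kind, or NULL
 * if the intrinsic is not in the table. */
static const char *
lookup_target (const char *name, ArgKind kind)
{
	assert ((unsigned) kind < ARG_KIND_COUNT);
	size_t n = sizeof (intrinsic_table) / sizeof (intrinsic_table [0]);
	for (size_t i = 0; i < n; i++) {
		if (strcmp (intrinsic_table [i].name, name) == 0)
			return intrinsic_table [i].target [kind];
	}
	return NULL;
}
```

With this shape, a single row covers an intrinsic whose signed, unsigned, and floating-point overloads differ only in the instruction selected.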


… ShiftArithmeticSaturateScalar, ShiftLeftLogicalSaturate and ShiftLeftLogicalSaturateScalar
Fix ShiftLeftLogicalSaturate and ShiftLeftLogicalSaturateScalar: decompose each into a promotion of the second argument into a vector followed by an overloaded invocation of @llvm.aarch64.neon.uqshl or @llvm.aarch64.neon.sqshl


MultiplyDoublingSaturateHighScalar, MultiplyDoublingScalarBySelectedScalarSaturateHigh, MultiplyDoublingWideningSaturateScalarBySelectedScalar, MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate, MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate, MultiplyRoundedDoublingByScalarSaturateHigh, MultiplyRoundedDoublingBySelectedScalarSaturateHigh, MultiplyRoundedDoublingSaturateHighScalar, MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh - remove unnecessary special cases
MultiplyDoublingWideningSaturateScalar - add support for the special-case scalar LLVM intrinsic for sqdmull


fanyang-mono

Thanks for this massive change!

naricc · 2021-03-10T15:44:42Z

src/mono/mono/mini/mini-llvm.c

@@ -303,6 +310,142 @@ static void create_aot_info_var (MonoLLVMModule *module);
 static void set_invariant_load_flag (LLVMValueRef v);
 static void set_nonnull_load_flag (LLVMValueRef v);

+enum {
+	INTRIN_scalar = 1 << 0,


Is there any particular reason we are defining some of these with constant bit shifts, some with decimal literals, and some with hex literals?

imhameed · 2021-03-10

They're hints to the reader: enumeration constants defined with constant bit shifts are meant to be used as bit selectors in a bit set, those defined with decimal literals are meant to bound loop ranges, and those defined with hex literals are meant to be used as logical masks.
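A small sketch of that convention, with hypothetical names (`FLAG_scalar`, `FLAG_signed`, `KIND_count`, `KIND_mask`) and values chosen only for illustration:

```c
#include <assert.h>
#include <stdint.h>

enum {
	/* Constant bit shifts: bit selectors in a bit set. */
	FLAG_scalar = 1 << 0,
	FLAG_signed = 1 << 1,
	/* Decimal literal: bounds a loop range. */
	KIND_count = 18,
	/* Hex literal: a logical mask. */
	KIND_mask = 0x1f
};

/* Test a single-bit selector against a bit set. */
static int
has_flag (uint32_t set, uint32_t flag)
{
	assert (flag != 0 && (flag & (flag - 1)) == 0); /* exactly one bit set */
	return (set & flag) != 0;
}

/* Mask out the kind field, then check it against every known kind. */
static int
is_valid_kind (uint32_t word)
{
	uint32_t kind = word & KIND_mask;
	for (uint32_t k = 0; k < (uint32_t) KIND_count; k++) {
		if (kind == k)
			return 1;
	}
	return 0;
}
```

Each constant appears only in the role its literal style advertises: the shifted values are tested bitwise, the decimal value bounds the loop, and the hex value is applied with `&`.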


dotnet-issue-labeler bot added the area-Codegen-LLVM-mono label Mar 6, 2021

imhameed force-pushed the mono-arm64-advsimd branch 17 times, most recently from 1bef631 to 7d9469c Compare March 8, 2021 05:32

This was referenced Mar 8, 2021

System.Threading.Tasks.Tests timed out on net5.0-Linux-Debug-arm64-Mono_release #42024

Closed

Mono SIGABRT in System.Drawing.Common affecting some test runs #37838

Closed

imhameed added 10 commits March 8, 2021 12:23

Checkpoint

0848153

(Insert meaningful description here)

Add some shifts

706cf0e

Implement rounding

79514da

Implement ReverseElement{Bits,8,16,32}

6cc2596

Add reciprocal fp/u32 operations

f0e2e41

Add some bitwise operations

cca921b

Add negation

dad5fed

Add every AdvSimd symbol name

6341449

Minor cleanup

8580569

Checkpoint

a9bc75e

Remove `MonoLLVMModule::intrins_by_id`, which doesn't do anything other than serve as a lookup table for data contained in `intrins_id_to_intrins`. Don't emit table-driven intrinsics when the corresponding intrinsic group isn't fully supported.

vargaz approved these changes Mar 9, 2021


imhameed added 4 commits March 9, 2021 14:34

Fix ShiftLeftLogicalSaturateUnsignedScalar, ShiftLogicalRoundedSatura…

e69386d

…teScalar
ShiftLeftLogicalSaturateUnsignedScalar: move scalar-op-from-vector-op code into shared functions

Fix PopCount

5b49456

Fix ReverseElement8, ReverseElement16, ReverseElement32

f31aec1

Fix ExtractNarrowingSaturateScalar, ExtractNarrowingSaturateUnsignedS…

12fd7c9

…calar

imhameed force-pushed the mono-arm64-advsimd branch from 46fccb6 to 12fd7c9 Compare March 9, 2021 22:34

imhameed added 4 commits March 9, 2021 17:05

LoadAndReplicateToVector: coerce the source pointer to the element ty…

5baa928

…pe when loading a single element

Move OP_INSERT_* and OP_XCAST to a shared arm64/amd64 region

0d67d57

Address feedback: move IntrinsicId (and another LLVM-only anonymous e…

b14fc9a

…num) to a separate header

fanyang-mono approved these changes Mar 10, 2021


Don't attempt to scalarize a non-scalar sqshlu

059bce5

naricc reviewed Mar 10, 2021


naricc approved these changes Mar 10, 2021


MultiplyDoublingWideningSaturateScalar etc.: consistently place the s…

bd2e5b2

…calar or scalar-in-vector return value in a Vector64
Remove OP_ARM64_ZERO_UPPER, which is unused

imhameed force-pushed the mono-arm64-advsimd branch from fe39968 to a3f1171 Compare March 10, 2021 21:13

imhameed added 2 commits March 11, 2021 07:55

Explicitly zero out the unused bits in scalar ops built out of vector…

ca728df

… ops
undef can apparently pass through intrinsic functions during optimization, so bias towards slightly worse but correct codegen for now

Fix the vector concatenation overloads of Vector128/256

24a89e1

imhameed force-pushed the mono-arm64-advsimd branch from a3f1171 to 24a89e1 Compare March 11, 2021 15:55

Sha1.FixedRotate is a scalar-in-vector op. TODO: refactor to use XOP_…

14602df

…SCALAR_X_X

imhameed merged commit 4e2491d into dotnet:main Mar 12, 2021

imhameed mentioned this pull request Mar 12, 2021

[mono] Tracking: Implement System.Runtime.Intrinsics.Arm.AdvSimd #42266

Closed

runfoapp bot mentioned this pull request Mar 12, 2021

[tests] System.Text.Json.Tests segfault, for Libraries Test Run release coreclr OSX x64 Release #47805

Closed

ghost locked as resolved and limited conversation to collaborators Apr 11, 2021

karelz added this to the 6.0.0 milestone May 20, 2021
