[mono] Implement AdvSimd #49260

Merged
merged 58 commits into from
Mar 12, 2021

@imhameed (Contributor) commented Mar 6, 2021

This change adds AdvSimd and AdvSimd.Arm64 support to LLVM-enabled Mono.

Most aarch64 LLVM intrinsic functions are overloaded and have names determined
by an invariant base string prepended to a string representation of one or two
type parameters. Intrinsic functions used by an LLVM module must have a
declaration somewhere in memory when JITting or somewhere in the output bitcode
file when AOTing. Currently Mono maintains a hash table that maps internal
intrinsic IDs to LLVM intrinsic declarations. These IDs have been extended: a
simplified type representation is added to the key's upper bits. This
representation is not especially compact, and currently uses 9 bits to label 18
states, but it's easy to look at in a debugger. (A simple base-18 encoding
could encode three parameters in 13 bits.)
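The base-18 arithmetic checks out: 18^3 = 5832 is less than 2^13 = 8192, so three parameters fit in 13 bits. The sketch below shows such a packing, and how the tag could be combined with an intrinsic ID in a key's upper bits. It assumes a 16-bit ID field. `ovr_tag_encode`, `make_intrin_key` and the bit layout are hypothetical and are not Mono's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct overload type states (the PR mentions 18). */
enum { OVR_TAG_STATES = 18 };

/* Hypothetical key layout: intrinsic ID in the low 16 bits, tag above it. */
enum { INTRIN_ID_BITS = 16 };

/* Pack three type parameters, each in [0, 18), as base-18 digits.
 * The largest result is 18^3 - 1 = 5831, which fits in 13 bits. */
static uint32_t
ovr_tag_encode (unsigned t0, unsigned t1, unsigned t2)
{
	assert (t0 < OVR_TAG_STATES && t1 < OVR_TAG_STATES && t2 < OVR_TAG_STATES);
	return ((uint32_t) t0 * OVR_TAG_STATES + t1) * OVR_TAG_STATES + t2;
}

/* Inverse of ovr_tag_encode: peel off base-18 digits, lowest first. */
static void
ovr_tag_decode (uint32_t tag, unsigned *t0, unsigned *t1, unsigned *t2)
{
	*t2 = tag % OVR_TAG_STATES;
	tag /= OVR_TAG_STATES;
	*t1 = tag % OVR_TAG_STATES;
	tag /= OVR_TAG_STATES;
	*t0 = tag;
}

/* Place the overload tag in the key's upper bits, above the base ID. */
static uint32_t
make_intrin_key (uint32_t intrin_id, uint32_t tag)
{
	assert (intrin_id < (1u << INTRIN_ID_BITS));
	return (tag << INTRIN_ID_BITS) | intrin_id;
}
```

The PR's current representation deliberately gives up this density in exchange for being easy to read in a debugger.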

These overload-tagged IDs can be passed to
OP_XOP_OVR{_,_SCALAR,_BYSCALAR}X_{X,X_X,X_X_X}. The return type of the
intrinsic that generates these mini ops is used to derive the overload tag to
find the corresponding LLVM intrinsic function declaration.
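In LLVM's naming scheme, the type string appended to an overloaded intrinsic's base name is the mangled type. For example, `.v8i8` stands for `<8 x i8>` and `.v4f32` for `<4 x float>`. The sketch below derives a full name from a base string plus an element/lane description, which is the kind of information an overload tag has to carry. The enum and function are hypothetical, and only single-type overloads are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ELEM_I8, ELEM_I16, ELEM_I32, ELEM_I64, ELEM_F32, ELEM_F64 } ElemKind;

/* LLVM's mangled spelling of each element type. */
static const char *const elem_suffix [] = { "i8", "i16", "i32", "i64", "f32", "f64" };

/* Writes "<base>.v<lanes><elem>" for vectors, or "<base>.<elem>" for
 * scalars (lanes == 1). Returns the snprintf result. */
static int
overloaded_intrin_name (char *buf, size_t size, const char *base, ElemKind elem, int lanes)
{
	assert (lanes >= 1);
	if (lanes > 1)
		return snprintf (buf, size, "%s.v%d%s", base, lanes, elem_suffix [elem]);
	return snprintf (buf, size, "%s.%s", base, elem_suffix [elem]);
}
```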

MonoLLVMModule::intrins_by_id is removed, because LLVM intrinsic lookup keys
are no longer small contiguous integers. It only seemed to serve as a lookup
table for data already contained in a hash table.

The corresponding instructions for some of these .NET-level intrinsics take
immediate parameters. For some of these instructions, the LLVM IR code that
selects these immediate-argument instructions can emit a fallback for
non-constant parameters, either by using an equivalent instruction with a
register operand or by using a longer and less-efficient instruction sequence.
For the rest, a branching code sequence is emitted. Helper functions
(immediate_unroll_begin etc.) are added to make this a little less
repetitious.
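The C model below shows the shape of a branching sequence for a non-constant immediate: it dispatches over every legal value so that each arm uses a constant. Mono builds the equivalent LLVM IR with helpers such as immediate_unroll_begin, and this is not that code. Several parts are assumptions of this sketch: `USHR8_IMM` stands in for an instruction that only accepts an immediate, the 1..8 range is for an 8-bit logical right shift, and out-of-range values are handled by an assert. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an instruction whose shift amount must be an immediate. */
#define USHR8_IMM(v, n) ((uint8_t) ((unsigned) (v) >> (n)))

/* One arm per legal immediate; each arm uses a compile-time constant.
 * Note: the macro refers to the enclosing function's `value` parameter. */
#define IMM_CASE(n) case n: return USHR8_IMM (value, n);

static uint8_t
ushr8_runtime_imm (uint8_t value, unsigned imm)
{
	switch (imm) {
	IMM_CASE (1) IMM_CASE (2) IMM_CASE (3) IMM_CASE (4)
	IMM_CASE (5) IMM_CASE (6) IMM_CASE (7) IMM_CASE (8)
	default:
		assert (!"immediate out of range");
		return 0;
	}
}
```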

Some operations take an immediate operand denoting a lane to select in a vector
before proceeding with another generic vector or scalar operation. These are
decomposed into a sequence of OP_ARM64_SELECT_SCALAR followed by the
non-lane-specific operation. LLVM can still optimize this to the lane-selecting
instruction when possible, and can generate fallback code for non-immediate
lane selection.
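The sketch below models the lane-selecting decomposition for a 4-lane float vector, using arrays in place of vector registers. It first picks the lane, which is the role OP_ARM64_SELECT_SCALAR plays, and then runs a plain vector-by-scalar multiply. The function names are hypothetical. The combined operation corresponds in spirit to `AdvSimd.MultiplyBySelectedScalar`.

```c
#include <assert.h>

enum { LANES = 4 };

/* Step 1: pick one lane out of a vector. */
static float
select_scalar (const float v [LANES], unsigned lane)
{
	assert (lane < LANES);
	return v [lane];
}

/* Step 2: an ordinary, lane-agnostic vector-by-scalar multiply. */
static void
multiply_by_scalar (float out [LANES], const float a [LANES], float s)
{
	for (int i = 0; i < LANES; ++i)
		out [i] = a [i] * s;
}

/* The lane-selecting operation, expressed as step 1 followed by step 2. */
static void
multiply_by_selected_scalar (float out [LANES], const float a [LANES],
			     const float b [LANES], unsigned lane)
{
	multiply_by_scalar (out, a, select_scalar (b, lane));
}
```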

The tables describing the intrinsics supported by the runtime are extended to
support intrinsics with different target instructions for signed, unsigned and
floating point parameters. Whenever possible, .NET-level intrinsics that
correspond to a single LLVM intrinsic function are stored as a single entry in
these tables. Unfortunately many intrinsics need to be translated into a
sequence of LLVM IR operations; for these, new mini IR opcodes are added to
select the LLVM IR builder code that should run.
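The sketch below shows a table row that carries a separate target for each operand class. With it, `Max` can resolve to smax, umax or fmax, while `ShiftLeftLogicalSaturate` resolves to sqshl or uqshl and has no float form. The struct, the enum values and `lookup_target` are hypothetical, and Mono's real tables are laid out differently.

```c
#include <assert.h>

typedef enum {
	ELEM_CLASS_SIGNED,
	ELEM_CLASS_UNSIGNED,
	ELEM_CLASS_FLOAT,
	ELEM_CLASS_COUNT
} ElemClass;

/* Hypothetical target IDs; TARGET_NONE marks a class with no direct mapping. */
typedef enum {
	TARGET_NONE = 0,
	TARGET_SMAX, TARGET_UMAX, TARGET_FMAX,
	TARGET_SQSHL, TARGET_UQSHL,
} TargetId;

typedef struct {
	const char *name;                     /* .NET-level method name */
	TargetId by_class [ELEM_CLASS_COUNT]; /* indexed by ElemClass */
} IntrinTableEntry;

static const IntrinTableEntry intrin_table [] = {
	{ "Max", { TARGET_SMAX, TARGET_UMAX, TARGET_FMAX } },
	{ "ShiftLeftLogicalSaturate", { TARGET_SQSHL, TARGET_UQSHL, TARGET_NONE } },
};

/* Pick the target for the operand class of the call being compiled. */
static TargetId
lookup_target (const IntrinTableEntry *entry, ElemClass cls)
{
	assert ((unsigned) cls < ELEM_CLASS_COUNT);
	return entry->by_class [cls];
}
```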

Remove `MonoLLVMModule::intrins_by_id`, which does nothing other than
serve as a lookup table for data contained in `intrins_id_to_intrins`.

Don't emit table-driven intrinsics when the corresponding intrinsic
group isn't fully supported.
… ShiftArithmeticSaturateScalar, ShiftLeftLogicalSaturate and ShiftLeftLogicalSaturateScalar

Fix ShiftLeftLogicalSaturate and ShiftLeftLogicalSaturateScalar:
decompose them into a promotion of the second argument into a vector,
followed by an overloaded invocation of @llvm.aarch64.neon.uqshl or
@llvm.aarch64.neon.sqshl.
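Per lane, the ShiftLeftLogicalSaturate decomposition amounts to two steps: broadcast the scalar count into every lane, then do an unsigned saturating left shift. The C model below does this for eight `uint8_t` lanes. It is restricted to non-negative counts below 8. The register form of UQSHL also accepts negative counts, which mean a right shift, and this sketch does not model them. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 8 };

/* Step 1: promote the scalar shift count into a vector. */
static void
broadcast_shift (int8_t out [LANES], int8_t count)
{
	for (int i = 0; i < LANES; ++i)
		out [i] = count;
}

/* Step 2: per-lane unsigned saturating shift left (UQSHL behavior for
 * non-negative counts below the element width). */
static void
uqshl_u8 (uint8_t out [LANES], const uint8_t x [LANES], const int8_t shift [LANES])
{
	for (int i = 0; i < LANES; ++i) {
		assert (shift [i] >= 0 && shift [i] < 8);
		unsigned wide = (unsigned) x [i] << shift [i];
		out [i] = wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
	}
}

/* The vector-by-scalar operation, built from the two steps. */
static void
shift_left_logical_saturate_u8 (uint8_t out [LANES], const uint8_t x [LANES], int8_t count)
{
	int8_t counts [LANES];
	broadcast_shift (counts, count);
	uqshl_u8 (out, x, counts);
}
```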
MultiplyDoublingSaturateHighScalar
MultiplyDoublingScalarBySelectedScalarSaturateHigh

MultiplyDoublingWideningSaturateScalarBySelectedScalar
MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate
MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate

MultiplyRoundedDoublingByScalarSaturateHigh
MultiplyRoundedDoublingBySelectedScalarSaturateHigh
MultiplyRoundedDoublingSaturateHighScalar
MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh
    - remove unnecessary special cases

MultiplyDoublingWideningSaturateScalar
    - add support for the special-case scalar LLVM intrinsic for sqdmull
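The saturating doubling multiplies listed above lower to SQDMULH (MultiplyDoubling...SaturateHigh), SQRDMULH (MultiplyRoundedDoubling...SaturateHigh) and SQDMULL (MultiplyDoublingWidening...). The C model below is a reference for what those instructions compute per 16-bit lane. It does not show how Mono emits them, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide intermediate into the int16 range. */
static int16_t
sat16 (int64_t v)
{
	if (v > INT16_MAX) return INT16_MAX;
	if (v < INT16_MIN) return INT16_MIN;
	return (int16_t) v;
}

/* SQDMULH: saturating doubling multiply, keep the high 16 bits.
 * Note: >> on a negative int64_t is implementation-defined in C; GCC,
 * Clang and MSVC shift arithmetically, which this sketch relies on. */
static int16_t
sqdmulh16 (int16_t a, int16_t b)
{
	int64_t p = 2 * (int64_t) a * b;
	return sat16 (p >> 16);
}

/* SQRDMULH: as above, but add half an output ULP before truncating. */
static int16_t
sqrdmulh16 (int16_t a, int16_t b)
{
	int64_t p = 2 * (int64_t) a * b + (1 << 15);
	return sat16 (p >> 16);
}

/* SQDMULL: saturating doubling multiply, widened to 32 bits. Only
 * INT16_MIN * INT16_MIN overflows (2 * 2^30 = 2^31). */
static int32_t
sqdmull16 (int16_t a, int16_t b)
{
	int64_t p = 2 * (int64_t) a * b;
	if (p > INT32_MAX) return INT32_MAX;
	if (p < INT32_MIN) return INT32_MIN;
	return (int32_t) p;
}
```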

@fanyang-mono (Member) left a comment


Thanks for this massive change!

```diff
@@ -303,6 +310,142 @@ static void create_aot_info_var (MonoLLVMModule *module);
static void set_invariant_load_flag (LLVMValueRef v);
static void set_nonnull_load_flag (LLVMValueRef v);

enum {
	INTRIN_scalar = 1 << 0,
```

Contributor

Is there any particular reason we are defining some of these with constant bit shifts, some with decimal literals, and some with hex literals?

Contributor Author

They're hints to the reader: the enumeration constants given values by constant bit shifts are meant to be used as bit selectors in a bit set, the enumeration constants given values by decimal literals are meant to be used to bound loop ranges, and the enumeration constants given values by hex literals are meant to be used as logical masks.
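The example below illustrates the convention described in this reply. Shifts define bit selectors, a decimal literal bounds a loop over them, and a hex literal masks the field before it is tested. The `EX_` names are invented for illustration.

```c
#include <assert.h>

/* Constant shifts: bit selectors meant to be OR-ed into a bit set. */
enum {
	EX_scalar   = 1 << 0,
	EX_unsigned = 1 << 1,
	EX_float    = 1 << 2,
};

/* Decimal literal: a count used to bound loops. */
enum { EX_num_flags = 3 };

/* Hex literal: a mask applied with & to isolate a field. */
enum { EX_flag_mask = 0x7 };

/* Count how many of the flags above are set in `bits`. */
static int
ex_count_flags (unsigned bits)
{
	int n = 0;
	bits &= EX_flag_mask;             /* keep only the flag field */
	for (int i = 0; i < EX_num_flags; ++i)
		n += (bits >> i) & 1;     /* test each bit selector in turn */
	return n;
}
```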

…calar or scalar-in-vector return value in a Vector64

Remove OP_ARM64_ZERO_UPPER, which is unused
… ops

`undef` values can apparently pass through intrinsic functions during
optimization, so bias toward slightly worse but correct codegen for now.
6 participants