ARM64-SVE: refactor lsra buildHWIntrinsic #107459

a74nh · 2024-09-06T13:12:57Z

The logic for hwintrinsics has become convoluted. Refactor it, for both SVE and AdvSimd.

Add functions to get the operand (if any) for each requirement: delay-free, consecutive registers, address, etc.

Then use a simple for loop to iterate through each operand and build its uses according to which requirements match that operand.
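The per-operand loop described above can be sketched roughly as follows. This is a minimal, self-contained model, not the PR's code. `OperandRequirements`, `chooseBuild` and `buildSequence` are hypothetical names. Each step is represented by the name of the LSRA routine it would call (BuildAddrUses(), BuildConsecutiveRegistersForUse(), BuildDelayFreeUses(), BuildUse()) rather than by actually building RefPositions. The precedence order shown is an assumption, and contained immediates, embedded masked operations and candidate register masks are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: operands are numbered from 1; 0 means "no such operand".
struct OperandRequirements
{
    int delayFreeOp   = 0; // operand the target is tied to (e.g. op1 for RMW intrinsics)
    int addrOp        = 0; // operand holding a memory address
    int consecutiveOp = 0; // operand that needs a run of consecutive registers
};

// Name of the LSRA routine the loop would use for operand `opNum`.
std::string chooseBuild(int opNum, const OperandRequirements& req)
{
    assert(opNum >= 1);

    if (opNum == req.addrOp)
        return "BuildAddrUses";
    if (opNum == req.consecutiveOp)
        return "BuildConsecutiveRegistersForUse";
    if (opNum == req.delayFreeOp)
        return "BuildUse"; // the tied operand itself is an ordinary use
    if (req.delayFreeOp != 0)
        return "BuildDelayFreeUses"; // kept live across the def so it cannot share the target register
    return "BuildUse";
}

// One loop over all operands, then internal registers, then the destination.
std::vector<std::string> buildSequence(int numOperands, const OperandRequirements& req)
{
    std::vector<std::string> calls;
    for (int opNum = 1; opNum <= numOperands; opNum++)
    {
        calls.push_back(chooseBuild(opNum, req));
    }
    calls.push_back("buildInternalRegisterUses");
    calls.push_back("BuildDef");
    return calls;
}
```

Consider an RMW intrinsic whose op2 needs consecutive registers (delay-free op1, consecutive op2, no address). For that case, `buildSequence(3, req)` yields BuildUse, BuildConsecutiveRegistersForUse, BuildDelayFreeUses, buildInternalRegisterUses, BuildDef.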

Tested by using stress_test.py on the entire HardwareIntrinsics_Arm set.

a74nh · 2024-09-06T13:14:35Z

@kunalspathak - Still WIP. For now, ignore anything outside of lsraarm64.cpp and lsra.hpp. All the other changes are from other PRs and will be removed once those have been merged.

a74nh · 2024-09-09T15:41:28Z

This PR is ready now.

Requires #107084, #107180 and a workaround for #107537 in order for all the hwintrinsic tests to pass.

Apologies, this is a large change to review, and the GitHub diff is confused about functions I haven't touched. It's probably best to start the review from the new version of BuildHWIntrinsic().

I recommend this is not merged until after we've gone past the .NET 9 RC2 deadline.

@dotnet/arm64-contrib

I'll do a spmidiff next.

kunalspathak · 2024-09-09T22:59:34Z

I expected this to be a no-diff change, but it looks like it is not. Can you please double-check the source of the differences?

a74nh · 2024-09-11T11:07:17Z

> I expected this to be a no-diff change, but it looks like it is not. Can you please double-check the source of the differences?

All of these diffs appear to come from LoadAndInsertScalar.

tl;dr: there is a bug in HEAD where BuildAddrUses() is not used for op3.

Long version:

There are multiple versions of LoadAndInsertScalar because it needs to handle variants where op1 and the return value are tuples of vectors. All of them have the address operand in op3:

```csharp
public static unsafe (Vector128<byte> Value1, Vector128<byte> Value2) LoadAndInsertScalar((Vector128<byte>, Vector128<byte>) values, [ConstantExpected(Max = (byte)(15))] byte index, byte* address);

public static unsafe Vector128<uint> LoadAndInsertScalar(Vector128<uint> value, [ConstantExpected(Max = (byte)(3))] byte index, uint* address);
```

In HEAD, the multi-register versions (NI_AdvSimd_LoadAndInsertScalarVectorXxX) get special handling:

```cpp
else if (HWIntrinsicInfo::NeedsConsecutiveRegisters(intrin.id))
    // ...
        case NI_AdvSimd_LoadAndInsertScalarVector64x2:
        case NI_AdvSimd_LoadAndInsertScalarVector64x3:
        case NI_AdvSimd_LoadAndInsertScalarVector64x4:
        case NI_AdvSimd_Arm64_LoadAndInsertScalarVector128x2:
        case NI_AdvSimd_Arm64_LoadAndInsertScalarVector128x3:
        case NI_AdvSimd_Arm64_LoadAndInsertScalarVector128x4:
        {
            assert(intrin.op2 != nullptr);
            assert(intrin.op3 != nullptr);
            assert(isRMW);
            if (!intrin.op2->isContainedIntOrIImmed())
            {
                srcCount += BuildOperandUses(intrin.op2);
            }

            assert(intrinsicTree->OperIsMemoryLoadOrStore());
            srcCount += BuildAddrUses(intrin.op3);
            buildInternalRegisterUses();
            FALLTHROUGH;
        }
```

Note that BuildAddrUses() is used for op3.

The single-register variant, NI_AdvSimd_LoadAndInsertScalar, does not have the NeedsConsecutiveRegisters flag, so it falls into the generic op2 handling code and then into the generic op3 handling code, which does:

```cpp
if (intrin.op3 != nullptr)
{
    SingleTypeRegSet candidates = lowVectorOperandNum == 3 ? lowVectorCandidates : RBM_NONE;

    if (isRMW)
    {
        srcCount += BuildDelayFreeUses(intrin.op3, (tgtPrefOp2 ? intrin.op2 : intrin.op1), candidates);
    }
    else
    {
        srcCount += BuildOperandUses(intrin.op3, candidates);
    }
    // ...
}
```

This is wrong - it should be using BuildAddrUses() for op3.

In my PR, getVectorAddrOperand() correctly returns op3 for all of the LoadAndInsertScalar variants, and the main for loop in BuildHWIntrinsic() then correctly calls BuildAddrUses() on it.
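Here is a minimal sketch of that lookup. It assumes a cut-down, hypothetical `Intrinsic` enum named after a few of the real `NI_*` IDs. The PR's getVectorAddrOperand() returns the operand itself. This version instead returns a 1-based operand number (0 for none). It only illustrates that every LoadAndInsertScalar variant reports op3, so the generic loop builds op3 with BuildAddrUses() and never treats op1 as an address.

```cpp
#include <cassert>

// Hypothetical subset of intrinsic IDs, named after the real NI_* values.
enum class Intrinsic
{
    AdvSimd_LoadVector128,                        // (address)
    AdvSimd_LoadAndInsertScalar,                  // (value, index, address)
    AdvSimd_LoadAndInsertScalarVector64x2,        // ((value1, value2), index, address)
    AdvSimd_Arm64_LoadAndInsertScalarVector128x4, // ((value1..value4), index, address)
    AdvSimd_Add,                                  // (left, right): no memory operand
};

// Returns the 1-based number of the operand holding the address, or 0 if none.
int getVectorAddrOperandNum(Intrinsic id)
{
    switch (id)
    {
        // All LoadAndInsertScalar variants take the address as op3, whether
        // op1 is a single vector or a tuple of consecutive vectors.
        case Intrinsic::AdvSimd_LoadAndInsertScalar:
        case Intrinsic::AdvSimd_LoadAndInsertScalarVector64x2:
        case Intrinsic::AdvSimd_Arm64_LoadAndInsertScalarVector128x4:
            return 3;
        case Intrinsic::AdvSimd_LoadVector128:
            return 1;
        default:
            return 0;
    }
}

// True if operand `opNum` (1..3) should be built with BuildAddrUses().
bool isAddrOperand(Intrinsic id, int opNum)
{
    assert(opNum >= 1 && opNum <= 3);
    return getVectorAddrOperandNum(id) == opNum;
}
```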

a74nh · 2024-09-11T12:12:20Z

> tl;dr: there is a bug in HEAD where BuildAddrUses() is not used for op3

In addition, BuildAddrUses() is called on op1, which is also wrong: for LoadAndInsertScalar, op1 is the vector being inserted into, not an address.

That happens in:

```cpp
else if (intrinsicTree->OperIsMemoryLoadOrStore())
{
    srcCount += BuildAddrUses(intrin.op1);
}
```

kunalspathak

Overall, this looks much cleaner. We should also run jitstress and the other outerloop legs before merging. I can do that once we are done addressing the feedback.

src/coreclr/jit/lsraarm64.cpp

kunalspathak · 2024-09-12T05:24:44Z

```diff
+            assert(candidates == RBM_NONE);
+
+            // Some operands have consecutive op which is also a delay free op
+            srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp);
```


We also seem to call buildInternalRegisterUses() for consecutive registers. Are we missing it here?

Also, for NI_AdvSimd_VectorTableLookupExtension(), we make the following calls. We should double-check that logic in the new code.

```
BuildConsecutiveRegistersForUse
buildInternalRegisterUses
BuildDef
buildInternalRegisterUses
BuildDef
```

> We also seem to call buildInternalRegisterUses() for consecutive registers. Are we missing it here?

All intrinsics call buildInternalRegisterUses() after the for loop, before building the destination.

> Also, for NI_AdvSimd_VectorTableLookupExtension(), we make the following calls. We should double-check that logic in the new code.
>
> ```
> BuildConsecutiveRegistersForUse
> buildInternalRegisterUses
> BuildDef
> buildInternalRegisterUses
> BuildDef
> ```

In the old code:

- op1: BuildUse (because tgtPrefUse is set, which is because isRMW)
- op2: BuildConsecutiveRegistersForUse
- op3: BuildDelayFreeUses
- buildInternalRegisterUses
- BuildDef

In the new code:

- delay free = op1 (because isRMW)
- addr = nullptr
- consecutive = op2
- dest consecutive = false
- embedded = nullptr
- BuildHWIntrinsicImmediate (which is a no-op)
- op1: BuildUse (because delayFreeOp == op1)
- op2: BuildConsecutiveRegistersForUse (because consecutive == op2)
- op3: BuildDelayFreeUses (because delay free != nullptr)
- buildInternalRegisterUses
- BuildDef
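The classification step in the new code (delay free, addr, consecutive) can be modeled with a small, self-contained sketch. `IntrinsicShape`, `OperandRoles` and `classifyOperands` are hypothetical stand-ins for the PR's getDelayFreeOperand(), getVectorAddrOperand() and getConsecutiveRegistersOperand() helpers. Choosing op2 as the delay-free operand when tgtPrefOp2 is set, and op1 otherwise, mirrors the `tgtPrefOp2 ? intrin.op2 : intrin.op1` expression in the HEAD code quoted earlier. The dest-consecutive and embedded-operand cases are omitted.

```cpp
#include <cassert>

// Hypothetical, simplified description of one intrinsic call.
struct IntrinsicShape
{
    bool isRMW            = false; // the target must reuse a source register
    bool tgtPrefOp2       = false; // RMW with the target tied to op2 rather than op1
    int  consecutiveOpNum = 0;     // operand needing consecutive registers (0 = none)
    int  addrOpNum        = 0;     // operand holding a memory address (0 = none)
};

// 1-based operand number for each requirement, 0 meaning "none".
struct OperandRoles
{
    int delayFree   = 0;
    int addr        = 0;
    int consecutive = 0;
};

OperandRoles classifyOperands(const IntrinsicShape& shape)
{
    // A register tuple and an address are never the same operand.
    assert(shape.addrOpNum == 0 || shape.addrOpNum != shape.consecutiveOpNum);

    OperandRoles roles;
    if (shape.isRMW)
    {
        roles.delayFree = shape.tgtPrefOp2 ? 2 : 1;
    }
    roles.addr        = shape.addrOpNum;
    roles.consecutive = shape.consecutiveOpNum;
    return roles;
}
```

For NI_AdvSimd_VectorTableLookupExtension (RMW, table in op2, no address), this gives delay free = op1, no address operand and consecutive = op2, matching the list above.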

You're missing the thing I always miss...

> Also, for NI_AdvSimd_VectorTableLookupExtension(), we make the following calls. We should double-check that logic in the new code.
>
> ```
> BuildConsecutiveRegistersForUse
> buildInternalRegisterUses
> BuildDef
> ```

There is a `return srcCount;` here (line 1934), which means the second buildInternalRegisterUses / BuildDef pair is never executed.

a74nh · 2024-09-13T08:29:22Z

Got some asmdiffs for the SVE tests. I spotted two differences, and one of them is due to issues in HEAD.

I'll raise PRs to fix these (plus one for LoadAndInsertScalar), and then rebase this once they are merged. I'd like there to be no asm diffs in this PR.

```
./4546.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_long RunClassFldScenario() this (FullOpts)
./4000.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AddSaturate_byte RunBasicScenario_Load() this (FullOpts)
./4130.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_sbyte RunClassFldScenario() this (FullOpts)
./27034.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_byte RunBasicScenario_Load() this (FullOpts)
./22619.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_sbyte RunClassFldScenario() this (FullOpts)
./22615.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_sbyte RunBasicScenario_Load() this (FullOpts)
./4170.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_int RunBasicScenario_Load() this (FullOpts)
./26818.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_int RunClassFldScenario() this (FullOpts)
./26730.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_uint RunClassFldScenario() this (FullOpts)
./4026.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AddSaturate_ushort RunClassFldScenario() this (FullOpts)
./22659.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_int RunBasicScenario_Load() this (FullOpts)
./4280.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_ulong RunBasicScenario_Load() this (FullOpts)
./4258.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_uint RunBasicScenario_Load() this (FullOpts)
./4192.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_long RunBasicScenario_Load() this (FullOpts)
./26726.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_uint RunBasicScenario_Load() this (FullOpts)
./26906.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_uint RunClassFldScenario() this (FullOpts)
./3481.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_long RunBasicScenario_Load() this (FullOpts)
./26642.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_int RunClassFldScenario() this (FullOpts)
./3547.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_uint RunBasicScenario_Load() this (FullOpts)
./4608.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_uint RunBasicScenario_Load() this (FullOpts)
./26946.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_sbyte RunBasicScenario_Load() this (FullOpts)
./27082.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_uint RunClassFldScenario() this (FullOpts)
./3529.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_ushort RunClassFldScenario() this (FullOpts)
./26968.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_short RunBasicScenario_Load() this (FullOpts)
./27012.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_long RunBasicScenario_Load() this (FullOpts)
./22681.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_long RunBasicScenario_Load() this (FullOpts)
./22769.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_ulong RunBasicScenario_Load() this (FullOpts)
./4218.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_byte RunClassFldScenario() this (FullOpts)
./27056.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_ushort RunBasicScenario_Load() this (FullOpts)
./4520.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_int RunBasicScenario_Load() this (FullOpts)
./26704.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_ushort RunBasicScenario_Load() this (FullOpts)
./26994.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_int RunClassFldScenario() this (FullOpts)
./4214.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_And_byte RunBasicScenario_Load() this (FullOpts)
./22747.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_uint RunBasicScenario_Load() this (FullOpts)
./22729.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_ushort RunClassFldScenario() this (FullOpts)
./26792.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_short RunBasicScenario_Load() this (FullOpts)
./3415.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_sbyte RunBasicScenario_Load() this (FullOpts)
./3441.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_short RunClassFldScenario() this (FullOpts)
./26554.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_float RunClassFldScenario() this (FullOpts)
./26902.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_uint RunBasicScenario_Load() this (FullOpts)
./4634.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_BitwiseClear_ulong RunClassFldScenario() this (FullOpts)
./22637.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_short RunBasicScenario_Load() this (FullOpts)
./3507.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_byte RunClassFldScenario() this (FullOpts)
./26616.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Subtract_short RunBasicScenario_Load() this (FullOpts)
./26880.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_ushort RunBasicScenario_Load() this (FullOpts)
./22707.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_byte RunClassFldScenario() this (FullOpts)
./22703.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_byte RunBasicScenario_Load() this (FullOpts)
./3459.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_int RunBasicScenario_Load() this (FullOpts)
./22641.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Or_short RunClassFldScenario() this (FullOpts)
./3503.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_byte RunBasicScenario_Load() this (FullOpts)
./26990.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_int RunBasicScenario_Load() this (FullOpts)
./3419.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Add_sbyte RunClassFldScenario() this (FullOpts)
./26814.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_SubtractSaturate_int RunBasicScenario_Load() this (FullOpts)
./27078.dasm:1 :   ; Assembly listing for method JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_Xor_uint RunBasicScenario_Load() this (FullOpts)
```

a74nh · 2024-09-13T14:07:37Z

The latest version fixes up the diffs that were caused by this PR.
Once #107786 and #107791 are merged, there should be no remaining diffs in this PR.

kunalspathak · 2024-09-13T18:00:33Z

> The latest version fixes up the diffs that were caused by this PR. Once #107786 and #107791 are merged, there should be no remaining diffs in this PR.

Let's rebase this PR once the above-mentioned PRs are merged, to confirm there are zero asm diffs.

a74nh · 2024-09-26T16:17:43Z

Rebased on top of the other fixes. As mentioned in #107786, I fixed it so that BuildDelayFreeUses() is only called when the register types match. I still need to confirm that there are no spmi diffs.
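Here is a minimal sketch of that guard. `RegType` and `buildForSource` are hypothetical, and the exact comparison the PR makes is an assumption. The sketch compares the register file of a source operand with that of the delay-free operand and only requests a delay-free use when they match. Presumably, a source in a different register file (say, an SVE predicate next to a vector delay-free operand) can never be assigned the target's register, so a plain use is enough.

```cpp
#include <cassert>
#include <string>

// Hypothetical register files: general-purpose, SIMD/FP, and SVE predicate (mask).
enum class RegType { Int, Float, Mask };

// Choose how to build a source operand that is not itself the delay-free operand.
std::string buildForSource(bool hasDelayFreeOp, RegType delayFreeType, RegType operandType)
{
    // Only an operand competing for the same register file as the delay-free
    // operand (and therefore the target) needs to be kept live across the def.
    if (hasDelayFreeOp && delayFreeType == operandType)
    {
        return "BuildDelayFreeUses";
    }
    return "BuildUse";
}
```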

a74nh · 2024-09-27T08:43:22Z

No asm diffs now:

```
❯ python3 ./src/coreclr/scripts/superpmi.py collect $CORE_ROOT/corerun "./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll"
[08:27:14] ================ Logging to /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/superpmi.log
[08:27:14] SuperPMI collect
[08:27:14] SuperPMI JIT Path: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[08:27:14] Collecting using command:
[08:27:14]   /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll
[08:30:53] Merging MC files
[08:30:57] Copy base MCH file to final MCH file
[08:31:08] Creating TOC file
[08:31:10] Generated MCH file: /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch

❯ python3 ./src/coreclr/scripts/superpmi.py asmdiffs -mch_files /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch
[08:35:57] ================ Logging to /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/superpmi.1.log
[08:35:57] Using JIT/EE Version from jiteeversionguid.h: b75a5475-ff22-4078-9551-2024ce03d383
[08:35:58] Baseline hash: cdc8418a7f4e51b771db2ae7ee5cde5f479cde7e
[08:35:58] Download: https://clrjit2.blob.core.windows.net/jitrollingbuild/builds/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb/linux/arm64/Checked/libclrjit.so -> /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/basejit/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb.linux.arm64.Checked/libclrjit.so
Downloading 5.6/5.6 MB...
[08:35:59] Downloaded https://clrjit2.blob.core.windows.net/jitrollingbuild/builds/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb/linux/arm64/Checked/libclrjit.so
[08:35:59] Using baseline /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/basejit/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb.linux.arm64.Checked/libclrjit.so
[08:35:59] Using coredistools found at /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libcoredistools.so
[08:35:59] SuperPMI ASM diffs
[08:35:59] Base JIT Path: /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/basejit/f1bcbeb5fa2fe84698b62d88dd35199f0d7fbedb.linux.arm64.Checked/libclrjit.so
[08:35:59] Diff JIT Path: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[08:35:59] Using MCH files:
[08:35:59]   /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch
[08:35:59] Running asm diffs of /home/alahay01/dotnet/runtime_sve_api/linux.arm64.Checked.mch
[08:36:39] Clean SuperPMI diff (72927 contexts processed)
[08:36:39] Asm diffs summary:
[08:36:39]   Summary Markdown file: /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/diff_summary.md
[08:36:39]   Short Summary Markdown file: /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/diff_short_summary.md
[08:36:39]   No asm diffs

❯ cat /home/alahay01/dotnet/runtime_sve_api/artifacts/spmi/diff_summary.md
Diffs are based on <span style="color:#1460aa">72,927</span> contexts (<span style="color:#1460aa">1</span> MinOpts, <span style="color:#1460aa">72,926</span> FullOpts).

No diffs found.

<details>
<summary>Details</summary>
<div style="margin-left:1em">

#### Context information

|Collection|Diffed contexts|MinOpts|FullOpts|Missed, base|Missed, diff|
|---|--:|--:|--:|--:|--:|
|linux.arm64.Checked.mch|72,927|1|72,926|0 (0.00%)|0 (0.00%)|

</div></details>
```

a74nh added 14 commits September 6, 2024 10:30

Add BuildConditionalSelectHWIntrinsic()

2733b54

Add GetRMWOp()

9c75beb

Use GetDelayFreeOp() in BuildConditionalSelectWithEmbeddedOp()

bb908a5

simplify op2 handling

23e08ff

Add getVectorAddrOperand()

866bacd

Add getConsecutiveRegistersOperand

442fc1f

Add BuildOperand()

3ce31aa

Use BuildOperand for op1

323ce4b

Add buildHWIntrinsicImmediate

eaa535c

Add getOperandCandidates()

ec4efdb

Remove BuildOperand()

8caedea

remove delayFreeMultiple

99c53eb

Fixes from other PRs to be removed

1b30f92

Fix formatting

861831d

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Sep 6, 2024

dotnet-policy-service bot added the community-contribution label Sep 6, 2024

a74nh added 8 commits September 9, 2024 12:15

Use BuildHWIntrinsicImmediate for conditional select

686749e

Remove IsRMW

1f8ed58

Replace BuildConditionalSelectWithEmbeddedOp() with BuildEmbeddedOperandUses()

2041f6f

Revert "Fixes from other PRs to be removed"

c27b446

Move functions

0d8cf9b

Move functions

cb99816

Remove failing unary tests

f5d34ac

Fix opNum type

357281e

a74nh marked this pull request as ready for review September 9, 2024 15:41

a74nh requested review from jakobbotsch and tannergooding September 9, 2024 15:41

Revert "Remove failing unary tests"

00c33e8

build-analysis bot mentioned this pull request Sep 11, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879

a74nh added 2 commits September 11, 2024 12:14

Remove cases from getDelayFreeOperand that are handled by default

e58f641

Merge main

8682f0a

build-analysis bot mentioned this pull request Sep 11, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

kunalspathak requested changes Sep 12, 2024

review cleanups

faa608b

kunalspathak added the arm-sve label Sep 12, 2024

This was referenced Sep 13, 2024:

ARM64: Fix lsra for AdvSimd_LoadAndInsertScalar #107786 (merged)

ARM64-SVE: Fix hwintrinsics flags #107791 (merged)

a74nh added 3 commits September 13, 2024 14:58

Simplify masks in getOperandCandidates()

a4da945

Remove IsMaskedOperation()

3d9e997

Check for optional embedded masks in getDelayFreeOperand

0b45899

a74nh added the arch-arm64 label Sep 16, 2024

a74nh added 3 commits September 16, 2024 09:46

Merge main

bfece55

Change-Id: Id60f884b7281a9fae85a948a361511656c91357e

Merge main

e5cbb40

Only call BuildDelayFreeUses when register types match

7e3dd12

a74nh added 2 commits September 27, 2024 09:46

Merge main

f40079f

Assert only on Arm64

df8263b
