Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: Add AddSequentialAcross #104640

Merged
merged 6 commits into from
Jul 13, 2024

Conversation

amanasifkhalid
Copy link
Member

Follow-up to #104258, part of #99957. I'll do AddRotateComplex in a separate PR, since I think we might need to change the signature of that API if we want the user to pass the rotation constant in degrees: The type of the constant is currently byte, which is too small to handle a rotation amount of 270.

Test output:

Starting test: .\Core_Root\corerun.exe .\HardwareIntrinsics_Arm_r\HardwareIntrinsics_Arm_r.dll Sve_AddSequentialAcross
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddSequentialAcross_float() : 31
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddSequentialAcross_double() : 31
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

Starting test: .\Core_Root\corerun.exe .\HardwareIntrinsics_Arm_ro\HardwareIntrinsics_Arm_ro.dll Sve_AddSequentialAcross
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddSequentialAcross_float() : 31
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_AddSequentialAcross_double() : 31
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

@dotnet/arm64-contrib PTAL, thanks!

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@amanasifkhalid amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Jul 9, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@kunalspathak
Copy link
Member

I'll do AddRotateComplex in a separate PR, since I think we might need to change the signature of that API if we want the user to pass the rotation constant in degrees: The type of the constant is currently byte, which is too small to handle a rotation amount of 270.

Please take up this discussion in the original APi review issue and tag the reviewers. @tannergooding

double[] result = new double[op1.Length];
result[0] = op1[0];

foreach (double num in op2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only calculates the sum for active lanes, so you need to check that as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, fixed.

@@ -3121,6 +3121,9 @@
("SveVecBinOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_AddSaturate_uint", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "AddSaturate", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt32", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt32", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "UInt32", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetUInt32()", ["NextValueOp2"] = "TestLibrary.Generator.GetUInt32()", ["ValidateIterResult"] = "Helpers.AddSaturate(left[i], right[i]) != result[i]", ["GetIterResult"] = "Helpers.AddSaturate(left[i], right[i])"}),
("SveVecBinOpTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_AddSaturate_ulong", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "AddSaturate", ["RetVectorType"] = "Vector", ["RetBaseType"] = "UInt64", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "UInt64", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "UInt64", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetUInt64()", ["NextValueOp2"] = "TestLibrary.Generator.GetUInt64()", ["ValidateIterResult"] = "Helpers.AddSaturate(left[i], right[i]) != result[i]", ["GetIterResult"] = "Helpers.AddSaturate(left[i], right[i])"}),

("SveVecBinOpVecTest.template", new Dictionary<string, string> { ["TestName"] = "Sve_AddSequentialAcross_float", ["Isa"] = "Sve", ["LoadIsa"] = "Sve", ["Method"] = "AddSequentialAcross", ["RetVectorType"] = "Vector", ["RetBaseType"] = "Single", ["Op1VectorType"] = "Vector", ["Op1BaseType"] = "Single", ["Op2VectorType"] = "Vector", ["Op2BaseType"] = "Single", ["LargestVectorSize"] = "64", ["NextValueOp1"] = "TestLibrary.Generator.GetSingle()", ["NextValueOp2"] = "TestLibrary.Generator.GetSingle()", ["ValidateVectorResult"] = "!result.SequenceEqual(Helpers.AddSequentialAcross(left, right))", ["GetVectorResult"] = "Helpers.AddSequentialAcross(left, right)"}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this test the operation with ConditionalSelect as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, using SimpleVecOpTest_VectorValidationLogicForCndSel and SimpleVecOpTest_VectorValidationLogicForCndSel_FalseValue. I guess I've been getting lucky with mask[0] being 0 for the CndSel scenarios that skip lanes in the second op.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use Helpers.getMask*() to produce the mask input.

Copy link
Member Author

@amanasifkhalid amanasifkhalid Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I tweaked the test definitions to use getMask* for NextValueOp1. The only odd thing about this approach is the template for SveVecBinOpVecTest uses NextValueOp1 for generating the mask input, and for op1's input, so the test values for op1 are artificially constrained by this approach. I can tweak the test template to use separate template variables for these two, but lots of existing tests would need to be updated, and I don't think the value gained from doing this is all that much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with not doing it in this PR, but can you please open an issue to update the templates to have them separate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing: #104804

@amanasifkhalid
Copy link
Member Author

@kunalspathak thanks for the review -- I've fixed the helpers to skip inactive lanes. Stress tests are still passing.

@kunalspathak
Copy link
Member

I've fixed the helpers to skip inactive lanes. Stress tests are still passing.

If the tests were passing without fixing for inactive lanes, that means the mask input given was always making all lanes active. Can you please double check if the mask input produced is generating 0 and 1 both? There are helpers like Helpers.getMask*() to produce them.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kunalspathak kunalspathak self-requested a review July 12, 2024 15:36
@kunalspathak
Copy link
Member

It occured to me that this instruction produces scalar results, so if you combine it with ConditionalSelect, which produces results in scalable register, you might be testing the results in all lanes, which will be wrong. You might just need to verify the results in lane 0.

fadda v9, p0, v1, z2.d
sel z8, p0, z9, z7

Above, the results are only present in lane 0 of v9 / z9. So you might want to verify just for that lane. All the other lanes will hold the value that was there before performing fadda.

@amanasifkhalid
Copy link
Member Author

After discussing offline with @kunalspathak, we've decided to match AddAcross for now and skip the CndSel scenarios for this API, as there's some ambiguity as to what the shape of these scenarios should look like. See #101770, and #101974 for updating these APIs with explicitly-masked variants.

@amanasifkhalid
Copy link
Member Author

I've removed the CndSel scenarios for now. Stress tests are passing.

@amanasifkhalid
Copy link
Member Author

@kunalspathak are you ok with me merging this as-is?

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kunalspathak kunalspathak merged commit 5336e18 into dotnet:main Jul 13, 2024
149 of 167 checks passed
@amanasifkhalid amanasifkhalid deleted the sve-add-sequential branch July 15, 2024 14:03
@github-actions github-actions bot locked and limited conversation to collaborators Aug 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants