-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Shift and Inserts scalar and SIMD intrinsics. #36818
Implement Shift and Inserts scalar and SIMD intrinsics. #36818
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
@echesakovMSFT @tannergooding I'm trying to figure out why my tests are throwing
I know my constants aren't compile time constants, but I thought it would generate a lookup table then? |
case NI_AdvSimd_ShiftLeftLogicalAndInsertScalar: | ||
case NI_AdvSimd_ShiftRightAndInsert: | ||
case NI_AdvSimd_ShiftRightLogicalAndInsertScalar: | ||
immLowerBound = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
immLowerBound == 0
is a MOVI
variant? (Trying to make sure I'm reading the architecture manual correctly here 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah 0
would be a MOV
or ORR
(not MOVI
since that's a bitmask).
@TamarChristinaArm You would also need something like this echesakov@0fe0cf9 to make sure that in non-const imm case we are checking against lower bound. Would you mind waiting a little bit before I merge #36552 and changes in my private branch https://github.com/echesakovMSFT/runtime/compare/Arm64-ASIMD-Shift-Intrinsics that I plan to publish today. #36552 is needed to fix an issue in emitter where INS_sri was not encoding shift amount properly (see #36552 (comment)) and the changes in private will add the check in addRangeCheckIfNeeded. |
@echesakovMSFT ahh thanks, No I don't mind waiting at all. I was just really confused why it wasn't working :) |
I have one question - in #33398 we decided to explicitly specify what type of right shifts an intrinsic does - logical right shift or arithmetic right shift. The operations in this PR are slightly different though - they are neither logical nor arithmetic since the bits "created" by shifting are filled with the corresponding bits in Vd. Would it make sense to name them ShiftLeftAndInsert ShiftLeftAndInsertScalar ShiftRightAndInsert ShiftRightAndInsertScalar? @TamarChristinaArm @tannergooding |
I do agree here. When implementing them I did find the naming a bit confusing and inconsistent. But since they were approved like that I went with it. I find your proposed name clearer since as you say regardless of whether the shift is logical or arithmetic the bits you shift in will be overwritten. |
That makes sense. This is also an element shift rather than a bitwise shift, so we might consider a different parameter name to make that clear. That is, if you right shift by |
It's a bitwise shift though, but every element is shifted by the same amount. I have also just realized that this is a RMW and forgot to modify it as such.. |
Ah, I misread the diagram in the manual. I see now that the boxes are for the individual bits, my mistake. In any case, we'd just need to take this to API review for discussion. |
@TamarChristinaArm I merged #36552 and #36830 last week, so sri and sli instructions should work as expected and all the infrastructure needed for non-zero lower bound intrinsics are in place. Please let me know if you need help with resolving the merge conflicts. |
@echesakovMSFT awesome thanks! I'm rebasing it now, will let you know if I run into any trouble. |
cf57e2a
to
ce29c95
Compare
That should do it @echesakovMSFT @tannergooding @CarolEidt |
aside from the naming issue @echesakovMSFT raised in #36818 (comment) but @tannergooding mentioned it needs API review, so I have left them as the last reviewed names. |
ce29c95
to
3fcae9a
Compare
rebased with changes from master |
@@ -7442,6 +7442,230 @@ public new abstract class Arm64 : ArmBase.Arm64 | |||
/// </summary> | |||
public static Vector128<float> ReciprocalStep(Vector128<float> left, Vector128<float> right) => ReciprocalStep(left, right); | |||
|
|||
/// <summary> | |||
/// uint8x8_t vsli_n_u8(uint8x8_t a, uint8x8_t b, __builtin_constant_p(n)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TamarChristinaArm Everything looks good.
Just one question here - what is __builtin_constant_p(n)
and
should we use uint8x8_t vsri_n_u8 (uint8x8_t a, uint8x8_t b, const int n)
comment here instead as we did for the remaining intrinsics with an immediate operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one question here - what is
__builtin_constant_p(n)
and
It's a compiler builtin supported by clang
and gcc
to indicate that the value of n
must be a compile time constant. the problem is that in C there's no real way to indicate this as const
only means the value is read-only in the function.
should we use
uint8x8_t vsri_n_u8 (uint8x8_t a, uint8x8_t b, const int n)
comment here instead as we did for the remaining intrinsics with an immediate operand?
probably yes since in CLR it doesn't have to be since you generate that lookup table. I generate these signatures through a script that reads the ACLE sources so they got in automatically :)
I am going to merge this to avoid future conflicts (e.g. with #36916) If we decide to update the summary comments/method names we can do this later. |
Works for us. |
This implements the remaining shift and insert intrinsics.
Fixes #31324
/cc @tannergooding @echesakovMSFT @CarolEidt