-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64-SVE: refactor lsra buildHWIntrinsic #107459
base: main
Are you sure you want to change the base?
Conversation
@kunalspathak - Still WIP. For now ignore anything outside of |
This PR is ready now. Requires #107084, #107180 and a workaround for #107537 in order for all the hwintrinsic tests to pass. Apologies, this is a large change to review, and the github diff is confused about functions I haven't touched. Probably best starting a review from the new version of I recommend this is not merged until after we've gone past the Net9 RC2 deadline. @dotnet/arm64-contrib I'll do a spmidiff next. |
I expected this to be no diff changes, but looks like it is not. Can you please double check the source of differences? |
This looks like it's all to tldr: there is a bug in HEAD where Long version: There are multiple versions of
In HEAD, for the multiple register versions,
Note that For the single register variant,
This is wrong - it should be using In my PR, |
In addition, That happens in:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks much cleaner. We should also run jitstress
and other outerloop legs before merging. I can do it once we are done with addressing the feedback.
assert(candidates == RBM_NONE); | ||
|
||
// Some operands have consective op which is also a delay free op | ||
srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also seems to call buildInternalRegisterUses()
for consecutive registers. Are we missing it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for NI_AdvSimd_VectorTableLookupExtension()
, we call like this. We should double check that logic in new code.
- BuildConsecutiveRegistersForUse
- buildInternalRegisterUses
- BuildDef
- buildInternalRegisterUses
- BuildDef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also seems to call
buildInternalRegisterUses()
for consecutive registers. Are we missing it here?
All intrinsics will call buildInternalRegisterUses()
after the for
loop, before building the destination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for
NI_AdvSimd_VectorTableLookupExtension()
, we call like this. We should double check that logic in new code.
- BuildConsecutiveRegistersForUse
- buildInternalRegisterUses
- BuildDef
- buildInternalRegisterUses
- BuildDef
In old code:
- op1 - BuildUse (because tgtPrefUse is set, which is because isRMW)
- op2 - BuildConsecutiveRegistersForUse
- op3 - BuildDelayFreeUses
- buildInternalRegisterUses
- BuildDef
In new code:
- delay free = op1 (because isRMW)
- addr = nullptr
- consecutive = op2
- dest consecutive = false
- embedded = nullptr
- BuildHWIntrinsicImmediate (which is a nop)
- op1 - BuildUse (because delayFreeOp == op1)
- op2 - BuildConsecutiveRegistersForUse (because consecutive == op2)
- op3 - BuildDelayFreeUses (because delay free != nullptr)
- buildInternalRegisterUses
- BuildDef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're missing the thing I always miss....
Also for
NI_AdvSimd_VectorTableLookupExtension()
, we call like this. We should double check that logic in new code.
- BuildConsecutiveRegistersForUse
- buildInternalRegisterUses
- BuildDef
there is a return srcCount;
here (line 1934)
Which means it never does:
- buildInternalRegisterUses
- BuildDef
Got some asmdiffs for the SVE tests. Spotted two differences, and one of them is due to issues in HEAD. I'll raise PRs to fix these (plus one for LoadAndInsertScalar), and then rebase this once merged. I'd like there to be no asmdiff differences in this PR
|
Change-Id: Id60f884b7281a9fae85a948a361511656c91357e
Rebased on top of the other fixes. As mentioned in #107786, fixed it so that |
No asm diffs now:
|
Fixes #104842
The logic for hwintrisics has become convoluted. Refactor it, for both SVE and AdvSimd.
Add functions to get the operand (if any) for each requirement - delay slot, consecutive registers, address, etc.
Then use a simple for loop to iterate through each operand and build depending on which requirements match for that operand.
Tested by using stress_test.py on the entire HardwareIntrinsics_Arm set.