Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace the last two SIMDIntrinsic in LIR with NamedIntrinsic and delete GT_SIMD #80027

Merged
merged 4 commits into from
Jan 5, 2023

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Dec 28, 2022

As per the title this moves the last two remaining SIMDIntrinsic cases to be NamedIntrinsic and thus allows us to remove GT_SIMD.

Unlike most of the other SIMD intrinsics, UpperSave and UpperRestore are "special" and exist where general SIMD support does for ABI reasons. However, GT_INTRINSIC is a large node and has additional cost that GT_SIMD did not. Because of this, we use GT_HWINTRINSIC when it is available and only fallback to GT_INTRINSIC when it is not.

It's possible we could merge FEATURE_SIMD and FEATURE_HW_INTRINSIC together, given how tightly coupled they otherwise are, but that would be a more complex work item and should be handled separately if at all. Doing so would allow us to always use GT_HWINTRINSIC.

Likewise, we have both GenTreeIntrinsic (GT_INTRINSIC) and GenTreeJitIntrinsic (which is only GenTreeHWIntrinsic/GT_HWINTRINSIC now) and these should probably be modified a bit to have a common base, but that is a larger refactoring and should be done separately if at all. -- It's a bit more complicated since moving fields around adds padding between the fields of the base and derived types and that can push things to become a large node unnecessarily.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 28, 2022
@ghost ghost assigned tannergooding Dec 28, 2022
@ghost
Copy link

ghost commented Dec 28, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

As per the title this moves the last two remaining SIMDIntrinsic cases to be NamedIntrinsic and thus allows us to remove GT_SIMD.

Unlike most of the other SIMD intrinsics, UpperSave and UpperRestore are "special" and exist where general SIMD support does for ABI reasons.

It's possible we could merge FEATURE_SIMD and FEATURE_HW_INTRINSIC together, given how tightly coupled they otherwise are, but that would be a more complex work item and should be handled separately if at all.

Likewise, we have both GenTreeIntrinsic (GT_INTRINSIC) and GenTreeJitIntrinsic (which is only GenTreeHWIntrinsic/GT_HWINTRINSIC now) and these should probably be modified a bit to have a common base, but that is a larger refactoring and should be done separately if at all.

Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr

Milestone: -

@@ -285,11 +285,11 @@ void CodeGen::genPutArgStkSIMD12(GenTree* treeNode)
#endif // TARGET_X86

Copy link
Member Author

@tannergooding tannergooding Dec 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simdcodegenxarch.cpp is now "unique" and should probably just be merged into codegenxarch.cpp, just like the related code already is for Arm64 (there probably isn't "enough" here to warrant it being in its own separate file anymore)..

@tannergooding tannergooding marked this pull request as ready for review December 28, 2022 23:13
@tannergooding
Copy link
Member Author

tannergooding commented Dec 28, 2022

CC. @dotnet/jit-contrib. This should be ready for review, no diffs.

Probably need to determine why arm64 shows a TP regression (it is a TP improvement on x86/x64). Arm64 was triggering an assert related to gtSetEvalOrder so its possible something subtly different being caused there...

If I had to guess it has to do with the old logic setting SetOpLclRelatedToSIMDIntrinsic(op1); (via gtNewSIMDNode), but I'll take a closer look.

@tannergooding
Copy link
Member Author

Hmmm, the Arm64 TP regression looks to actually be because GT_SIMD was a small node but GT_INTRINSIC is "large". Arm64 ends up getting quite a bit more UpperSave/Restore happening than x64 and that is impactful.

@tannergooding
Copy link
Member Author

tannergooding commented Dec 31, 2022

Not sure what's causing the TP regression for Arm64 after all.

There remained a 0.01% TP regression even after adjusting to use GT_HWINTRINSIC so we had a small node. Nothing else obvious pops out from the inscount trace and we are, by all means, executing less code on many core paths now regardless.

Likely still worth taking given the TP improvement for x86/x64, the removal of an unnecessary node type, and general simplification in the JIT.

@tannergooding
Copy link
Member Author

/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice to see all the red!

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's causing the TP regression for Arm64 after all.

Yes, that is little puzzling, and it is consistent across OS. Did you try running TP manually to see which method has more instructions?

nice to see all the red!

Indeed.

@tannergooding
Copy link
Member Author

Yes, that is little puzzling, and it is consistent across OS. Did you try running TP manually to see which method has more instructions?

Yes, but unfortunately the change in node type makes this difficult to determine.

There are many functions where the ins_count has changed (some positive, some negative). The numbers were overall similar to x64, but in larger amounts since Arm64 creates more UpperSave/Restore nodes.

@tannergooding
Copy link
Member Author

tannergooding commented Jan 5, 2023

Single stress failure is unrelated: #75244

@tannergooding tannergooding merged commit 83261a9 into dotnet:main Jan 5, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants