Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retype calls as SIMD when applicable. #53578

Merged
merged 7 commits into from
Jun 4, 2021
Merged

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Jun 2, 2021

Retype calls that return System.Numerics.Vector2/3/4 as TYP_SIMD8/12/16 between importation and lowering.

It better aligns with the fact that we retype System.Numerics.Vector2/3/4 locals and returns as SIMD.

some IR diffs:

STMT00007 (IL 0x02B...  ???)
               [000027] --C-G-------              *  RETURN    simd8 
-              [000026] --C-G-------              \--*  CALL      struct Runtime_49489.Program.ReturnSIMD2
+              [000026] --C-G-------              \--*  CALL      simd8  Runtime_49489.Program.ReturnSIMD2

STMT00011 (IL 0x003...0x008)
               [000040] -ACXG-------              *  ASG       simd8  (copy)
               [000038] D------N----              +--*  LCL_VAR   simd8 <System.Numerics.Vector2> V03 tmp1         
-              [000034] --CXG+-N----              \--*  CALL      struct Runtime_49489.Program.ReturnSIMD2
+              [000034] --CXG+-N----              \--*  CALL      simd8  Runtime_49489.Program.ReturnSIMD2

I am not a huge fan of this retyping but the improved check in gtNewLclvNode should make the code overall safer in my opinion.

No asm diffs.

Fixes #52864.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2021
@sandreenko sandreenko marked this pull request as ready for review June 2, 2021 22:54
@sandreenko
Copy link
Contributor Author

PTAL @AndyAyersMS @dotnet/jit-contrib

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sandreenko
Copy link
Contributor Author

I have pushed a change to retype multireg returns as well, note that now the change has some diffs but they are potential bug fixes:

Generating: N065 ( 18,  3) [000012] --CXG-------        t12 = *  CALL ind unman struct REG mm0,mm1 $280
+IN0014:        vpslldq  xmm1, 12
+IN0015:        vpsrldq  xmm1, 12

                                                              /--*  t12    struct 
Generating: N081 ( 18,  3) [000015] DA-XG-------              *  STORE_LCL_VAR simd16<System.Numerics.Vector3> V06 tmp3         d:1 mm0 REG mm0
IN001c:        vshufpd  xmm0, xmm1, 0

as we see from

// A Vector3 return value is stored in xmm0 and xmm1.
// RyuJIT assumes that the upper unused bits of xmm1 are cleared but
// the native compiler doesn't guarantee it.
if (call->IsUnmanaged() && (returnType == TYP_SIMD12))
{

there were no guarantees that top bytes of xmm1 were zeroed by the native compiler.

diffs:

tests.pmi.Linux.arm64.checked.mch:
Total bytes of delta: 24, 
2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

coreclr_tests.pmi.Linux.x64.checked.mch
Total bytes of delta: 20
2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

tests.pmi.windows.arm64.checked.mch
Total bytes of delta: 24
2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

all in

Top method regressions (bytes):
          10 ( 0.40% of base) : 229418.dasm - PInvokeTest:test():bool
          10 ( 5.88% of base) : 229416.dasm - ILStubClass:IL_STUB_PInvoke():System.Numerics.Vector3

cc @tannergooding

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sandreenko sandreenko merged commit 321217f into dotnet:main Jun 4, 2021
@@ -5414,7 +5414,7 @@ void CodeGen::genCallInstruction(GenTreeCall* call)
// A Vector3 return value is stored in xmm0 and xmm1.
// RyuJIT assumes that the upper unused bits of xmm1 are cleared but
// the native compiler doesn't guarantee it.
if (returnType == TYP_SIMD12)
if (call->IsUnmanaged() && (returnType == TYP_SIMD12))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting, the shift left, shift right idiom isn't the "best" codegen for modern hardware. It would likely be better for us to use one of the zero insertion idioms instead.

I'll log a bug for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logged #53713

@sandreenko sandreenko deleted the GitHub_52864 branch June 4, 2021 06:49
@ghost ghost locked as resolved and limited conversation to collaborators Jul 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Guarded devirtualization assert with vector type
3 participants