Retype calls as SIMD when applicable. #53578

sandreenko · 2021-06-02T06:25:22Z

Retype calls that return System.Numerics.Vector2/3/4 as TYP_SIMD8/12/16 between importation and lowering.

This aligns better with how we already retype System.Numerics.Vector2/3/4 locals and returns as SIMD.
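As a rough illustration of the mapping described above (not the JIT's actual code), the sketch below picks a SIMD type from the size of the returned struct. `VarType` and `retypeCallReturn` are hypothetical names introduced here; only the TYP_SIMD8/12/16 naming comes from this PR. The sizes follow from Vector2/3/4 holding two, three, and four 32-bit floats.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the JIT's type tags (TYP_STRUCT, TYP_SIMD8, ...).
enum class VarType { Struct, Simd8, Simd12, Simd16 };

// Hypothetical helper: choose the type a call node would carry, given whether
// the return class is one of System.Numerics.Vector2/3/4 and its size in bytes.
VarType retypeCallReturn(bool isNumericsVector, std::size_t sizeInBytes)
{
    assert(sizeInBytes > 0); // a struct return always has a non-zero size

    if (!isNumericsVector)
    {
        return VarType::Struct; // other structs keep the generic struct type
    }

    switch (sizeInBytes)
    {
        case 8:  return VarType::Simd8;  // Vector2: 2 floats
        case 12: return VarType::Simd12; // Vector3: 3 floats
        case 16: return VarType::Simd16; // Vector4: 4 floats
        default: return VarType::Struct; // unexpected size: leave untouched
    }
}
```

With this model, the `ReturnSIMD2` call in the IR diffs below corresponds to `retypeCallReturn(true, 8)`, which yields the `simd8` case.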

Some IR diffs:

STMT00007 (IL 0x02B...  ???)
               [000027] --C-G-------              *  RETURN    simd8 
-              [000026] --C-G-------              \--*  CALL      struct Runtime_49489.Program.ReturnSIMD2
+              [000026] --C-G-------              \--*  CALL      simd8  Runtime_49489.Program.ReturnSIMD2

STMT00011 (IL 0x003...0x008)
               [000040] -ACXG-------              *  ASG       simd8  (copy)
               [000038] D------N----              +--*  LCL_VAR   simd8 <System.Numerics.Vector2> V03 tmp1         
-              [000034] --CXG+-N----              \--*  CALL      struct Runtime_49489.Program.ReturnSIMD2
+              [000034] --CXG+-N----              \--*  CALL      simd8  Runtime_49489.Program.ReturnSIMD2

I am not a huge fan of this retyping, but the improved check in gtNewLclvNode should, in my opinion, make the code safer overall.

No asm diffs.

Fixes #52864.

sandreenko · 2021-06-02T23:15:46Z

PTAL @AndyAyersMS @dotnet/jit-contrib

sandreenko · 2021-06-02T23:16:33Z

/azp run runtime-coreclr jitstress

azure-pipelines · 2021-06-02T23:16:48Z

Azure Pipelines successfully started running 1 pipeline(s).

sandreenko · 2021-06-03T22:30:49Z

I have pushed a change to retype multireg returns as well. Note that the change now has some diffs, but they are potential bug fixes:

Generating: N065 ( 18,  3) [000012] --CXG-------        t12 = *  CALL ind unman struct REG mm0,mm1 $280
+IN0014:        vpslldq  xmm1, 12
+IN0015:        vpsrldq  xmm1, 12

                                                              /--*  t12    struct 
Generating: N081 ( 18,  3) [000015] DA-XG-------              *  STORE_LCL_VAR simd16<System.Numerics.Vector3> V06 tmp3         d:1 mm0 REG mm0
IN001c:        vshufpd  xmm0, xmm1, 0

As we see from runtime/src/coreclr/jit/codegenxarch.cpp, lines 5414 to 5418 in 2c508d4:

           // A Vector3 return value is stored in xmm0 and xmm1.
           // RyuJIT assumes that the upper unused bits of xmm1 are cleared but
           // the native compiler doesn't guarantee it.
           if (call->IsUnmanaged() && (returnType == TYP_SIMD12))
           {

there were no guarantees that the top bytes of xmm1 were zeroed by the native compiler.
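To make the effect of the added `vpslldq xmm1, 12` / `vpsrldq xmm1, 12` pair concrete, here is a portable sketch that models a 128-bit register as 16 bytes and applies the same two byte shifts. It is an assumption-laden model rather than real codegen: `Xmm`, `shiftLeftBytes`, `shiftRightBytes`, and `clearUpper12Bytes` are hypothetical names, and byte 0 is taken to be the least significant byte of the register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of an xmm register; index 0 is the lowest byte.
using Xmm = std::array<std::uint8_t, 16>;

// Models a whole-register left shift by `count` bytes (as pslldq does):
// bytes move toward higher indices and the vacated low bytes become zero.
Xmm shiftLeftBytes(const Xmm& v, std::size_t count)
{
    assert(count <= 16);
    Xmm r{}; // value-initialized, so every byte starts at zero
    for (std::size_t i = count; i < 16; ++i)
    {
        r[i] = v[i - count];
    }
    return r;
}

// Models a whole-register right shift by `count` bytes (as psrldq does):
// bytes move toward lower indices and the vacated high bytes become zero.
Xmm shiftRightBytes(const Xmm& v, std::size_t count)
{
    assert(count <= 16);
    Xmm r{};
    for (std::size_t i = 0; i + count < 16; ++i)
    {
        r[i] = v[i + count];
    }
    return r;
}

// Shift left then right by 12 bytes: only the low 4 bytes (the third float
// of a Vector3, per the comment quoted above) survive; the rest is zeroed.
Xmm clearUpper12Bytes(const Xmm& xmm1)
{
    return shiftRightBytes(shiftLeftBytes(xmm1, 12), 12);
}
```

Whatever the native callee left in bytes 4 to 15 is discarded, so the later `vshufpd xmm0, xmm1, 0` combines xmm0 with a value whose unused upper part is known to be zero.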

Diffs:

tests.pmi.Linux.arm64.checked.mch
Total bytes of delta: 24
2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

coreclr_tests.pmi.Linux.x64.checked.mch
Total bytes of delta: 20
2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

tests.pmi.windows.arm64.checked.mch
Total bytes of delta: 24
2 total files with Code Size differences (0 improved, 2 regressed), 0 unchanged.

All of them are in:

Top method regressions (bytes):
          10 ( 0.40% of base) : 229418.dasm - PInvokeTest:test():bool
          10 ( 5.88% of base) : 229416.dasm - ILStubClass:IL_STUB_PInvoke():System.Numerics.Vector3

cc @tannergooding

sandreenko · 2021-06-03T22:30:57Z

/azp run runtime-coreclr jitstress

azure-pipelines · 2021-06-03T22:31:12Z

Azure Pipelines successfully started running 1 pipeline(s).

tannergooding · 2021-06-04T04:35:14Z

src/coreclr/jit/codegenxarch.cpp

@@ -5414,7 +5414,7 @@ void CodeGen::genCallInstruction(GenTreeCall* call)
                // A Vector3 return value is stored in xmm0 and xmm1.
                // RyuJIT assumes that the upper unused bits of xmm1 are cleared but
                // the native compiler doesn't guarantee it.
-                if (returnType == TYP_SIMD12)
+                if (call->IsUnmanaged() && (returnType == TYP_SIMD12))


Just noting, the shift-left, shift-right idiom isn't the "best" codegen for modern hardware. It would likely be better for us to use one of the zero insertion idioms instead.

I'll log a bug for this.

Logged #53713

Sergey added 4 commits June 1, 2021 18:10

add a repro

375e775

passed spmi.

e8e6427

update comment.

0fe2f0a

update the test

7192ed2

sandreenko added the area-CodeGen-coreclr label Jun 2, 2021

improve the check.

864f21e

sandreenko marked this pull request as ready for review June 2, 2021 22:54

AndyAyersMS approved these changes Jun 3, 2021

Sergey added 2 commits June 3, 2021 12:00

fix a stressfailure

0f67527

fix x64 unix diff

2c508d4

sandreenko merged commit 321217f into dotnet:main Jun 4, 2021

tannergooding reviewed Jun 4, 2021

sandreenko deleted the GitHub_52864 branch June 4, 2021 06:49

ghost locked as resolved and limited conversation to collaborators Jul 4, 2021
