Adjust impNormStructVal to not wrap in GT_OBJ #81636
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Force-pushed from 71481de to d55a004.
Edit: Updated diff comment is here: #81636 (comment)

Windows x64 and Arm64 are the two largest improvements, with Windows Arm64 being -300k bytes. Linux and OSX x64/Arm64 are smaller, being around -140k bytes (sometimes a bit more, sometimes a bit less).

There are no diffs for minopts in any context. There is, however, a TP improvement in all contexts, with minopts being around -0.07% and fullopts being around -0.34%.

Will post some analysis on the "improvements" and "regressions" shortly.

Windows x64

Windows Arm64
Going to start with a simple example:

```csharp
public bool Equals(UInt128 other)
{
    return this == other;
}

public static bool operator ==(UInt128 left, UInt128 right)
{
    return (left._lower == right._lower)
        && (left._upper == right._upper);
}
```

In this case we previously had these locals and are now able to elide an "inlining arg" and therefore the extra promotion (of course the actual diff is slightly different due to lclnums changing):

```diff
 ;  V00 this  [V00,T01] ( 4, 4   ) byref  -> rcx  this single-def
 ;  V01 arg1  [V01,T00] ( 4, 8   ) byref  -> rdx  single-def
 ;# V02 OutArgs [V02  ] ( 1, 1   ) lclBlk ( 0) [rsp+00H]  "OutgoingArgSpace"
 ;  V03 tmp1  [V03,T02] ( 3, 2   ) bool   -> rax  "Inline return value spill temp"
 ;* V04 tmp2  [V04    ] ( 0, 0   ) struct (16) zero-ref  "Inlining Arg"
-;* V05 tmp3  [V05    ] ( 0, 0   ) struct (16) zero-ref  "Inlining Arg"
 ;* V06 tmp4  [V06    ] ( 0, 0   ) long   -> zero-ref  V12._lower(offs=0x00) P-INDEP "field V01._lower (fldOffset=0x0)"
 ;* V07 tmp5  [V07    ] ( 0, 0   ) long   -> zero-ref  V12._upper(offs=0x08) P-INDEP "field V01._upper (fldOffset=0x8)"
 ;  V08 tmp6  [V08,T03] ( 2, 2   ) long   -> rax  V04._lower(offs=0x00) P-INDEP "field V04._lower (fldOffset=0x0)"
 ;  V09 tmp7  [V09,T05] ( 2, 1.50) long   -> rcx  V04._upper(offs=0x08) P-INDEP "field V04._upper (fldOffset=0x8)"
-;  V10 tmp8  [V10,T04] ( 2, 2   ) long   -> r8   V05._lower(offs=0x00) P-INDEP "field V05._lower (fldOffset=0x0)"
-;  V11 tmp9  [V11,T06] ( 2, 1.50) long   -> rdx  V05._upper(offs=0x08) P-INDEP "field V05._upper (fldOffset=0x8)"
 ;* V12 tmp10 [V12    ] ( 0, 0   ) struct (16) zero-ref  "Promoted implicit byref"
```

This allows us to access the underlying data directly rather than pre-loading it into a register:

```diff
- mov  r8, qword ptr [rdx]
- mov  rdx, qword ptr [rdx+08H]
- ; byrRegs -[rdx]
- cmp  rax, r8
+ cmp  rax, qword ptr [rdx]
  ...
  xor  eax, eax
- cmp  rcx, rdx
+ cmp  rcx, qword ptr [rdx+08H]
  ...
+ ; byrRegs -[rdx]
```

These cases look to be the bulk of many diffs: simply removing the "inlining arg" and therefore removing an unnecessary copy.
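For anyone wanting to reproduce this kind of diff locally, here is a minimal sketch along the same lines (the type and method names are ours, not from the PR); the codegen can be inspected with the standard DOTNET_JitDisasm knob:

```csharp
// Hypothetical repro sketch: a 16-byte struct compared field-by-field through
// an inlined helper, mirroring the UInt128 example above.
// Inspect codegen with: DOTNET_JitDisasm="CallerEquals" DOTNET_TieredCompilation=0
using System;

struct Pair128
{
    public ulong Lower;
    public ulong Upper;
}

static class Program
{
    // Before this change, inlining Equals128 introduced struct-typed
    // "Inlining Arg" temps that copied `a` and `b`; with the change the
    // fields can be read directly through the implicit byref arguments.
    static bool Equals128(Pair128 left, Pair128 right)
        => (left.Lower == right.Lower) && (left.Upper == right.Upper);

    static bool CallerEquals(Pair128 a, Pair128 b) => Equals128(a, b);

    static void Main() => Console.WriteLine(CallerEquals(default, default));
}
```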
We have some cases like this:

```csharp
private readonly SegmentedHashSet<T> _set;

public bool Contains(T value) => _set.Contains(value);
```

Where, given:

```diff
-;  V04 tmp2  [V04    ] ( 2, 4 ) simd32 -> [rsp+20H] do-not-enreg[XS] addr-exposed "Inlining Arg"
```

this avoidance of a copy means we no longer have SIMD code, which means we no longer emit:

```diff
- sub  rsp, 88
- vzeroupper
+ sub  rsp, 40
```

We likewise then no longer create a copy and don't need to spill said copy for the method we're forwarding to:

```diff
- vmovups ymm0, ymmword ptr [rdx]
- vmovups ymmword ptr [rsp+20H], ymm0
  cmp  byte ptr [rcx], cl
- lea  rdx, [rsp+20H]
- ; byrRegs -[rdx]
  call [<unknown method>]
  ; gcrRegs -[rcx]
+ ; byrRegs -[rdx]
```

This is essentially just a variation on the first and again applies to many cases where we're simply passing along a value type across the inlining boundary. We see similar improvements in other such forwarding cases.
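The shape involved is just a thin forwarding wrapper over a large struct. A hedged sketch of that shape (SegmentedHashSet swapped for a plain HashSet, and the struct layout is ours):

```csharp
// Sketch of the forwarding shape: Contains copies nothing itself and just
// passes `value` along, yet previously the inliner introduced a 32-byte
// "Inlining Arg" copy (spilled with SIMD moves on AVX-capable x64).
using System.Collections.Generic;

public struct Big32
{
    public double A, B, C, D; // 32 bytes total
}

public sealed class Wrapper
{
    private readonly HashSet<Big32> _set = new();

    // With the change, the incoming `value` is forwarded directly to the
    // inner call instead of being copied into a local first.
    public bool Contains(Big32 value) => _set.Contains(value);
}
```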
For a regression case:

```csharp
public unsafe Matrix3x2 TransformElements
{
    set
    {
        IntPtr nativeMatrix = Matrix.CreateNativeHandle(value);
        try
        {
            Gdip.CheckStatus(Gdip.GdipSetWorldTransform(
                new HandleRef(this, NativeGraphics), new HandleRef(null, nativeMatrix)));
        }
        finally
        {
            if (nativeMatrix != IntPtr.Zero)
            {
                Gdip.GdipDeleteMatrix(new HandleRef(null, nativeMatrix));
            }
        }
    }
}

internal static void CheckStatus(int status)
{
    if (status != Ok)
        throw StatusException(status);
}

// Other calls are LibraryImport-based P/Invokes
```

We are able to drop a struct-typed "Inlining Arg":

```diff
-;  V08 tmp5  [V08,T00] ( 7, 14 ) struct (24) [rbp-58H] do-not-enreg[SF] "Inlining Arg"
```

This in turn means that the other inlining args are no longer zero-ref:

```diff
-;* V13 tmp10 [V13    ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
-;* V14 tmp11 [V14    ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
-;* V15 tmp12 [V15    ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
-;* V16 tmp13 [V16    ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
-;* V17 tmp14 [V17    ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
-;* V18 tmp15 [V18    ] ( 0, 0 ) float -> zero-ref "Inlining Arg"
+;  V12 tmp9  [V12,T19] ( 2, 4 ) float -> mm6  "Inlining Arg"
+;  V13 tmp10 [V13,T20] ( 2, 4 ) float -> mm7  "Inlining Arg"
+;  V14 tmp11 [V14,T21] ( 2, 4 ) float -> mm8  "Inlining Arg"
+;  V15 tmp12 [V15,T22] ( 2, 4 ) float -> mm9  "Inlining Arg"
+;  V16 tmp13 [V16,T23] ( 2, 4 ) float -> mm10 "Inlining Arg"
+;  V17 tmp14 [V17,T24] ( 2, 4 ) float -> mm11 "Inlining Arg"
```

This is because previously we were doing an efficient block copy of the struct:

```diff
- vmovdqu xmm0, xmmword ptr [rsi]
- vmovdqu xmmword ptr [rbp-58H], xmm0
- mov  rcx, qword ptr [rsi+10H]
- mov  qword ptr [rbp-48H], rcx
+ vmovss xmm6, dword ptr [rsi]
+ vmovss xmm7, dword ptr [rsi+04H]
+ vmovss xmm8, dword ptr [rsi+08H]
+ vmovss xmm9, dword ptr [rsi+0CH]
+ vmovss xmm10, dword ptr [rsi+10H]
+ vmovss xmm11, dword ptr [rsi+14H]
  mov  rcx, 0xD1FFAB1E
  mov  edx, 204
  call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
  ; byrRegs -[rsi]
  ; gcr arg pop 0
- lea  rax, bword ptr [rbp-60H]
+ lea  rax, bword ptr [rbp-B0H]
  ; byrRegs +[rax]
- vmovss xmm0, dword ptr [rbp-58H]
- vmovss xmm1, dword ptr [rbp-54H]
- vmovss xmm2, dword ptr [rbp-50H]
- vmovss xmm3, dword ptr [rbp-4CH]
- vmovss xmm4, dword ptr [rbp-48H]
- vmovss xmm5, dword ptr [rbp-44H]
- vmovss dword ptr [rsp+20H], xmm4
- vmovss dword ptr [rsp+28H], xmm5
+ vmovaps xmm0, xmm6
+ vmovaps xmm1, xmm7
+ vmovaps xmm2, xmm8
+ vmovaps xmm3, xmm9
+ vmovss dword ptr [rsp+20H], xmm10
+ vmovss dword ptr [rsp+28H], xmm11
```

Part of this appears to be "non-ideal" LSRA decisions. There are also some other considerations here, in that the managed code could be tweaked slightly to better handle this scenario.
Let's also take a look at:

```csharp
private ValueTask<int> ReadBufferedAsync(Memory<byte> destination)
{
    // If the caller-provided buffer, and thus the amount of data desired to be read,
    // is larger than the internal buffer, there's no point going through the internal
    // buffer, so just do an unbuffered read.
    // Also avoid using the internal buffer if the user requested a zero-byte read to allow
    // underlying streams to efficiently handle such a read (e.g. SslStream deferring buffer allocation).
    return destination.Length >= _readBuffer.Capacity || destination.Length == 0 ?
        ReadAsync(destination) :
        ReadBufferedAsyncCore(destination);
}

private async ValueTask<int> ReadAsync(Memory<byte> destination)
{
    // This is called when reading the response body.

    if (_readBuffer.ActiveLength > 0)
    {
        // We have data in the read buffer. Return it to the caller.
        return ReadFromBuffer(destination.Span);
    }

    // No data in read buffer.
    // Do an unbuffered read directly against the underlying stream.
    Debug.Assert(_readAheadTask == default, "Read ahead task should have been consumed as part of the headers.");
    int count = await _stream.ReadAsync(destination).ConfigureAwait(false);
    if (NetEventSource.Log.IsEnabled()) Trace($"Received {count} bytes.");
    return count;
}

private async ValueTask<int> ReadBufferedAsyncCore(Memory<byte> destination)
{
    // This is called when reading the response body.

    if (_readBuffer.ActiveLength == 0)
    {
        // Do a buffered read directly against the underlying stream.
        Debug.Assert(_readAheadTask == default, "Read ahead task should have been consumed as part of the headers.");
        Debug.Assert(_readBuffer.AvailableLength == _readBuffer.Capacity);
        int bytesRead = await _stream.ReadAsync(_readBuffer.AvailableMemory).ConfigureAwait(false);
        _readBuffer.Commit(bytesRead);
        if (NetEventSource.Log.IsEnabled()) Trace($"Received {bytesRead} bytes.");
    }

    // Hand back as much data as we can fit.
    return ReadFromBuffer(destination.Span);
}
```

For this case, the before and after codegen diffs show that rather than two tail calls, we end up with real calls and corresponding logic.
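The general shape here is a method whose whole body is a conditional forward of a struct argument to one of two calls. A simplified sketch of that shape (ours, not the actual HttpConnection code; the tail-call observation is from the PR comment above, not something this sketch proves):

```csharp
// Conditional-forwarding shape: previously both branches could be emitted as
// tail calls; in the regressed codegen they become regular calls with the
// associated frame logic.
using System;
using System.Threading.Tasks;

sealed class Forwarder
{
    private static ValueTask<int> ReadDirect(Memory<byte> destination) => new(1);
    private static ValueTask<int> ReadBuffered(Memory<byte> destination) => new(2);

    public ValueTask<int> Read(Memory<byte> destination)
        => destination.Length == 0
            ? ReadDirect(destination)
            : ReadBuffered(destination);
}
```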
Most regressions I've seen look to be along these lines, where the copy ended up being "beneficial" due to its ability to be a SIMD copy and use fewer registers than the alternative. They generally look to be cases where small tweaks might be able to win this back.
Some credit where credit is due: the ability to make this change comes down to months of work from @SingleAccretion to teach the rest of the JIT about struct typed locals.
I left a note in the OP giving the well-deserved credit to Single, and also my view on taking this change:

If we were to take this change today/tomorrow, then we would be able to get concrete perf numbers Tuesday (x64) and Thursday (Arm64) of next week. Given that this is an 11-line change and we aren't snapping again in the next two weeks, the "worst case outcome" is that we decide to revert the change.
In the meantime, I am going to look more in depth at the JIT dumps for the regression cases to see if there is anything simple/obvious around them (such as the tail call scenario).
Here's the base vs diff JitDump. The first part of the diff is straightforward, just dropping the OBJ wrapping. This in turn changes what we see when we get to morph, with the LocalAddressVisitor visiting the statement:

```diff
 STMT00004 ( 0x006[E-] ... 0x00F )
-  [000034] --C-G------        *  RETURN    int
-  [000031] --C-G------        \--*  CALL      int    Test:TryParseUInt64IntegerStyle(System.ReadOnlySpan`1[ushort],int,System.Globalization.NumberFormatInfo,byref):int
-  [000033] n----------  arg0     +--*  OBJ       struct<System.ReadOnlySpan`1, 16>
-  [000032] -----------           |  \--*  LCL_VAR_ADDR  byref  V00 arg0
-                                 |     \--* byref  V00.System.ReadOnlySpan`1[ushort]:_reference (offs=0x00) -> V05 tmp1
-                                 |     \--* int    V00.System.ReadOnlySpan`1[ushort]:_length (offs=0x08) -> V06 tmp2
-  [000028] -----------  arg1     +--*  LCL_VAR   int    V01 arg1
-  [000029] -----------  arg2     +--*  LCL_VAR   ref    V02 arg2
-  [000030] -----------  arg3     \--*  LCL_VAR   byref  V03 arg3
-LocalAddressVisitor incrementing weighted ref count from 0 to 1 for implicit by-ref V00 arg passed to call
-LocalAddressVisitor modified statement:
```
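For context, here is our reconstruction of the C# shape that produces the statement above (signatures approximated from the dump; this is not the actual test source):

```csharp
// Approximation of the dump's caller: a ReadOnlySpan<char> struct argument
// forwarded to a parsing helper. Before the change, arg0 was wrapped as
// OBJ(LCL_VAR_ADDR V00); afterwards it is a plain LCL_VAR.
using System;
using System.Globalization;

static class Test
{
    static int TryParseUInt64IntegerStyle(
        ReadOnlySpan<char> value, NumberStyles styles, NumberFormatInfo info, out ulong result)
    {
        result = 0;
        return 0;
    }

    static int Caller(ReadOnlySpan<char> value, NumberStyles styles, NumberFormatInfo info, out ulong result)
        => TryParseUInt64IntegerStyle(value, styles, info, out result);
}
```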
```cpp
// is introduced (it was primarily from impNormStructVal). But until all cases are gone
// we still want to handle it as well.
...
if (m_ancestors.Height() < 2)
```
UpdateEarlyRefCount needs to recognize LCL_VARs.
Updated that here. I left in the OBJ(LCL_VAR_ADDR) handling since we still have that in some cases (even if the majority of OBJ wrapping is gone now that it's out of impNormStructVal).

Also changed it from a for loop to some explicit linear checks instead, which simplifies things and should improve the method's TP.
Not the most familiar with LclMorph, so I went with the assumption that the early exit for !m_compiler->lvaIsImplicitByRefLocal(lclNum) is enough to ensure the LCL_VAR is something we want to handle, and that we don't also need to explicitly check that the LCL_VAR is a struct, or a struct that would actually be passed by ref (e.g. struct S { int x; } typically gets passed in a register instead).

Let me know if this assumption isn't correct and if more checks are needed.
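For reference, a small sketch of the distinction in question on Windows x64 (the ABI rule as we understand it: structs whose size is not 1, 2, 4, or 8 bytes are passed by implicit byref; other targets differ, e.g. SysV x64 and Arm64 pass larger structs in registers):

```csharp
// Windows x64 ABI sketch: S1 (4 bytes) is passed by value in a register, so
// lvaIsImplicitByRefLocal would be false for it; S3 (12 bytes) is passed by
// implicit byref, which is what UpdateEarlyRefCount cares about.
struct S1 { public int X; }
struct S3 { public int X, Y, Z; }

static class AbiSketch
{
    static int Use(S1 a, S3 b) => a.X + b.X;
}
```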
Also did not handle LCL_FLD, since this is "pre-order" and I believe that means we should only see GT_FIELD instead at this point in time.
New diffs. The TL;DR is more improvements, fewer regressions, and even better TP.

Windows x64

Windows Arm64
LGTM. Nice diffs!
/azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm, runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn
Azure Pipelines successfully started running 6 pipeline(s).
Merging, given that this is a small and easily reverted change, that it has a very large positive impact, and that it has a small set of regressions. This will allow us to get concrete perf numbers as part of perf triage on Tuesday (x64) and Thursday (Arm64), which will let us know with more confidence the impact of the diffs.
The ability to make this change comes down to months of work from @SingleAccretion to teach the rest of the JIT about struct typed locals.
Given the breadth of the impact here, that this is a pure improvement for minopts (no diff plus a significant TP improvement), and that for fullopts the improved bytes are over 4x the regressed bytes, I think this is worth taking and getting in-depth perf numbers as part of perf triage, which will let us better understand where to focus efforts around handling said regressions (or, inversely, whether we should scope the fullopts scenario down).