JIT: Extend escape analysis to account for arrays with non-gcref elements #104906

hez2010 · 2024-07-15T17:09:51Z

Positive case:

var chs = new char[42];
chs[1] = 'a';
Console.WriteLine((int)chs[1] + chs.Length);

Codegen:

; Assembly listing for method ArrayAllocator.Program:Main() (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact <short[]>
;  V01 OutArgs      [V01    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (104) zero-ref    do-not-enreg[SF] "stack allocated array temp"
;* V03 tmp2         [V03    ] (  0,  0   )    long  ->  zero-ref    single-def "V02.[000..008)"
;* V04 tmp3         [V04    ] (  0,  0   )     int  ->  zero-ref    single-def "V02.[008..012)"
;* V05 tmp4         [V05    ] (  0,  0   )   short  ->  zero-ref    "V02.[018..020)"
;
; Lcl frame size = 40

G_M25548_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M25548_IG02:  ;; offset=0x0004
       mov      ecx, 84
       call     [System.Console:WriteLine(int)]
       nop
                                                ;; size=12 bbWeight=1 PerfScore 3.50
G_M25548_IG03:  ;; offset=0x0010
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 21, prolog size 4, PerfScore 5.00, instruction count 6, allocated bytes for code 21 (MethodHash=5b0b9c33) for method ArrayAllocator.Program:Main() (FullOpts)

Negative case:

var chs = new char[42];
chs[1] = 'a';
Console.WriteLine((int)chs[42] + chs.Length);

Codegen:

; Assembly listing for method ArrayAllocator.Program:Main() (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact <short[]>
;  V01 OutArgs      [V01    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (104) zero-ref    do-not-enreg[SF] "stack allocated array temp"
;  V03 tmp2         [V03,T00] (  1,  0   )   byref  ->  rbx         must-init "dummy temp of must thrown exception"
;* V04 tmp3         [V04    ] (  0,  0   )    long  ->  zero-ref    single-def "V02.[000..008)"
;* V05 tmp4         [V05    ] (  0,  0   )     int  ->  zero-ref    single-def "V02.[008..012)"
;* V06 tmp5         [V06    ] (  0,  0   )   short  ->  zero-ref    single-def "V02.[018..020)"
;
; Lcl frame size = 32

G_M25548_IG01:  ;; offset=0x0000
       push     rbx
       sub      rsp, 32
       xor      ebx, ebx
                                                ;; size=7 bbWeight=0 PerfScore 0.00
G_M25548_IG02:  ;; offset=0x0007
       call     CORINFO_HELP_RNGCHKFAIL
       movsx    rcx, word  ptr [rbx]
       call     [System.Console:WriteLine(int)]
       int3
                                                ;; size=16 bbWeight=0 PerfScore 0.00

; Total bytes of code 23, prolog size 5, PerfScore 0.00, instruction count 7, allocated bytes for code 23 (MethodHash=5b0b9c33) for method ArrayAllocator.Program:Main() (FullOpts)
; ============================================================

Benchmark on Mandelbrot:

Method	Job	Mean	Error	StdDev	Code Size	Allocated
MandelBrot	NoStackAllocationArray	199.7 us	1.30 us	1.22 us	1,996 B	2.49 KB
MandelBrot	StackAllocationArray	195.8 us	1.16 us	1.08 us	2,414 B	1.14 KB

Diff: https://www.diffchecker.com/bNP4qHdF/

dotnet-policy-service · 2024-07-15T17:10:25Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/jit/objectalloc.h

src/coreclr/jit/objectalloc.cpp

AndyAyersMS

For arrays (and also perhaps boxes and ref classes) we ought to have some kind of size limit... possibly similar to the one we use for stackallocs.

We need to be careful we don't allocate a lot of stack for an object that might not be heavily used, as we'll pay per-call prolog zeroing costs.

src/coreclr/jit/lclmorph.cpp

AndyAyersMS

I don't think we need to handle the align8 arrays.

But if you do, if you're on a 32 bit platform you probably need to mark the stack allocation local with lvStructDoubleAlign. And run some tests to make sure the right things actually happen.

AndyAyersMS · 2024-07-19T18:30:11Z

Thank you for working on this. I don't see us taking this for .NET 9 but perhaps we can soon thereafter. I will take a closer look next week sometime.

The stats above are not all that encouraging. I think for both arrays and ref classes we likely need to take a more holistic look at what is blocking us from finding more cases (or if indeed, if there are not more cases to be had). Eugene did this sort of thing when the initial support was implemented and found similarly less-than-encouraging results (which is why we never enabled it), but it is worth taking a fresh look.

Did you put up a PR for the field-wise analysis anywhere? That may be one of the key missing pieces.

hez2010 · 2024-07-19T18:33:19Z

Did you put up a PR for the field-wise analysis anywhere? That may be one of the key missing pieces.

It's on my branch. Will open a draft PR soon to validate CI and see how much benefits we can gain from the field-wise analysis.

hez2010 · 2024-07-20T15:37:22Z

Tests (modulus unrelated) are passing.
PTAL @jakobbotsch

name	arrays (%)	arrays with field-wise analysis (%)
benchmarks.run.windows.x64.checked	14/4163 (0.34%)	14/4084 (0.34%)
benchmarks.run_pgo.windows.x64.checked	25/1562 (1.60%)	36/1550 (2.32%)
benchmarks.run_tiered.windows.x64.checked	14/378 (3.70%)	10/368 (2.72%)
coreclr_tests.run.windows.x64.checked	162/60525 (0.27%)	183/60271 (0.30%)
libraries.crossgen2.windows.x64.checked	13/11330 (0.11%)	48/11300 (0.42%)
libraries.pmi.windows.x64.checked	16/14840 (0.11%)	325/14577 (2.23%)
libraries_tests.run.windows.x64.Release	79/16584 (0.48%)	124/16167 (0.77%)
libraries_tests_no_tiered_compilation.run.windows.x64.Release	325/109551 (0.30%)	1340/108652 (1.23%)
realworld.run.windows.x64.checked	1/3438 (0.03%)	3/3331 (0.09%)
smoke_tests.nativeaot.windows.x64.checked	0/857 (0.00%)	2/855 (0.23%)

with field-wise analysis we are missing context for more methods, so it's possible that spmi reports worse stats.

hez2010 · 2024-07-20T16:19:01Z

I collected the method context against BCL by myself to prevent missing method context.

name	ref-classes (%)	boxes (%)	arrays (%)
w/o field-wise	177/49224 (0.36%)	380/7015 (5.42%)	19/5195 (0.37%)
with field-wise	407/55667 (0.73%)	375/7031 (5.33%)	38/5823 (0.65%)

hez2010 · 2024-07-22T05:31:38Z

I don't see us taking this for .NET 9 but perhaps we can soon thereafter. I will take a closer look next week sometime.

The stats above are not all that encouraging. I think for both arrays and ref classes we likely need to take a more holistic look at what is blocking us from finding more cases (or if indeed, if there are not more cases to be had). Eugene did this sort of thing when the initial support was implemented and found similarly less-than-encouraging results (which is why we never enabled it), but it is worth taking a fresh look.

It's actually a small change as it relies on the existing analysis and added the treatment to arrays as well, which is a "free" improvement. I personally would like to see it goes to .NET 9.

JulieLeeMSFT · 2024-08-12T17:22:31Z

This is optimization, not correctness issue. We don't have time to work on this for .NET 9, so we will review it in .NET 10.

JulieLeeMSFT · 2024-09-16T13:27:18Z

@AndyAyersMS please review this community PR.

AndyAyersMS

Some questions along with the particular notes:

do you see diffs with these changes when DOTNET_JitObjectStackAllocationArray=0?
are stack allocated arrays amenable to physical promotion? If not, why not?
what does codegen look like for a locally allocated array consumed by a loop? Is it at least as good as the heap case?
we need a tighter size limit, 8192 is too high. We likely need one both for individual allocations (especially if conditional) and for the overall amount of stack size increase. For individual allocations we should either match the stackalloc -> local size limit or gather data suggesting why a different choice is the right one. For the overall limit I'm less sure, but something around 8192 for all stack allocated locals seems about right.
given that, we might need a prioritization scheme, if we have so many allocation opportunities that some get left out. But I expect this will be rare.

src/coreclr/jit/gentree.h

src/coreclr/jit/lclmorph.cpp

AndyAyersMS · 2024-09-16T15:31:08Z

src/coreclr/jit/lclmorph.cpp

+                    }
+                    else
+                    {
+                        *use           = m_compiler->gtNewOperNode(GT_COMMA, node->TypeGet(),


Do we really need to handle the out of bounds cases here?

IIRC not handing it will make the negative case address exposed, while inserting a BOUND_CHECK node will lead to an assert in later phase.

AndyAyersMS · 2024-09-16T15:33:40Z

src/coreclr/jit/morph.cpp

@@ -8358,10 +8358,20 @@ GenTree* Compiler::fgMorphSmpOp(GenTree* tree, MorphAddrContext* mac, bool* optA
            return tree;
        }

+#ifdef DEBUG


As I noted over in #107542 we don't want to have control flow logic conditional on DEBUG like this... we need to find a better fix.

hez2010 · 2024-09-18T14:54:22Z

do you see diffs with these changes when DOTNET_JitObjectStackAllocationArray=0?

I will test it and post the result later.

are stack allocated arrays amenable to physical promotion? If not, why not?

I think yes?

int[] a = new[] { 1,2,3,4 };
return a[2];

will produce the following codegen:

mov rax, 3
ret

what does codegen look like for a locally allocated array consumed by a loop? Is it at least as good as the heap case?

We need #102965 to get better codegen for this case. Otherwise it gets address exposed.

we need a tighter size limit, 8192 is too high. We likely need one both for individual allocations (especially if conditional) and for the overall amount of stack size increase. For individual allocations we should either match the stackalloc -> local size limit or gather data suggesting why a different choice is the right one. For the overall limit I'm less sure, but something around 8192 for all stack allocated locals seems about right.

Agree. I think we can do this in a following PR?

hez2010 · 2024-09-20T13:49:46Z

do you see diffs with these changes when DOTNET_JitObjectStackAllocationArray=0?

Just checked superpmi and there's no stack allocated array when DOTNET_JitObjectStackAllocationArray=0.

hez2010 added 7 commits July 15, 2024 20:23

initial prototype

1b0e3d3

Morph ARR_LENGTH and INDEX_ADDR

57b7e42

Fix incorrect array length storage

1b5b25e

Use offset and correct type

395b735

handle reassignment

17de70b

range check

5443c42

throw range check failure

b2d07da

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 15, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 15, 2024

hez2010 added 2 commits July 16, 2024 02:13

update comments

b5ae9e7

add metrics

87b29de

jakobbotsch reviewed Jul 15, 2024

View reviewed changes

src/coreclr/jit/objectalloc.h Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 15, 2024

View reviewed changes

src/coreclr/jit/objectalloc.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 15, 2024

View reviewed changes

src/coreclr/jit/objectalloc.cpp Outdated Show resolved Hide resolved

minor cleanup

eeb681d

AndyAyersMS reviewed Jul 15, 2024

View reviewed changes

AndyAyersMS mentioned this pull request Jul 16, 2024

Stack Allocation Enhancements #104936

Open

19 tasks

hez2010 added 5 commits July 16, 2024 14:13

Introduce new temp and implement local address morphing

dee9f38

handle index out-of-range

94c103b

Refactor to remove duplicates

12b297b

Remove invalid asserts

e0fa91e

make compiler happy

9e0a04f

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lclmorph.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lclmorph.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lclmorph.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Jul 16, 2024

View reviewed changes

src/coreclr/jit/lclmorph.cpp Outdated Show resolved Hide resolved

cleanup

a01562e

AndyAyersMS reviewed Jul 19, 2024

View reviewed changes

remove invalid assert

e728d4f

hez2010 added 2 commits July 20, 2024 03:37

double align on 32bit platform

d73c5c5

Use correct alignment for align8

c9fea23

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

hez2010 added 2 commits July 20, 2024 19:25

Fix intrinsic expansion

772bee6

Merge branch 'main' into value-array-stack-alloc

499ac0b

This comment was marked as resolved.

Sign in to view

build-analysis bot mentioned this pull request Jul 20, 2024

linux-armel checked CoreCLR_NonPortable failing to build with "Unable to find toolchain executable. Name: 'ar', Prefix: 'llvm-'" #105176

Closed

Merge remote-tracking branch 'origin/main' into value-array-stack-alloc

ffcbd85

build-analysis bot mentioned this pull request Jul 22, 2024

Test failure: GC\\Features\\HeapExpansion\\Finalizer\\Finalizer.cmd #102706

Closed

JulieLeeMSFT added this to the 10.0.0 milestone Aug 12, 2024

JulieLeeMSFT assigned hez2010 Aug 12, 2024

Merge branch 'main' into value-array-stack-alloc

4789090

hez2010 mentioned this pull request Sep 13, 2024

JIT: adjust where folded comma throws are morphed #107777

Draft

JulieLeeMSFT requested a review from AndyAyersMS September 16, 2024 13:26

AndyAyersMS reviewed Sep 16, 2024

View reviewed changes

Merge branch 'main' into value-array-stack-alloc

1b33170

Address some review feedback

9c81c04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Extend escape analysis to account for arrays with non-gcref elements #104906

JIT: Extend escape analysis to account for arrays with non-gcref elements #104906

hez2010 commented Jul 15, 2024 •

edited

Loading

dotnet-policy-service bot commented Jul 15, 2024

AndyAyersMS left a comment

AndyAyersMS left a comment

AndyAyersMS commented Jul 19, 2024

hez2010 commented Jul 19, 2024

This comment was marked as outdated.

This comment was marked as resolved.

This comment was marked as resolved.

hez2010 commented Jul 20, 2024 •

edited

Loading

hez2010 commented Jul 20, 2024

hez2010 commented Jul 22, 2024

JulieLeeMSFT commented Aug 12, 2024

JulieLeeMSFT commented Sep 16, 2024

AndyAyersMS left a comment

AndyAyersMS Sep 16, 2024

hez2010 Sep 18, 2024

AndyAyersMS Sep 16, 2024

hez2010 commented Sep 18, 2024

hez2010 commented Sep 20, 2024

JIT: Extend escape analysis to account for arrays with non-gcref elements #104906

Are you sure you want to change the base?

JIT: Extend escape analysis to account for arrays with non-gcref elements #104906

Conversation

hez2010 commented Jul 15, 2024 • edited Loading

dotnet-policy-service bot commented Jul 15, 2024

AndyAyersMS left a comment

Choose a reason for hiding this comment

AndyAyersMS left a comment

Choose a reason for hiding this comment

AndyAyersMS commented Jul 19, 2024

hez2010 commented Jul 19, 2024

This comment was marked as outdated.

This comment was marked as resolved.

This comment was marked as resolved.

hez2010 commented Jul 20, 2024 • edited Loading

hez2010 commented Jul 20, 2024

hez2010 commented Jul 22, 2024

JulieLeeMSFT commented Aug 12, 2024

JulieLeeMSFT commented Sep 16, 2024

AndyAyersMS left a comment

Choose a reason for hiding this comment

AndyAyersMS Sep 16, 2024

Choose a reason for hiding this comment

hez2010 Sep 18, 2024

Choose a reason for hiding this comment

AndyAyersMS Sep 16, 2024

Choose a reason for hiding this comment

hez2010 commented Sep 18, 2024

hez2010 commented Sep 20, 2024

hez2010 commented Jul 15, 2024 •

edited

Loading

hez2010 commented Jul 20, 2024 •

edited

Loading