Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit armv8.4 ldapur* for volatile loads with contained offsets #89681

Merged
merged 22 commits into from
Aug 31, 2023

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jul 29, 2023

Contributes to #64457

volatile int a;
volatile int b;
volatile int c;
volatile int d;

[Benchmark]
public int Test()
{
    return a + b + c + d;
}

Current codegen for Test:

; Assembly listing for method P:Test():int:this (Tier1)		
            stp     fp, lr, [sp, #-0x10]!		
            mov     fp, sp		
            add     x1, x0, #8		
            ldapr   w1, [x1]		
            add     x2, x0, #12		
            ldapr   w2, [x2]		
            add     w1, w1, w2		
            add     x2, x0, #16		
            ldapr   w2, [x2]		
            add     w1, w1, w2		
            add     x0, x0, #20		
            ldapr   w0, [x0]		
            add     w0, w1, w0		
            ldp     fp, lr, [sp], #0x10		
            ret     lr		
; Total bytes of code 60

New codegen for Test:

; Assembly listing for method P:Test():int:this (Tier1)
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            ldapur  w1, [x0, #0x08]
            ldapur  w2, [x0, #0x0C]
            add     w1, w1, w2
            ldapur  w2, [x0, #0x10]
            add     w1, w1, w2
            ldapur  w0, [x0, #0x14]
            add     w0, w1, w0
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code 44

Apple M2 Max:

Method Job Toolchain Mean Error StdDev Ratio
Test Job-LOXHRO /Core_Root_PR/corerun 0.4724 ns 0.0050 ns 0.0045 ns 0.89
Test Job-MPJUHL /Core_Root_main/corerun 0.5327 ns 0.0063 ns 0.0059 ns 1.00

Perf difference is not super exciting but it is what it is 🙂

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 29, 2023
@ghost ghost assigned EgorBo Jul 29, 2023
@ghost
Copy link

ghost commented Jul 29, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
volatile int a;
volatile int b;
volatile int c;
volatile int d;

[Benchmark]
public int Test()
{
    return a + b + c + d;
}

Current codegen for Test:

; Assembly listing for method P:Test():int:this (Tier1)		
            stp     fp, lr, [sp, #-0x10]!		
            mov     fp, sp		
            add     x1, x0, #8		
            ldapr   w1, [x1]		
            add     x2, x0, #12		
            ldapr   w2, [x2]		
            add     w1, w1, w2		
            add     x2, x0, #16		
            ldapr   w2, [x2]		
            add     w1, w1, w2		
            add     x0, x0, #20		
            ldapr   w0, [x0]		
            add     w0, w1, w0		
            ldp     fp, lr, [sp], #0x10		
            ret     lr		
; Total bytes of code 60

New codegen for Test:

; Assembly listing for method P:Test():int:this (Tier1)
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            ldapur  w1, [x0, #0x08]
            ldapur  w2, [x0, #0x0C]
            add     w1, w1, w2
            ldapur  w2, [x0, #0x10]
            add     w1, w1, w2
            ldapur  w0, [x0, #0x14]
            add     w0, w1, w0
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code 44
Method Job Toolchain Mean Error StdDev Ratio
Test Job-LOXHRO /Core_Root_PR/corerun 0.4724 ns 0.0050 ns 0.0045 ns 0.89
Test Job-MPJUHL /Core_Root_main/corerun 0.5327 ns 0.0063 ns 0.0059 ns 1.00

Perf difference is not super exciting but it is what it is 🙂

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

static_assert_no_msg(INS_count <= 512);
instruction _idIns : 9;
static_assert_no_msg(INS_count <= 1024);
instruction _idIns : 10;
Copy link
Member Author

@EgorBo EgorBo Jul 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bit here since we now have more than 512 instructions (and that for sure will be increasing in future) but I also removed an used (on arm) _idCallRegPtr bit so the total size of the struct remains the same - 16b

@EgorBo EgorBo marked this pull request as ready for review July 30, 2023 17:46
@EgorBo EgorBo added this to the 9.0.0 milestone Aug 5, 2023
@EgorBo EgorBo marked this pull request as draft August 21, 2023 11:44
@EgorBo
Copy link
Member Author

EgorBo commented Aug 27, 2023

@TIHan @dotnet/jit-contrib PTAL, simple change, emits ldapur where we previously emitted ldapr+add because ldapr doesn't support IMM offsets

@EgorBo EgorBo requested a review from TIHan August 27, 2023 12:55
@@ -8813,8 +8837,6 @@ void emitter::emitIns_Call(EmitCallType callType,
{
/* This is an indirect call (either a virtual call or func ptr call) */

id->idSetIsCallRegPtr();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed for arm32?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it in https://github.com/dotnet/runtime/pull/89681/files#r1278410873 comment, basically, it sets a bit that is unused for arm, so I removed that bit because I needed to extend _idIns's width

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only one comment on the idSetIsCallingRegPtr.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants