Arm64: Use zero register wherever possible #52269
Windows Arm64 (libraries.pmi)
Windows Arm64 (benchmarks_run)
Linux Arm64 (libraries.pmi)
Here is some analysis of the regression: Earlier, the code `mov XXX, #0` had [...] Overall, the code size improved with better CQ, and we can tolerate the regression.
Failures are unrelated.
@dotnet/jit-contrib
@kunalspathak Have you confirmed that `strb` and `strh` work properly with this change?
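For context on this question, here is a minimal sketch (not part of the PR) of the A64 unsigned-offset store encodings. It illustrates that `strb` and `strh` take their source register in the low five bits, where register number 31 denotes the zero register, so `strb wzr, [...]` is encodable directly. The helper names (`encodeStoreUnsignedOffset`, `encodeStrb`, `encodeStrh`, `encodeStrX`) and the constants `kZeroRegNum` and `kFramePointerNum` are hypothetical and are not the JIT emitter's API. The `static_assert` checks compare against encodings that appear in the disassembly quoted later in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Architectural register numbers (hypothetical names, not the JIT's enums).
constexpr uint32_t kZeroRegNum      = 31; // wzr/xzr when used as a store source
constexpr uint32_t kFramePointerNum = 29; // fp (x29)

// Encode "store register, unsigned scaled immediate offset":
//   bits [21:10] = byteOffset / access size, bits [9:5] = base, bits [4:0] = source.
constexpr uint32_t encodeStoreUnsignedOffset(uint32_t baseOpcode, unsigned sizeLog2,
                                             uint32_t rt, uint32_t rn, uint32_t byteOffset)
{
    return baseOpcode | (((byteOffset >> sizeLog2) & 0xFFFu) << 10) | ((rn & 31u) << 5) | (rt & 31u);
}

// strb wT, [xN, #off]  (1-byte access)
constexpr uint32_t encodeStrb(uint32_t rt, uint32_t rn, uint32_t byteOffset)
{
    return encodeStoreUnsignedOffset(0x39000000u, 0, rt, rn, byteOffset);
}

// strh wT, [xN, #off]  (2-byte access)
constexpr uint32_t encodeStrh(uint32_t rt, uint32_t rn, uint32_t byteOffset)
{
    return encodeStoreUnsignedOffset(0x79000000u, 1, rt, rn, byteOffset);
}

// str xT, [xN, #off]  (8-byte access)
constexpr uint32_t encodeStrX(uint32_t rt, uint32_t rn, uint32_t byteOffset)
{
    return encodeStoreUnsignedOffset(0xF9000000u, 3, rt, rn, byteOffset);
}

// Encodings taken from the disassembly in this thread.
static_assert(encodeStrh(kZeroRegNum, 0, 0) == 0x7900001Fu, "strh wzr, [x0]");
static_assert(encodeStrb(0, kFramePointerNum, 24) == 0x390063A0u, "strb w0, [fp,#24]");
static_assert(encodeStrX(kZeroRegNum, kFramePointerNum, 24) == 0xF9000FBFu, "str xzr, [fp,#24]");
```

Under these assumptions, `encodeStrb(kZeroRegNum, kFramePointerNum, 24)` gives the encoding of `strb wzr, [fp,#24]`, which differs from `strb w0, [fp,#24]` only in the low five bits.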
@echesakov - No, this doesn't address

struct MyMemory1
{
    byte x1;
    byte x2;
    private MyMemory1(byte x, byte y)
    {
        x1 = x;
        x2 = y;
    }
    public MyMemory1 Slice()
    {
        return new MyMemory1(0, 0);
    }
}
[MethodImpl(MethodImplOptions.NoInlining)]
public MyMemory1 Test1()
{
    var m = new MyMemory1().Slice();
    Consume(m);
    return m;
}

The code we generate is:

G_M31828_IG01: ;; offset=0000H
A9BE7BFD stp fp, lr, [sp,#-32]!
910003FD mov fp, sp
F9000FBF str xzr, [fp,#24] // [V01 loc0]
F9000BBF str xzr, [fp,#16] // [V02 loc1]
;; bbWeight=1 PerfScore 3.50
G_M31828_IG02: ;; offset=0010H
910043A0 add x0, fp, #16 // [V02 loc1]
7900001F strh wzr, [x0]
2A1F03E0 mov w0, wzr
390063A0 strb w0, [fp,#24] // [V06 tmp3]
390067A0 strb w0, [fp,#25] // [V07 tmp4]
910063A0 add x0, fp, #24 // [V01 loc0]
94000000 bl MiniBench.Slice:Consume(byref)
B9401BA0 ldr w0, [fp,#24] // [V01 loc0]
;; bbWeight=1 PerfScore 7.50
G_M31828_IG03: ;; offset=0030H
A8C27BFD ldp fp, lr, [sp],#32
D65F03C0 ret lr

Related JitDump:

DefList: { }
N015 ( 1, 2) [000040] -c---------- * CNS_INT int 0 REG NA $80
Contained
DefList: { }
N017 ( 1, 3) [000042] DA---------- * STORE_LCL_VAR ubyte V11 tmp8 d:2 NA REG NA
<RefPosition #5 @18 RefTypeDef <Ivl:1 V11> STORE_LCL_VAR BB01 regmask=[x0-xip0 x19-x28] minReg=1 last>
DefList: { }
N019 ( 2, 2) [000052] ------------ * LCL_VAR ubyte V11 tmp8 u:2 NA REG NA $80
DefList: { }
N021 ( 7, 6) [000053] DA--G------- * STORE_LCL_VAR ubyte (AX) V06 tmp3 NA REG NA
<RefPosition #6 @21 RefTypeUse <Ivl:1 V11> LCL_VAR BB01 regmask=[x0-xip0 x19-x28] minReg=1 last>
DefList: { }
N023 ( 2, 2) [000055] ------------ * LCL_VAR ubyte V11 tmp8 u:2 NA (last use) REG NA $80
DefList: { }
N025 ( 7, 6) [000056] DA--G------- * STORE_LCL_VAR ubyte (AX) V07 tmp4 NA REG NA
<RefPosition #7 @25 RefTypeUse <Ivl:1 V11> LCL_VAR BB01 regmask=[x0-xip0 x19-x28] minReg=1 last>

Register allocation has the ability to reuse a register that holds a constant value, but in this case, we don't mark the
@kunalspathak Okay, makes sense. Wonder if it's worth asserting
That won't be the case always. The earlier code before my change was actually marking
While investigating #41704, I noticed that we do not often use the zero register, but instead move `#0` to a register and then use that register. This minor PR fixes that issue (thanks to @echesakov for cross-checking the fix). For Arm64, we just mark such a node as contained; wherever that value is used, code generation already handles the `0` immediate case by emitting `REG_ZR`.
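To make the containment idea in the description concrete, here is a minimal toy sketch. It is not the JIT's actual code: `ConstNode`, `lowerStoreSource`, and `emitStore` are hypothetical names invented for illustration, and the model covers only a 32-bit store of an integer constant. It assumes that a contained zero needs no register of its own, so the consumer can name the zero register directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an IR constant node.
struct ConstNode
{
    long long value;
    bool      contained; // true: no register is allocated for this node
};

// Toy "lowering" step: a zero constant is marked contained so that its
// consumer can encode it itself instead of reading it from a register.
void lowerStoreSource(ConstNode& src)
{
    if (src.value == 0)
    {
        src.contained = true;
    }
}

// Toy "codegen" for storing `src` to `addr`, using `scratch` when the
// constant has to be materialized in a register first.
std::vector<std::string> emitStore(const ConstNode& src, const std::string& scratch,
                                   const std::string& addr)
{
    std::vector<std::string> out;
    if (src.contained && src.value == 0)
    {
        // Contained zero: use the zero register as the store source.
        out.push_back("str wzr, " + addr);
    }
    else
    {
        // Otherwise: move the constant into a register, then store it.
        out.push_back("mov " + scratch + ", #" + std::to_string(src.value));
        out.push_back("str " + scratch + ", " + addr);
    }
    return out;
}
```

In this toy model, a zero that has not been through `lowerStoreSource` produces two instructions (`mov` then `str`), while a contained zero produces the single `str wzr, ...` form, which is the shape of improvement the description is after.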