-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Unify arm64 and x64 GT_SELECT handling #82610
Conversation
This unifies GT_SELECT/GT_SELECTCC handling between arm64 and x64. The arm64 backend no longer uses containment for compare chains; instead, there is a new GT_CCMP node that both produces and consumes flags, and lowering can lower GT_AND(op, relop) down to this node.
{ | ||
child->AsOp()->SetContained(); | ||
GenCondition cond2 = GenCondition::FromRelop(op2); | ||
op2->SetOper(GT_CCMP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I know this is done elsewhere), but why is this safe? op2 is moving from a GenTreeOp to a GenTreeCCMP. The latter class is bigger than the former. What's stopping memory overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT has only two different node sizes: small nodes (of size TREE_NODE_SZ_SMALL
) and large nodes (of size TREE_NODE_SZ_LARGE
). Whenever a node is allocated, its size is rounded up to one of these. GenTree::InitNodeSize
has various static asserts for the expected node sizes.
All the nodes here are small, so bashing between them is "safe" as far as size issues goes, though this ignores that this is not conformant C++.
Historically (before I worked on the JIT) the IR node hierarchy was not represented via inheritance, but was represented via unions instead, which made this pattern of bashing between node types less controversial. But as you've noted we still do it in various places due to the nice perf characteristics of changing nodes in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenTree::SetOper
also has asserts that the bashing done is safe as far as the size issue goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, hoped there was something like that.
Looking at the static assert checks for TREE_NODE_SZ_SMALL / TREE_NODE_SZ_LARGE, there's nothing there for GenTreeConditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one for GenTreeConditional
with the latest commits
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 3 pipeline(s). |
Code looks fine to me. Are there any code diffs for this PR? |
Yes, a few improvements: https://dev.azure.com/dnceng-public/public/_build/results?buildId=184319&view=ms.vss-build-web.run-extensions-tab The ones I looked at are new cases where we use @@ -22,9 +22,6 @@ N012 ( 30, 12) [000003] DA-XG------ ▌ STORE_LCL_VAR r
[000037] ----------- IL_OFFSET void INLRT @ 0x007[E-]
N001 ( 1, 1) [000004] ----------- t4 = LCL_VAR ref V01 loc0 u:2 $82
N002 ( 1, 2) [000005] -c--------- t5 = CNS_INT ref null $VN.Null
- ┌──▌ t4 ref
- ├──▌ t5 ref
-N003 ( 6, 4) [000006] ----------- t6 = ▌ NE int $240
N004 ( 1, 1) [000007] ----------- t7 = LCL_VAR ref V01 loc0 u:2 (last use) $82
┌──▌ t7 ref
N006 ( 3, 4) [000035] -c--------- t35 = ▌ LEA(b+96) byref
@@ -35,11 +32,12 @@ N009 ( 6, 6) [000033] -c--------- t33 = ▌ LEA(b+104) byre
┌──▌ t33 byref
N010 ( 7, 5) [000019] ---XG------ t19 = ▌ IND int <l:$244, c:$245>
N011 ( 1, 2) [000012] -c--------- t12 = CNS_INT int 4 $44
+ ┌──▌ t4 ref
+ ├──▌ t5 ref
+N003 ( 6, 4) [000006] ----------- ▌ CMP void
┌──▌ t19 int
├──▌ t12 int
-N012 ( 12, 8) [000013] ---XG------ t13 = ▌ EQ int <l:$249, c:$248>
- ┌──▌ t6 int
- ├──▌ t13 int
-N013 ( 19, 13) [000014] ---XG------ t14 = ▌ AND int <l:$24d, c:$24c>
+N012 ( 12, 8) [000013] ---XG------ ▌ CCMP void cond=UNE flags=0
+N013 ( 19, 13) [000014] ---XG------ t14 = SETCC int cond=UEQ
┌──▌ t14 int
N014 ( 20, 14) [000015] ---XG------ ▌ RETURN int <l:$14a, c:$149> Notice that I also just realized that the FWIW, with this change you will not see [MethodImpl(MethodImplOptions.NoInlining)]
public static int Foo(int a, int b, int c)
{
if (a < b & b < c & c < 10)
return 42;
return 13;
} Codegen: cmp w0, w1
ccmp w1, w2, z, lt
ccmp w2, #10, z, lt
cset x0, lt
mov w1, #13
mov w2, #42
cmp w0, #0
csel w0, w1, w2, eq Expected codegen is something like: cmp w0, w1
ccmp w1, w2, z, lt
ccmp w2, #10, z, lt
csel w0, w1, w2, lt I think the best fix would be to teach |
Agreed this makes sense to do. Is that something your planning to do? because if so I can hold off again from rebasing #79283. |
Yep, I can do that in a follow-up. This PR should be ready, cc @dotnet/jit-contrib PTAL @kunalspathak @BruceForstall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup. Added some questions/comments.
// Returns: | ||
// A flags immediate that, if those flags were set, would cause the specified condition to be true. | ||
// | ||
insCflags Lowering::TruthifyingFlags(GenCondition condition) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we repurpose InsCflagsForCcmp
? Move it to a GenTree, make it static and use it from codegen as well as lower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need that function in codegen anymore -- this PR is removing it from codegenarm64.cpp (it is effectively the same function, except without reversing the condition). In a follow-up I will add the ability to generate ccmp for OR(relop, relop)
too which will use TruthifyingFlags
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, for some reason I didn't noticed that you have removed InsCflagsForCcmp()
.
src/coreclr/jit/lowerarmarch.cpp
Outdated
} | ||
else if (child->OperIsCmpCompare() && varTypeIsIntegral(child->gtGetOp1()) && varTypeIsIntegral(child->gtGetOp2())) | ||
|
||
if (!op2->OperIsCmpCompare() || !varTypeIsIntegral(op2->gtGetOp1())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why this is also not !op2->operIsCompare()
, just like you have !op1->OperIsCompare()
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operands of op2
will be compared with the ccmp
instruction, and we can only do that for the "basic" relops: GT_EQ
, GT_NE
, GT_LE
, GT_LT
, GT_GT
, GT_GE
. The remaining compare operators, GT_TEST_EQ
and GT_TEST_NE
(whose semantics are (a & b) == 0
and (a & b) != 0
) cannot be implemented as a conditional compare -- it would need some kind of ctst
instruction.
On the other hand, the first operand can be anything that has already set the condition flags in any way, including e.g. floating point compares or tst
instruction (or, a previous conditional compare, which is how we end up with the chains).
Let me add a more detailed comment here about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a more detailed comment. I also experimented with making this logic symmetric (allow ccmp
for either op1
or op2
), but diffs showed that it didn't matter. Might still do it as part of the OR
follow up, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This unifies GT_SELECT/GT_SELECTCC handling between arm64 and x64. The arm64 backend no longer uses containment for compare chains; instead, there is a new GT_CCMP node that both produces and consumes flags, and lowering can lower GT_AND(op, relop) down to this node.
To do this, the JIT verifies invariance and reorders the
cmp/ccmps
to happen successively in order. For example, given C# code like:we see LIR:
When lowering the first
AND
([000006]
), we will notice that its operands are compares, and lower this as:I.e. we moved the compare
[000002]
forward and turned it into aCMP
node, and[000005]
was then turned into aCCMP
node.When we see the second
AND
node ([000010]
) we do the same thing again: verify invariance and move the entire compare chain forward:When we finally get to the
SELECT
we do the same thing -- move all the conditional compares in front of theSELECT
and then change it toSELECTCC
.cc @a74nh