Utilizing flags set by various instructions, not just `cmp` and `test`. #76502

MineCake147E · 2022-10-02T08:51:24Z

Description

RyuJIT doesn't seem to be interested in flags set by instructions other than cmp and test.

`add` instruction and flags

For unsigned integers, if the result of the addition overflows and CF is set, the result of the addition is less than one of the input operands.
In this code,

using System;
public static class C
{
    public static uint AddCarry(uint left, uint right, out bool carry){
        var res = left + right;
        carry = res < left;
        return res;
    }
}

Results in

; Core CLR 6.0.822.36306 on amd64

C.AddCarry(UInt32, UInt32, Boolean ByRef)
    L0000: lea eax, [rcx+rdx]   # We can use `add ecx, edx` instead, for utilizing flags set by it.
    L0003: cmp eax, ecx         # This `cmp` instruction is redundunt if we use `add`, but is necessary for `lea`.
    L0005: setb dl
    L0008: mov [r8], dl
    L000b: ret

In this case, LLVM utilizes this fact, for example:

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t Add(uint32_t left, uint32_t right, bool* carry) {
    auto res = left + right;
    *carry = res < left;
    return res;
}

Add(unsigned int, unsigned int, bool*):                             # @Add(unsigned int, unsigned int, bool*)
        mov     eax, edi
        add     eax, esi        # CF is set here if there is a carry.
        setb    byte ptr [rdx]  # `setb` extracts CF here.
        ret

`sub` instruction with redundant `cmp` instruction

It might be common to subtract a value from a register only when the value of the register is greater than or equal to that value.

using System;
public static class C
{
    public static uint SubtractIfLargerOrEqual(uint left, uint right, out bool carry){
        bool cf = false;
        var res = left - right;    //CF is set if left < right
        cf = res > left;           //CF can be verified by `res > left` as the res doesn't become greater than the left unless the left is less than the right for unsigned integers.
        if (cf) res = left;
        carry = cf;
        return res;
    }
}

; Core CLR 6.0.822.36306 on amd64

C.SubtractIfLargerOrEqual(UInt32, UInt32, Boolean ByRef)
    L0000: mov eax, ecx
    L0002: sub eax, edx
    L0004: cmp eax, ecx
    L0006: seta dl
    L0009: movzx edx, dl    # Is this `movzx edx, dl` really necessary?
    L000c: test edx, edx    # `test dl, dl` is also valid here.
    L000e: je short L0012
    L0010: mov eax, ecx
    L0012: mov [r8], dl
    L0015: ret

The cmp instruction does the same comparison as the sub instruction, so the sub and cmp instructions with the same register operands change flags in the same way.
In this specific case, LLVM mysteriously transforms this calculation into something like res = left - (left < right ? 0 : right).

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t SubtractIfLargerOrEqualAsm(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    auto res = left;
    asm("sub %1, %0" : "+r"(res) : "r"(right));
    asm("cmovb %1, %0" : "+r"(res) : "r"(left));
    asm("setb %0" : "=m"(*carry));
    return res;
}

uint32_t SubtractIfLargerOrEqualCpp(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    uint8_t cf = 0;
    auto res = left - right;
    cf = res > left;
    if (cf) res = left;
    *carry = cf;
    return res;
}

SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        sub     eax, esi
        cmovb   eax, edi
        setb    byte ptr [rdx]
        ret
SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        xor     ecx, ecx
        cmp     edi, esi
        cmovae  ecx, esi
        setb    byte ptr [rdx]
        sub     eax, ecx
        ret

Configuration

SharpLab
Compiler Explorer

Regression?

No

Data

Analysis

category:cq
theme:basic-cq

The text was updated successfully, but these errors were encountered:

ghost · 2022-10-02T08:51:35Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

RyuJIT doesn't seem to be interested in flags set by instructions other than cmp and test.

`add` instruction and flags

For unsigned integers, if the result of the addition overflows and CF is set, the result of the addition is less than one of the input operands.
In this code,

using System;
public static class C
{
    public static uint AddCarry(uint left, uint right, out bool carry){
        var res = left + right;
        carry = res < left;
        return res;
    }
}

Results in

; Core CLR 6.0.822.36306 on amd64

C.AddCarry(UInt32, UInt32, Boolean ByRef)
    L0000: lea eax, [rcx+rdx]   # We can use `add ecx, edx` instead, for utilizing flags set by it.
    L0003: cmp eax, ecx         # This `cmp` instruction is redundunt if we use `add`, but is necessary for `lea`.
    L0005: setb dl
    L0008: mov [r8], dl
    L000b: ret

In this case, LLVM utilizes this fact, for example:

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t Add(uint32_t left, uint32_t right, bool* carry) {
    auto res = left + right;
    *carry = res < left;
    return res;
}

Add(unsigned int, unsigned int, bool*):                             # @Add(unsigned int, unsigned int, bool*)
        mov     eax, edi
        add     eax, esi        # CF is set here if there is a carry.
        setb    byte ptr [rdx]  # `setb` extracts CF here.
        ret

`sub` instruction with redundant `cmp` instruction

It might be common to subtract a value from a register only when the value of the register is greater than or equal to that value.

using System;
public static class C
{
    public static uint SubtractIfLargerOrEqual(uint left, uint right, out bool carry){
        bool cf = false;
        var res = left - right;    //CF is set if left < right
        cf = res > left;           //CF can be verified by `res > left` as the res doesn't become greater than the left unless the left is less than the right for unsigned integers.
        if (cf) res = left;
        carry = cf;
        return res;
    }
}

; Core CLR 6.0.822.36306 on amd64

C.SubtractIfLargerOrEqual(UInt32, UInt32, Boolean ByRef)
    L0000: mov eax, ecx
    L0002: sub eax, edx
    L0004: cmp eax, ecx
    L0006: seta dl
    L0009: movzx edx, dl    # Is this `movzx edx, dl` really necessary?
    L000c: test edx, edx    # `test dl, dl` is also valid here.
    L000e: je short L0012
    L0010: mov eax, ecx
    L0012: mov [r8], dl
    L0015: ret

The cmp instruction does the same comparison as the sub instruction, so the sub and cmp instructions with the same register operands change flags in the same way.
In this specific case, LLVM mysteriously transforms this calculation into something like res = left - (left < right ? 0 : right).

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t SubtractIfLargerOrEqualAsm(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    auto res = left;
    asm("sub %1, %0" : "+r"(res) : "r"(right));
    asm("cmovb %1, %0" : "+r"(res) : "r"(left));
    asm("setb %0" : "=m"(*carry));
    return res;
}

uint32_t SubtractIfLargerOrEqualCpp(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    uint8_t cf = 0;
    auto res = left - right;
    cf = res > left;
    if (cf) res = left;
    *carry = cf;
    return res;
}

SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        sub     eax, esi
        cmovb   eax, edi
        setb    byte ptr [rdx]
        ret
SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        xor     ecx, ecx
        cmp     edi, esi
        cmovae  ecx, esi
        setb    byte ptr [rdx]
        sub     eax, ecx
        ret

Configuration

SharpLab
Compiler Explorer

Regression?

No

Data

Analysis

Author:	MineCake147E
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`
Milestone:	-

JulieLeeMSFT · 2022-10-04T17:49:12Z

cc @dotnet/jit-contrib.

TIHan · 2023-02-01T23:08:23Z

Related work: #81267

xtqqczze · 2023-03-18T20:21:21Z

Confirmed on latest nightly with Compiler Explorer:

// crossgen2 8.0.0-preview.3.23168.99+281c462b029c5056cdd446da86692e08a9377b01

C:AddCarry(uint,uint,byref):uint:
       lea      eax, [rcx+rdx]
       cmp      eax, ecx
       setb     dl
       mov      byte  ptr [r8], dl
       ret      

C:SubtractIfLargerOrEqual(uint,uint,byref):uint:
       mov      eax, ecx
       sub      eax, edx
       xor      edx, edx
       cmp      eax, ecx
       seta     dl
       test     edx, edx
       cmovne   eax, ecx
       mov      byte  ptr [r8], dl
       ret

TIHan · 2023-04-05T20:22:06Z

Also related: #82750

TIHan · 2023-04-05T22:04:56Z

For AddCarry, I was able to recognize the pattern, but it appears this isn't common and doesn't even appear in our SuperPMI diff tool.

tannergooding · 2023-04-06T00:02:02Z

AddCarry (and SubtractBorrow) is one that we’d like to explicitly take advantage of in BigInteger, Int128, and UInt128.

For the latter two it’s just a simple scenario like the original post gives. For BigInteger it needs to be many carry’s that are done in a loop and is quite a bit more complex

tannergooding · 2023-04-06T00:03:34Z

For these types we’re currently using an slightly different bit of code that accounts for the lack of support for AddCarry being recognized today

MineCake147E added the tenet-performance Performance related issue label Oct 2, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 2, 2022

ghost added the untriaged New issue has not been triaged by the area owner label Oct 2, 2022

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Oct 4, 2022

JulieLeeMSFT added this to the Future milestone Oct 4, 2022

TIHan self-assigned this Feb 1, 2023

TIHan modified the milestones: Future, 8.0.0 Feb 1, 2023

TIHan modified the milestones: 8.0.0, Future Jun 9, 2023

Daniel-Svensson mentioned this issue Mar 3, 2024

Some System.Decimal performance improvements #99212

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Utilizing flags set by various instructions, not just `cmp` and `test`. #76502

Utilizing flags set by various instructions, not just `cmp` and `test`. #76502

MineCake147E commented Oct 2, 2022 •

edited by BruceForstall

Loading

ghost commented Oct 2, 2022

Description

`add` instruction and flags

`sub` instruction with redundant `cmp` instruction

Configuration

Regression?

Data

Analysis

JulieLeeMSFT commented Oct 4, 2022

TIHan commented Feb 1, 2023

xtqqczze commented Mar 18, 2023

TIHan commented Apr 5, 2023

TIHan commented Apr 5, 2023

tannergooding commented Apr 6, 2023

tannergooding commented Apr 6, 2023

Utilizing flags set by various instructions, not just cmp and test. #76502

Utilizing flags set by various instructions, not just cmp and test. #76502

Comments

MineCake147E commented Oct 2, 2022 • edited by BruceForstall Loading

Description

add instruction and flags

sub instruction with redundant cmp instruction

Configuration

Regression?

Data

Analysis

ghost commented Oct 2, 2022

Description

add instruction and flags

sub instruction with redundant cmp instruction

Configuration

Regression?

Data

Analysis

JulieLeeMSFT commented Oct 4, 2022

TIHan commented Feb 1, 2023

xtqqczze commented Mar 18, 2023

TIHan commented Apr 5, 2023

TIHan commented Apr 5, 2023

tannergooding commented Apr 6, 2023

tannergooding commented Apr 6, 2023

Utilizing flags set by various instructions, not just `cmp` and `test`. #76502

Utilizing flags set by various instructions, not just `cmp` and `test`. #76502

MineCake147E commented Oct 2, 2022 •

edited by BruceForstall

Loading

`add` instruction and flags

`sub` instruction with redundant `cmp` instruction

`add` instruction and flags

`sub` instruction with redundant `cmp` instruction