Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utilizing flags set by various instructions, not just cmp and test. #76502

Open
MineCake147E opened this issue Oct 2, 2022 · 8 comments
Open
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@MineCake147E
Copy link
Contributor

MineCake147E commented Oct 2, 2022

Description

RyuJIT doesn't seem to be interested in flags set by instructions other than cmp and test.

add instruction and flags

For unsigned integers, if the result of the addition overflows and CF is set, the result of the addition is less than one of the input operands.
In this code,

using System;
public static class C
{
    public static uint AddCarry(uint left, uint right, out bool carry){
        var res = left + right;
        carry = res < left;
        return res;
    }
}

Results in

; Core CLR 6.0.822.36306 on amd64

C.AddCarry(UInt32, UInt32, Boolean ByRef)
    L0000: lea eax, [rcx+rdx]   # We can use `add ecx, edx` instead, for utilizing flags set by it.
    L0003: cmp eax, ecx         # This `cmp` instruction is redundunt if we use `add`, but is necessary for `lea`.
    L0005: setb dl
    L0008: mov [r8], dl
    L000b: ret

In this case, LLVM utilizes this fact, for example:

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t Add(uint32_t left, uint32_t right, bool* carry) {
    auto res = left + right;
    *carry = res < left;
    return res;
}
Add(unsigned int, unsigned int, bool*):                             # @Add(unsigned int, unsigned int, bool*)
        mov     eax, edi
        add     eax, esi        # CF is set here if there is a carry.
        setb    byte ptr [rdx]  # `setb` extracts CF here.
        ret

sub instruction with redundant cmp instruction

It might be common to subtract a value from a register only when the value of the register is greater than or equal to that value.

using System;
public static class C
{
    public static uint SubtractIfLargerOrEqual(uint left, uint right, out bool carry){
        bool cf = false;
        var res = left - right;    //CF is set if left < right
        cf = res > left;           //CF can be verified by `res > left` as the res doesn't become greater than the left unless the left is less than the right for unsigned integers.
        if (cf) res = left;
        carry = cf;
        return res;
    }
}
; Core CLR 6.0.822.36306 on amd64

C.SubtractIfLargerOrEqual(UInt32, UInt32, Boolean ByRef)
    L0000: mov eax, ecx
    L0002: sub eax, edx
    L0004: cmp eax, ecx
    L0006: seta dl
    L0009: movzx edx, dl    # Is this `movzx edx, dl` really necessary?
    L000c: test edx, edx    # `test dl, dl` is also valid here.
    L000e: je short L0012
    L0010: mov eax, ecx
    L0012: mov [r8], dl
    L0015: ret

The cmp instruction does the same comparison as the sub instruction, so the sub and cmp instructions with the same register operands change flags in the same way.
In this specific case, LLVM mysteriously transforms this calculation into something like res = left - (left < right ? 0 : right).

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t SubtractIfLargerOrEqualAsm(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    auto res = left;
    asm("sub %1, %0" : "+r"(res) : "r"(right));
    asm("cmovb %1, %0" : "+r"(res) : "r"(left));
    asm("setb %0" : "=m"(*carry));
    return res;
}

uint32_t SubtractIfLargerOrEqualCpp(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    uint8_t cf = 0;
    auto res = left - right;
    cf = res > left;
    if (cf) res = left;
    *carry = cf;
    return res;
}
SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        sub     eax, esi
        cmovb   eax, edi
        setb    byte ptr [rdx]
        ret
SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        xor     ecx, ecx
        cmp     edi, esi
        cmovae  ecx, esi
        setb    byte ptr [rdx]
        sub     eax, ecx
        ret

Configuration

SharpLab
Compiler Explorer

Regression?

No

Data

Analysis

category:cq
theme:basic-cq

@MineCake147E MineCake147E added the tenet-performance Performance related issue label Oct 2, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 2, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 2, 2022
@ghost
Copy link

ghost commented Oct 2, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

RyuJIT doesn't seem to be interested in flags set by instructions other than cmp and test.

add instruction and flags

For unsigned integers, if the result of the addition overflows and CF is set, the result of the addition is less than one of the input operands.
In this code,

using System;
public static class C
{
    public static uint AddCarry(uint left, uint right, out bool carry){
        var res = left + right;
        carry = res < left;
        return res;
    }
}

Results in

; Core CLR 6.0.822.36306 on amd64

C.AddCarry(UInt32, UInt32, Boolean ByRef)
    L0000: lea eax, [rcx+rdx]   # We can use `add ecx, edx` instead, for utilizing flags set by it.
    L0003: cmp eax, ecx         # This `cmp` instruction is redundunt if we use `add`, but is necessary for `lea`.
    L0005: setb dl
    L0008: mov [r8], dl
    L000b: ret

In this case, LLVM utilizes this fact, for example:

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t Add(uint32_t left, uint32_t right, bool* carry) {
    auto res = left + right;
    *carry = res < left;
    return res;
}
Add(unsigned int, unsigned int, bool*):                             # @Add(unsigned int, unsigned int, bool*)
        mov     eax, edi
        add     eax, esi        # CF is set here if there is a carry.
        setb    byte ptr [rdx]  # `setb` extracts CF here.
        ret

sub instruction with redundant cmp instruction

It might be common to subtract a value from a register only when the value of the register is greater than or equal to that value.

using System;
public static class C
{
    public static uint SubtractIfLargerOrEqual(uint left, uint right, out bool carry){
        bool cf = false;
        var res = left - right;    //CF is set if left < right
        cf = res > left;           //CF can be verified by `res > left` as the res doesn't become greater than the left unless the left is less than the right for unsigned integers.
        if (cf) res = left;
        carry = cf;
        return res;
    }
}
; Core CLR 6.0.822.36306 on amd64

C.SubtractIfLargerOrEqual(UInt32, UInt32, Boolean ByRef)
    L0000: mov eax, ecx
    L0002: sub eax, edx
    L0004: cmp eax, ecx
    L0006: seta dl
    L0009: movzx edx, dl    # Is this `movzx edx, dl` really necessary?
    L000c: test edx, edx    # `test dl, dl` is also valid here.
    L000e: je short L0012
    L0010: mov eax, ecx
    L0012: mov [r8], dl
    L0015: ret

The cmp instruction does the same comparison as the sub instruction, so the sub and cmp instructions with the same register operands change flags in the same way.
In this specific case, LLVM mysteriously transforms this calculation into something like res = left - (left < right ? 0 : right).

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t SubtractIfLargerOrEqualAsm(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    auto res = left;
    asm("sub %1, %0" : "+r"(res) : "r"(right));
    asm("cmovb %1, %0" : "+r"(res) : "r"(left));
    asm("setb %0" : "=m"(*carry));
    return res;
}

uint32_t SubtractIfLargerOrEqualCpp(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    uint8_t cf = 0;
    auto res = left - right;
    cf = res > left;
    if (cf) res = left;
    *carry = cf;
    return res;
}
SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        sub     eax, esi
        cmovb   eax, edi
        setb    byte ptr [rdx]
        ret
SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        xor     ecx, ecx
        cmp     edi, esi
        cmovae  ecx, esi
        setb    byte ptr [rdx]
        sub     eax, ecx
        ret

Configuration

SharpLab
Compiler Explorer

Regression?

No

Data

Analysis

Author: MineCake147E
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr

Milestone: -

@JulieLeeMSFT
Copy link
Member

cc @dotnet/jit-contrib.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Oct 4, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Oct 4, 2022
@TIHan TIHan self-assigned this Feb 1, 2023
@TIHan TIHan modified the milestones: Future, 8.0.0 Feb 1, 2023
@TIHan
Copy link
Contributor

TIHan commented Feb 1, 2023

Related work: #81267

@xtqqczze
Copy link
Contributor

Confirmed on latest nightly with Compiler Explorer:

// crossgen2 8.0.0-preview.3.23168.99+281c462b029c5056cdd446da86692e08a9377b01

C:AddCarry(uint,uint,byref):uint:
       lea      eax, [rcx+rdx]
       cmp      eax, ecx
       setb     dl
       mov      byte  ptr [r8], dl
       ret      

C:SubtractIfLargerOrEqual(uint,uint,byref):uint:
       mov      eax, ecx
       sub      eax, edx
       xor      edx, edx
       cmp      eax, ecx
       seta     dl
       test     edx, edx
       cmovne   eax, ecx
       mov      byte  ptr [r8], dl
       ret      

@TIHan
Copy link
Contributor

TIHan commented Apr 5, 2023

Also related: #82750

@TIHan
Copy link
Contributor

TIHan commented Apr 5, 2023

For AddCarry, I was able to recognize the pattern, but it appears this isn't common and doesn't even appear in our SuperPMI diff tool.

@tannergooding
Copy link
Member

AddCarry (and SubtractBorrow) is one that we’d like to explicitly take advantage of in BigInteger, Int128, and UInt128.

For the latter two it’s just a simple scenario like the original post gives. For BigInteger it needs to be many carry’s that are done in a loop and is quite a bit more complex

@tannergooding
Copy link
Member

For these types we’re currently using an slightly different bit of code that accounts for the lack of support for AddCarry being recognized today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants