Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help with improving JIT for Range GetEnumerator #40770

Open
YairHalberstadt opened this issue Aug 13, 2020 · 6 comments
Open

Help with improving JIT for Range GetEnumerator #40770

YairHalberstadt opened this issue Aug 13, 2020 · 6 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@YairHalberstadt
Copy link
Contributor

YairHalberstadt commented Aug 13, 2020

I'm using extension GetEnumerator (coming in C# 9) to implement foreach on Range. See https://github.com/YairHalberstadt/RangeForeach.

I'm using sharplab to check the jit output.

I want everybody to be able to rewrite their for loops to foreach loops without having to think about any potential performance impact. Therefore ideally we'd want M1 and M2 in:

public static class Program {
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void M1() {
        foreach (var i in 1..10)
            Console.WriteLine(i);
    }
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void M2() {
        for (int i = 1; i<10; i++)
            Console.WriteLine(i);
    }
}

To have identical asm.

My current implementation is:

using System;
using System.Runtime.CompilerServices;


public static class Program {
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void M1() {
        foreach (var i in 1..10)
            Console.WriteLine(i);
    }
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void M2() {
        for (int i = 1; i<10; i++)
            Console.WriteLine(i);
    }
}

public static class RangeExtensions
{
    public struct RangeEnumerator
    {
        public RangeEnumerator(int start, int end) => (Current, _end) = (start - 1, end);
        public int Current { get; private set; }
        private int _end;
        public bool MoveNext() => ++Current < _end;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static RangeEnumerator GetEnumerator(this Range range)
    {
        if (range.Start.IsFromEnd || range.End.IsFromEnd)
            ThrowIsFromEnd();

        if (range.Start.Value > range.End.Value)
            ThrowStartIsGreaterThanEnd();

        return new RangeEnumerator(range.Start.Value, range.End.Value);

        static void ThrowIsFromEnd() => throw new ArgumentException("range start and end must not be from end");
        static void ThrowStartIsGreaterThanEnd() => throw new ArgumentException("start is greater than end");
    }
}

Jit is:

Program.M1()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: mov ecx, 1
    L000a: mov eax, 0xa
    L000f: cmp ecx, eax
    L0011: jg short L0030
    L0013: mov esi, 1
    L0018: mov edi, 0xa
    L001d: dec esi
    L001e: jmp short L0027
    L0020: mov ecx, esi
    L0022: call System.Console.WriteLine(Int32)
    L0027: inc esi
    L0028: cmp esi, edi
    L002a: jl short L0020
    L002c: pop esi
    L002d: pop edi
    L002e: pop ebp
    L002f: ret
    L0030: call dword ptr [0x2622c9fc]
    L0036: int3

Program.M2()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: mov esi, 1
    L0009: mov ecx, esi
    L000b: call System.Console.WriteLine(Int32)
    L0010: inc esi
    L0011: cmp esi, 0xa
    L0014: jl short L0009
    L0016: pop esi
    L0017: pop ebp
    L0018: ret

To be honest I'm pretty impressed by how much the JIT is managing to do here, but it's still approximately twice as long.

The main thing that isn't getting optimized away is the if (range.Start.Value > range.End.Value) check.
Any ideas if I can optimize this further, or if the JIT might be able to do better here?

category:cq
theme:optimization
skill-level:expert
cost:large

@YairHalberstadt YairHalberstadt added the tenet-performance Performance related issue label Aug 13, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Aug 13, 2020
@YairHalberstadt YairHalberstadt changed the title Help with improving JIT for method Help with improving JIT for Range GetEnumerator Aug 13, 2020
@SingleAccretion
Copy link
Contributor

It appears you are using x86 JIT. That said, x64 code with my recent-ish version of the runtime, obtained via Disasmo, is not substantially different:

01:
       push     rdi
       push     rsi
       sub      rsp, 40
02:
       mov      ecx, 1
       mov      eax, 10
       cmp      ecx, eax
       jg       SHORT 07
03:
       mov      esi, 1
       mov      edi, 10
       dec      esi
       jmp      SHORT 05
04:
       mov      ecx, esi
       call     System.Console:WriteLine(int)
05:
       inc      esi
       cmp      esi, edi
       jl       SHORT 04
06:
       add      rsp, 40
       pop      rsi
       pop      rdi
       ret      
07:
       call     RangeExtensions:<GetEnumerator>g__ThrowStartIsGreaterThanEnd|1_1()
       int3 

I was able to make the comparison go away with this hack, but I do not know if it is good enough in the general case:

if (range.Start.GetValue() > range.End.GetValue())
            ThrowStartIsGreaterThanEnd();

private static int GetValue(this Index index) => Unsafe.As<Index, int>(ref index);
01:
       push     rdi
       push     rsi
       sub      rsp, 40
02:
       mov      esi, 1
       mov      edi, 10
       dec      esi
       jmp      SHORT 04
03:
       mov      ecx, esi
       call     System.Console:WriteLine(int)
04:
       inc      esi
       cmp      esi, edi
       jl       SHORT 03
05:
       add      rsp, 40
       pop      rsi
       pop      rdi
       ret

@AndyAyersMS
Copy link
Member

Thanks for the report. #11848 called out a number of issues with the performance of Range. Good news is that most of these have been addressed over time, but it looks like there's more to do.

Marking as future since we're closing down work for 5.0

cc @dotnet/jit-contrib

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Aug 13, 2020
@AndyAyersMS AndyAyersMS added this to the Future milestone Aug 13, 2020
@YairHalberstadt
Copy link
Contributor Author

That's great @AndyAyersMS

When we remove all usage of range, and just pass in the start and end directly:

using System;
using System.Runtime.CompilerServices;

namespace System.Runtime.CompilerServices {class IsExternalInit{}}
public static class Program {
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void M1() {
        var enumerator = RangeExtensions.GetEnumerator(1, 10);
        while(enumerator.MoveNext())
            Console.WriteLine(enumerator.Current);
    }
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static void M2() {
        for (int i = 1; i<10; i++)
            Console.Write(i);
    }
}

public static class RangeExtensions
{
    public struct RangeEnumerator
    {
        public RangeEnumerator(int start, int end) => (Current, _end) = (start - 1, end);
        public int Current { get; private set; }
        private int _end;
        public bool MoveNext() => ++Current < _end;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static RangeEnumerator GetEnumerator(int start, int end)
    {
        if (start > end)
            ThrowStartIsGreaterThanEnd();
            
        return new RangeEnumerator(start, end);

        static void ThrowStartIsGreaterThanEnd() => throw new ArgumentException("start is greater than end");
    }
}

X86 does a pretty good job:

Program.M1()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: xor esi, esi
    L0006: jmp short L000f
    L0008: mov ecx, esi
    L000a: call System.Console.WriteLine(Int32)
    L000f: inc esi
    L0010: cmp esi, 0xa
    L0013: jl short L0008
    L0015: pop esi
    L0016: pop ebp
    L0017: ret

Program.M2()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push esi
    L0004: mov esi, 1
    L0009: mov ecx, esi
    L000b: call System.Console.Write(Int32)
    L0010: inc esi
    L0011: cmp esi, 0xa
    L0014: jl short L0009
    L0016: pop esi
    L0017: pop ebp
    L0018: ret

x64 meanwhile is still not great:

Program.M1()
    L0000: sub rsp, 0x38
    L0004: xor eax, eax
    L0006: mov [rsp+0x30], rax
    L000b: xor ecx, ecx
    L000d: mov [rsp+0x28], ecx
    L0011: mov [rsp+0x2c], ecx
    L0015: mov [rsp+0x28], ecx
    L0019: mov dword ptr [rsp+0x2c], 0xa
    L0021: mov rcx, [rsp+0x28]
    L0026: mov [rsp+0x30], rcx
    L002b: jmp short L0036
    L002d: mov ecx, [rsp+0x30]
    L0031: call System.Console.WriteLine(Int32)
    L0036: mov ecx, [rsp+0x30]
    L003a: inc ecx
    L003c: mov [rsp+0x30], ecx
    L0040: cmp ecx, [rsp+0x34]
    L0044: jl short L002d
    L0046: add rsp, 0x38
    L004a: ret

Program.M2()
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov esi, 1
    L000a: mov ecx, esi
    L000c: call System.Console.Write(Int32)
    L0011: inc esi
    L0013: cmp esi, 0xa
    L0016: jl short L000a
    L0018: add rsp, 0x20
    L001c: pop rsi
    L001d: ret

@SingleAccretion
Copy link
Contributor

x64 meanwhile is still not great

This is not the codegen I am getting. I too saw it on sharplab and was a bit puzzled. But it does indeed seem like .NET 5 is that much better than .NET Core 3.1 in this case.

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall modified the milestones: Future, 6.0.0 Nov 11, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 11, 2020
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@AndyAyersMS
Copy link
Member

@YairHalberstadt have you had a chance to look at 5.0 (or even 6.0) codegen here? For the second example I am seeing:

; Core CLR v5.0.421.11614 on amd64

Program.M1()
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: xor esi, esi
    L0007: jmp short L0010
    L0009: mov ecx, esi
    L000b: call System.Console.WriteLine(Int32)
    L0010: inc esi
    L0012: cmp esi, 0xa
    L0015: jl short L0009
    L0017: add rsp, 0x20
    L001b: pop rsi
    L001c: ret

Program.M2()
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov esi, 1
    L000a: mov ecx, esi
    L000c: call System.Console.Write(Int32)
    L0011: inc esi
    L0013: cmp esi, 0xa
    L0016: jl short L000a
    L0018: add rsp, 0x20
    L001c: pop rsi
    L001d: ret

Actually using range here still has challenges -- when we eliminate upstream control flow in assertion prop we don't clean up downstream dead phis, and so fail to propagate constants. LSRA then manages to preference and coalesce a bunch of temp lifetimes and the final codegen has obvious known-outcome compares that don't get optimized, as in the example shown above by @SingleAccretion

@AndyAyersMS
Copy link
Member

Ok, I am going to mark this as future.

There is still the interesting case where we clean up control flow during optimization and then fail to benefit from that like we should. We see this a lot with index and range.

@AndyAyersMS AndyAyersMS removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Apr 9, 2021
@AndyAyersMS AndyAyersMS modified the milestones: 6.0.0, Future Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants