-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Help with improving JIT for Range GetEnumerator #40770
Comments
It appears you are using x86 JIT. That said, x64 code with my recent-ish version of the runtime, obtained via Disasmo, is not substantially different: 01:
push rdi
push rsi
sub rsp, 40
02:
mov ecx, 1
mov eax, 10
cmp ecx, eax
jg SHORT 07
03:
mov esi, 1
mov edi, 10
dec esi
jmp SHORT 05
04:
mov ecx, esi
call System.Console:WriteLine(int)
05:
inc esi
cmp esi, edi
jl SHORT 04
06:
add rsp, 40
pop rsi
pop rdi
ret
07:
call RangeExtensions:<GetEnumerator>g__ThrowStartIsGreaterThanEnd|1_1()
int3 I was able to make the comparison go away with this hack, but I do not know if it is good enough in the general case: if (range.Start.GetValue() > range.End.GetValue())
ThrowStartIsGreaterThanEnd();
private static int GetValue(this Index index) => Unsafe.As<Index, int>(ref index); 01:
push rdi
push rsi
sub rsp, 40
02:
mov esi, 1
mov edi, 10
dec esi
jmp SHORT 04
03:
mov ecx, esi
call System.Console:WriteLine(int)
04:
inc esi
cmp esi, edi
jl SHORT 03
05:
add rsp, 40
pop rsi
pop rdi
ret |
Thanks for the report. #11848 called out a number of issues with the performance of Marking as future since we're closing down work for 5.0 cc @dotnet/jit-contrib |
That's great @AndyAyersMS When we remove all usage of range, and just pass in the start and end directly: using System;
using System.Runtime.CompilerServices;
namespace System.Runtime.CompilerServices {class IsExternalInit{}}
public static class Program {
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static void M1() {
var enumerator = RangeExtensions.GetEnumerator(1, 10);
while(enumerator.MoveNext())
Console.WriteLine(enumerator.Current);
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static void M2() {
for (int i = 1; i<10; i++)
Console.Write(i);
}
}
public static class RangeExtensions
{
public struct RangeEnumerator
{
public RangeEnumerator(int start, int end) => (Current, _end) = (start - 1, end);
public int Current { get; private set; }
private int _end;
public bool MoveNext() => ++Current < _end;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static RangeEnumerator GetEnumerator(int start, int end)
{
if (start > end)
ThrowStartIsGreaterThanEnd();
return new RangeEnumerator(start, end);
static void ThrowStartIsGreaterThanEnd() => throw new ArgumentException("start is greater than end");
}
} X86 does a pretty good job: Program.M1()
L0000: push ebp
L0001: mov ebp, esp
L0003: push esi
L0004: xor esi, esi
L0006: jmp short L000f
L0008: mov ecx, esi
L000a: call System.Console.WriteLine(Int32)
L000f: inc esi
L0010: cmp esi, 0xa
L0013: jl short L0008
L0015: pop esi
L0016: pop ebp
L0017: ret
Program.M2()
L0000: push ebp
L0001: mov ebp, esp
L0003: push esi
L0004: mov esi, 1
L0009: mov ecx, esi
L000b: call System.Console.Write(Int32)
L0010: inc esi
L0011: cmp esi, 0xa
L0014: jl short L0009
L0016: pop esi
L0017: pop ebp
L0018: ret x64 meanwhile is still not great: Program.M1()
L0000: sub rsp, 0x38
L0004: xor eax, eax
L0006: mov [rsp+0x30], rax
L000b: xor ecx, ecx
L000d: mov [rsp+0x28], ecx
L0011: mov [rsp+0x2c], ecx
L0015: mov [rsp+0x28], ecx
L0019: mov dword ptr [rsp+0x2c], 0xa
L0021: mov rcx, [rsp+0x28]
L0026: mov [rsp+0x30], rcx
L002b: jmp short L0036
L002d: mov ecx, [rsp+0x30]
L0031: call System.Console.WriteLine(Int32)
L0036: mov ecx, [rsp+0x30]
L003a: inc ecx
L003c: mov [rsp+0x30], ecx
L0040: cmp ecx, [rsp+0x34]
L0044: jl short L002d
L0046: add rsp, 0x38
L004a: ret
Program.M2()
L0000: push rsi
L0001: sub rsp, 0x20
L0005: mov esi, 1
L000a: mov ecx, esi
L000c: call System.Console.Write(Int32)
L0011: inc esi
L0013: cmp esi, 0xa
L0016: jl short L000a
L0018: add rsp, 0x20
L001c: pop rsi
L001d: ret |
This is not the codegen I am getting. I too saw it on sharplab and was a bit puzzled. But it does indeed seem like .NET 5 is that much better than .NET Core 3.1 in this case. |
@YairHalberstadt have you had a chance to look at 5.0 (or even 6.0) codegen here? For the second example I am seeing: ; Core CLR v5.0.421.11614 on amd64
Program.M1()
L0000: push rsi
L0001: sub rsp, 0x20
L0005: xor esi, esi
L0007: jmp short L0010
L0009: mov ecx, esi
L000b: call System.Console.WriteLine(Int32)
L0010: inc esi
L0012: cmp esi, 0xa
L0015: jl short L0009
L0017: add rsp, 0x20
L001b: pop rsi
L001c: ret
Program.M2()
L0000: push rsi
L0001: sub rsp, 0x20
L0005: mov esi, 1
L000a: mov ecx, esi
L000c: call System.Console.Write(Int32)
L0011: inc esi
L0013: cmp esi, 0xa
L0016: jl short L000a
L0018: add rsp, 0x20
L001c: pop rsi
L001d: ret Actually using range here still has challenges -- when we eliminate upstream control flow in assertion prop we don't clean up downstream dead phis, and so fail to propagate constants. LSRA then manages to preference and coalesce a bunch of temp lifetimes and the final codegen has obvious known-outcome compares that don't get optimized, as in the example shown above by @SingleAccretion |
Ok, I am going to mark this as future. There is still the interesting case where we clean up control flow during optimization and then fail to benefit from that like we should. We see this a lot with index and range. |
I'm using extension GetEnumerator (coming in C# 9) to implement foreach on Range. See https://github.com/YairHalberstadt/RangeForeach.
I'm using sharplab to check the jit output.
I want everybody to be able to rewrite their for loops to foreach loops without having to think about any potential performance impact. Therefore ideally we'd want
M1
andM2
in:To have identical asm.
My current implementation is:
Jit is:
To be honest I'm pretty impressed by how much the JIT is managing to do here, but it's still approximately twice as long.
The main thing that isn't getting optimized away is the
if (range.Start.Value > range.End.Value)
check.Any ideas if I can optimize this further, or if the JIT might be able to do better here?
category:cq
theme:optimization
skill-level:expert
cost:large
The text was updated successfully, but these errors were encountered: