Vector{128,256}<T>.ToScalar suboptimal codegen \ { double } #12733
/cc @tannergooding please have a look here.
I remember that
Thanks.
Is better, but still not ideal:
vmovupd xmm0, xmmword ptr [rsp+08H]
vmovd   rax, xmm0
It is documented as
Is that output from the JIT's own disassembler? It's probably
Looked at the JIT dump and at the VS disassembly view (both in Release with optimizations on and tiering disabled). SharpLab with CoreCLR shows the same.
Right,
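As a worked check of the mnemonic question, the bytes of that instruction in the ToLong listing (C4E1F97EC0) can be decoded by hand. The sketch below is a top-level C# program, assuming .NET 6 or later; DecodeVex3 and DecodeModRM are hypothetical helper names. The fields come out as VEX.128.66.0F.W1 7E /r with a register ModRM, which is the encoding the Intel SDM lists as VMOVQ r64/m64, xmm1 (the W0 form of the same opcode is VMOVD r32/m32, xmm1). So the bytes encode a 64-bit move of the low half of xmm0 into rax, even though the listing prints vmovd.

```csharp
// Sketch (assumes .NET 6+, top-level statements): decode the 3-byte VEX
// instruction C4 E1 F9 7E C0 taken from the ToLong listing.
// DecodeVex3/DecodeModRM are hypothetical helpers written for this example.
using System;

byte[] insn = { 0xC4, 0xE1, 0xF9, 0x7E, 0xC0 };

// Byte 1 holds R/X/B (inverted) and mmmmm; byte 2 holds W, vvvv (inverted), L, pp.
static (int map, int w, int l, int pp) DecodeVex3(byte b1, byte b2) =>
    (b1 & 0x1F,        // mmmmm: 1 selects the 0F opcode map
     (b2 >> 7) & 1,    // W: 1 = 64-bit GPR operand (movq), 0 = 32-bit (movd)
     (b2 >> 2) & 1,    // L: 0 = 128-bit vector length
     b2 & 0x3);        // pp: 1 = implied 66 prefix

// ModRM: mod = 3 means a register operand; reg = xmm index, rm = GPR index.
static (int mod, int reg, int rm) DecodeModRM(byte m) => (m >> 6, (m >> 3) & 7, m & 7);

var (map, w, l, pp) = DecodeVex3(insn[1], insn[2]);
var (mod, reg, rm) = DecodeModRM(insn[4]);
Console.WriteLine($"map={map} W={w} L={l} pp={pp} opcode={insn[3]:X2} mod={mod} reg={reg} rm={rm}");
// prints: map=1 W=1 L=0 pp=1 opcode=7E mod=3 reg=0 rm=0
```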
BTW, in this link,
👍 So codegen could be
vmovd rax, xmmword ptr [rsp+08H]
or
vmovq rax, xmmword ptr [rsp+08H]
There is the extra
Thanks. My code to show this:
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
Vector128<sbyte> vec = Vector128.Create((byte)0x42).AsSByte();
long l = ToLong(vec);
double d = ToDouble(vec);
if (double.IsNaN(d) || l == long.MaxValue)
Environment.Exit(1);
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static long ToLong(Vector128<sbyte> vec)
{
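// ToScalar() variant, kept commented out for comparison:
//return vec.AsInt64().ToScalar();
// Explicit SSE2 intrinsic instead; requires a 64-bit process (Sse2.X64).
return Sse2.X64.ConvertToInt64(vec.AsInt64());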
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static double ToDouble(Vector128<sbyte> vec)
{
return vec.AsDouble().ToScalar();
}
}
}
dasm for that code:
; Assembly listing for method Program:ToLong(struct):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 1, 1 ) simd16 -> [rsp+0x08]
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M6413_IG01:
C5F877 vzeroupper
6690 nop
G_M6413_IG02:
C5F910442408 vmovupd xmm0, xmmword ptr [rsp+08H]
C4E1F97EC0 vmovd rax, xmm0
G_M6413_IG03:
C3 ret
; Total bytes of code 17, prolog size 5 for method Program:ToLong(struct):long
; ============================================================
; Assembly listing for method Program:ToDouble(struct):double
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 1, 1 ) simd16 -> [rsp+0x08]
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M24050_IG01:
C5F877 vzeroupper
6690 nop
G_M24050_IG02:
C5FB10442408 vmovsd xmm0, xmmword ptr [rsp+08H]
G_M24050_IG03:
C3 ret
; Total bytes of code 12, prolog size 5 for method Program:ToDouble(struct):double
; ============================================================
Isn't this needed for VEX, no matter whether it's Vector128 or Vector256?
It is a bit complex; please see https://github.com/dotnet/coreclr/issues/21062.
Marking as future; if there's something surgical we can fix, or there's a bug, we can move to 3.0. |
It might be related:
private static long AsLong(double dbl)
{
return *(long*)&dbl;
}
@omariom What about this?
unsafe class C
{
private static long AsLong(in double dbl)
{
return *(long*)Unsafe.AsPointer(ref Unsafe.AsRef(dbl));
}
}
Asm output:
C.AsLong(Double ByRef)
L0000: mov rax, [rcx]
L0003: ret
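For comparison, the same double-to-long bit reinterpretation can also be written without pointers or an unsafe context. This sketch is a top-level C# program, assuming .NET 6 or later; ViaBitConverter and ViaUnsafeAs are hypothetical helper names. It uses BitConverter.DoubleToInt64Bits and Unsafe.As<double, long> and only checks that both return the same bit pattern. It does not show whether the JIT keeps the value in a register; that still has to be checked in the disassembly, as done elsewhere in this thread.

```csharp
// Sketch (assumes .NET 6+, top-level statements): two pointer-free ways to
// reinterpret the 64 bits of a double as a long. Helper names are hypothetical.
using System;
using System.Runtime.CompilerServices;

// Safe framework API for the bit-level reinterpretation.
static long ViaBitConverter(double d) => BitConverter.DoubleToInt64Bits(d);

// Reinterprets the parameter's storage through a ref; no unsafe block needed.
static long ViaUnsafeAs(double d) => Unsafe.As<double, long>(ref d);

double value = 1.0;
Console.WriteLine($"{ViaBitConverter(value):X16}"); // prints 3FF0000000000000
Console.WriteLine($"{ViaUnsafeAs(value):X16}");     // prints 3FF0000000000000
```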
@hypeartist this also uses the stack.
Vector128<long>.ToScalar() stores the xmm to the stack, then reads r64 from there via a mov.
Ideally this would use movq (C++ intrinsic: _mm_cvtsi128_si64), so the asm becomes:
Vector128<double>.ToScalar() produces the expected code (vmovsd); no issue there.
Same CQ issue for int, and for Vector256<T>.
Didn't check types other than those noted here.
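Until ToScalar() itself improves, one possible workaround is to call the SSE2 scalar-move intrinsics directly, as the repro earlier in the thread does for long. This sketch is a top-level C# program, assuming .NET 6 or later on x64; Low64, Low32 and Low64Of256 are hypothetical helper names. It covers the long, int and Vector256<T> cases from the description and falls back to ToScalar() when the ISA check fails. It only demonstrates that the results agree; the thread found ConvertToInt64 "better, but still not ideal" when the vector argument arrives on the stack, so the generated code still needs checking on the target.

```csharp
// Sketch (assumes .NET 6+ on x64, top-level statements): extract element 0 via
// the explicit SSE2 move intrinsics. Low64/Low32/Low64Of256 are hypothetical helpers.
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// movq r64, xmm (C++: _mm_cvtsi128_si64); falls back to ToScalar() without SSE2 x64.
static long Low64(Vector128<long> v) =>
    Sse2.X64.IsSupported ? Sse2.X64.ConvertToInt64(v) : v.ToScalar();

// movd r32, xmm (C++: _mm_cvtsi128_si32); falls back to ToScalar() without SSE2.
static int Low32(Vector128<int> v) =>
    Sse2.IsSupported ? Sse2.ConvertToInt32(v) : v.ToScalar();

// Element 0 of a Vector256<T> lives in its lower 128-bit half.
static long Low64Of256(Vector256<long> v) => Low64(v.GetLower());

var v64 = Vector128.Create(0x1122334455667788L, -1L);
var v32 = Vector128.Create(0x11223344, 2, 3, 4);
var w64 = Vector256.Create(42L, 1L, 2L, 3L);

Console.WriteLine(Low64(v64) == v64.ToScalar());      // True
Console.WriteLine(Low32(v32) == v32.ToScalar());      // True
Console.WriteLine(Low64Of256(w64) == w64.ToScalar()); // True
```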
category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium