Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient codegen for casts between same size types. #11413

Closed
tannergooding opened this issue Nov 5, 2018 · 13 comments
Closed

Inefficient codegen for casts between same size types. #11413

tannergooding opened this issue Nov 5, 2018 · 13 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Nov 5, 2018

As per dotnet/coreclr#20788 (comment), using BitConverter.SingleToInt32Bits, BitConverter.Int32BitsToSingle, BitConverter.DoubleToInt64Bits, and BitConverter.Int64BitsToDouble generates some "inefficient" code.

Currently BitConverter.SingleToInt32Bits is generating:

vmovss   dword ptr [rsp+14H], xmm0
mov      eax, dword ptr [rsp+14H]

When it could generate:

vmovd eax, xmm0

Currently BitConverter.Int32BitsToSingle is generating:

mov      dword ptr [rsp+0CH], eax
vmovss   xmm0, dword ptr [rsp+0CH]

When it could generate:

vmovd xmm0, eax

The same logic applies to double <-> long, but using the rax register and vmovq.

  • For x86, it can use the movq xmm, [m64] or movq [m64], xmm encoding

category:cq
theme:casts
skill-level:intermediate
cost:medium

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@GrabYourPitchforks
Copy link
Member

Related PR: #6864, which makes use of this API

@tannergooding
Copy link
Member Author

#33476 introduced a fix which works around the issue, but we are leaving this issue open to track the JIT getting a more general codegen improvement that isn't specific to BitConverter.

@Kein
Copy link

Kein commented Sep 24, 2020

I guess this is partially related to aforementioned issue - is there any reason float.isFinite() is safe while the rest of similar extensions using similar tools are unsafe (like IsInfinity)?

@GrabYourPitchforks
Copy link
Member

@Kein Looks like we just forgot to remove the unsafe modifier from the Single.IsInfinity method. It doesn't affect callers; they're not required to be unsafe. But it does provide some further evidence that we should consider stripping the unsafe modifier from types and functions that no longer need it, just as an overall code hygiene cleanup.

@tannergooding
Copy link
Member Author

But it does provide some further evidence that we should consider stripping the unsafe modifier from types and functions that no longer need it, just as an overall code hygiene cleanup.

This seems like one of the simpler analyzers we could write, is there an issue suggesting it yet?

@GrabYourPitchforks
Copy link
Member

There's some limited discussion around it as part of #31354. Basically: if we have an analyzer that says "you're not using pointers, remove unsafe" alongside a language feature that enforces "this API doesn't use pointers but you need to wrap call sites within an unsafe block", we need to figure out how these two things interact together.

@deeprobin
Copy link
Contributor

@JulieLeeMSFT
Copy link
Member

With .NET 9 Preview 7,
BitConverter.SingleToInt32Bits(float) generates

vmovd    eax, xmm0

BitConverter.Int32BitsToSingle(int) generates

vmovd    xmm0, ecx

BitConverter.DoubleToInt64Bits(double) generates

 vmovd    rax, xmm0

BitConverter.Int64BitsToDouble(long) generates

vmovd    xmm0, rcx

@JulieLeeMSFT
Copy link
Member

@tannergooding, please check the generated encoding with .NET 9 above (#11413 (comment)).

  • float <-> int is generating the expected code.
  • double <-> long is generating vmovd, not vmovq. Do we need further optimization?

    The same logic applies to double <-> long, but using the rax register and vmovq.

cc @dotnet/jit-contrib

@tannergooding
Copy link
Member Author

This got resolved some time back (#71567, as well as other commits by other JIT team members)

The movd for double <-> long is actually just a disassembly bug/quirk. movd and movq are the same instruction, it's just that movq is the preferred disassembly when operating on 64-bit registers. Looks like we're not accounting for that when we do the disasm dumping

@JulieLeeMSFT
Copy link
Member

Very good.

This got resolved some time back (#71567, as well as other commits by other JIT team members)

Also checked SingleToInt32Bits(float) on x86

mov      eax, dword ptr [ebp+0x08]

@JulieLeeMSFT
Copy link
Member

on x86,
Int32BitsToSingle(int):float

       vmovd    xmm0, ecx
       vmovss   dword ptr [ebp-0x04], xmm0
       fld      dword ptr [ebp-0x04]

DoubleToInt64Bits(double):long

       vmovsd   xmm0, qword ptr [ebp+0x08]
       sub      esp, 8
       vmovsd   qword ptr [esp], xmm0
       call     [System.BitConverter:DoubleToInt64Bits(double):long]

Int64BitsToDouble(long):double

       push     dword ptr [ebp+0x0C]
       push     dword ptr [ebp+0x08]
       call     [System.BitConverter:Int64BitsToDouble(long):double]
       fstp     qword ptr [ebp-0x08]
       vmovsd   xmm0, qword ptr [ebp-0x08]
       vmovsd   qword ptr [ebp-0x08], xmm0
       fld      qword ptr [ebp-0x08]

@tannergooding
Copy link
Member Author

👍

x86 is expected to be less efficient for DoubleToInt64Bits and the inverse here. There's no singular instruction available since the register size is 32-bits and so we have to do handle the lower and upper halves through memory.

There is notably a way to load a double directly to/from memory using a specialized memory only version of movq on 32-bit, but that's a non-trivial amount of work and it hasn't bubbled up in priority yet. That work is tracked instead by #11626

@github-actions github-actions bot locked and limited conversation to collaborators Sep 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

8 participants