Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support recognizing Blend as commutative #82365

Open
tannergooding opened this issue Feb 19, 2023 · 5 comments
Open

Support recognizing Blend as commutative #82365

tannergooding opened this issue Feb 19, 2023 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Feb 19, 2023

We don't currently track Blend as commutative and so we miss an opportunity to ensure that "memory operands" always appear on the RHS: SixLabors/ImageSharp#2359 (comment)

Support should be added to treat this as commutative which really just entails inverting the constant (e.g. Blend(x, y, 0b0111_0111) is equivalent to Blend(y, x, 0b1000_1000)).

There are actually a few operations this applies to that involve two inputs + control mask. Not all of them are as trivially as inverting the constant, however.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 19, 2023
@tannergooding tannergooding added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 19, 2023
@ghost
Copy link

ghost commented Feb 19, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

We don't currently track Blend as commutative and so we miss an opportunity to ensure that "memory operands" always appear on the RHS: SixLabors/ImageSharp#2359 (comment)

Support should be added to treat this as commutative which really just entails inverting the constant (e.g. Blend(x, y, 0b0111_0111) is equivalent to Blend(y, x, 0b1000_1000)).

Author: tannergooding
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@JulieLeeMSFT JulieLeeMSFT added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Mar 1, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.0 milestone Mar 1, 2023
@IDisposable
Copy link
Contributor

I'm trying to grok this request (as I know nothing about Avx)

In the process, I noticed this code in the /src/tests/JIT/HardwareIntrinsics/X86/Avx1/Blend.cs test code at line 29-36, that looks wrong to me:

// SDDD SDDD
var vf3 = Avx.Blend(vf1, vf2, 1);
Unsafe.Write(floatTable.outArrayPtr, vf3);

if (!floatTable.CheckResult((x, y, z) => (z[0] == y[0]) && (z[1] == x[1]) &&
                                         (z[2] == x[2]) && (z[3] == x[3]) &&
                                         (z[4] == x[4]) && (z[5] == x[5]) &&
                                         (z[6] == x[6]) && (z[7] == x[7])))

I think the pattern in the comment is supposed to mean that we're expecting the Source or Destination values to appear based on the third argument (opmask) to the Avx.Blend() call which would make sense in float/_mm512_mask_blend_ps/VBLENDMPS case with a instruction mask of 0b0001

By that reading (which I may be wrong about), the test on line 35 against z[4] should be comparing to y[4] not x[4]. I am not sure if the comment is wrong (e.g. should be // SDDD DDDD) or if the operator value of 1 is wrong.

This is not area of any expertise at all given this is my very first look at AVX at any level, but the code at line 83-90 looks to match my expectation in that S in the comment means compare to y and D means compare to x.

// SDSD SDSD
vf3 = Avx.Blend(vf1, vf2, 85);
Unsafe.Write(floatTable.outArrayPtr, vf3);

if (!floatTable.CheckResult((x, y, z) => (z[0] == y[0]) && (z[1] == x[1]) &&
                                         (z[2] == y[2]) && (z[3] == x[3]) &&
                                         (z[4] == y[4]) && (z[5] == x[5]) &&
                                         (z[6] == y[6]) && (z[7] == x[7])))

If I'm absolutely nuts, please let me know, I don't want to start down the road with the wrong understanding.

@JulieLeeMSFT
Copy link
Member

@tannergooding, could you guide @IDisposable for this?

@tannergooding
Copy link
Member Author

We won't be able to land this in .NET 8; but it can be done at any point in the future.

I am not sure if the comment is wrong

The comment is wrong, the behavior is:

IF (IMM8[0] = 0) THEN DEST[31:0] :=SRC1[31:0]
    ELSE DEST [31:0] := SRC2[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] := SRC1[63:32]
    ELSE DEST [63:32] := SRC2[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] := SRC1[95:64]
    ELSE DEST [95:64] := SRC2[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] := SRC1[127:96]
    ELSE DEST [127:96] := SRC2[127:96] FI
IF (IMM8[4] = 0) THEN DEST[159:128] := SRC1[159:128]
    ELSE DEST [159:128] := SRC2[159:128] FI
IF (IMM8[5] = 0) THEN DEST[191:160] := SRC1[191:160]
    ELSE DEST [191:160] := SRC2[191:160] FI
IF (IMM8[6] = 0) THEN DEST[223:192] := SRC1[223:192]
    ELSE DEST [223:192] := SRC2[223:192] FI
IF (IMM8[7] = 0) THEN DEST[255:224] := SRC1[255:224]
    ELSE DEST [255:224] := SRC2[255:224] FI.

So it is not done as 1 operation repeated across 2x128-bit lanes, but rather as 1x256-bit lane instead.

@IDisposable
Copy link
Contributor

Excellent, I'll tackle this for the next release. Thanks for the confirmation @tannergooding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests

3 participants