Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lane offsets for AVX2 pack instructions #1442

Closed
wants to merge 1 commit into from

Conversation

Noratrieb
Copy link
Member

fast_image_resize yielded broken images, a little bit of println bisecting revealed the SIMD instruction that was at fault. A bit of staring at the cg_clif impl and the Intel manual then revealed the place of the bug. There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.

`fast_image_resize` yielded broken images, a little bit of println
bisecting revealed the SIMD instruction that was at fault. A bit of
staring at the cg_clif impl and the Intel manual then revealed the place
of the bug. There is a lot of copy pasting here, so I'm not surprised
it's buggy ^^'.
@Noratrieb
Copy link
Member Author

I'm not sure where tests for this are supposed to go. stdarch tests?

@Noratrieb
Copy link
Member Author

The duplicated code for these packs does make me worry a bit. After going through the intrinsics guide, I also found some packs that weren't implemented yet. I think I'm going to restructure the code here so that the packs are neatly packed together, with all of _mm{,256}_pack{u,us}_epi{16,32} implemented.

@bjorn3
Copy link
Member

bjorn3 commented Jan 2, 2024

I'm not sure where tests for this are supposed to go.

I've been copying stdarch tests into example/std_example.rs several times.

There is a lot of copy pasting here, so I'm not surprised it's buggy ^^'.

Yeah, this code is horrible. I hope to some day generate it directly from the instruction manual or something like that. Or create a DSL that allows writing this kind of stuff with less code duplication (and maybe also allows it to be reused by miri and other tools).

@bjorn3
Copy link
Member

bjorn3 commented Jan 2, 2024

Thanks for the fix! Please ignore the test failure. That is rust-random/rand#1355.

@Noratrieb
Copy link
Member Author

What's currently implemented vs what exists:

sse 16 avx 16 sse 32 avx 32
unsigned _mm_packus_epi16|llvm.x86.sse2.packuswb.128 ✅ _mm256_packus_epi16|llvm.x86.avx2.packuswb ✅ _mm_packus_epi32|llvm.x86.sse41.packusdw ✅ _mm256_packus_epi32|llvm.x86.avx2.packusdw
signed _mm_packs_epi16|llvm.x86.sse2.packsswb.128 _mm256_packs_epi16|llvm.x86.avx2.packsswb _mm_packs_epi32|llvm.x86.sse2.packssdw.128 ✅ _mm256_packs_epi32|llvm.x86.avx2.packssdw ✅

I'll clean it up a bit and implement all of those based on that, should be fairly little code. llvm.x86.sse41.packusdw is also pretty suspicious as it currently uses smin, while the other unsigned ones use umin.

@bjorn3
Copy link
Member

bjorn3 commented Jan 2, 2024

llvm.x86.sse41.packusdw is also pretty suspicious as it currently uses smin, while the other unsigned ones use umin.

Smin is correct here afaict. The input is a signed 32bit integer and we need to check that it fits in an unsigned 16bit integer. Using umin would cause the input to be interpreted as unsigned 32bit integer. Although because of the smax before it, I think it does actually not matter at all if umin or smin is used.

@bjorn3
Copy link
Member

bjorn3 commented Jan 2, 2024

In any case having a helper function for doing the saturating equivalent of ireduce as is done here would be nice to have. It can probably go in num.rs or cast.rs.

@Noratrieb
Copy link
Member Author

I created #1443 to restructure all the packed code.

@Noratrieb
Copy link
Member Author

closing in favor of #1443

@Noratrieb Noratrieb closed this Jan 3, 2024
@Noratrieb Noratrieb deleted the fix-pack-lane-offsets branch January 3, 2024 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants