Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure x86 signed pack instructions #1443

Merged
merged 2 commits into from
Jan 3, 2024

Conversation

Noratrieb
Copy link
Member

@Noratrieb Noratrieb commented Jan 2, 2024

This reduces the amount of duplicated code and the chance for bugs.

Sorta replaces #1442. I'm not very happy with how it turned out, but it looks mostly decent I think.

Also adds new support for _mm_packs_epi16, _mm256_packs_epi16 and _mm256_packus_epi32.

For reference, here's a table of all the instructions:

sse 16 avx 16 sse 32 avx 32
unsigned _mm_packus_epi16|llvm.x86.sse2.packuswb.128 _mm256_packus_epi16|llvm.x86.avx2.packuswb _mm_packus_epi32|llvm.x86.sse41.packusdw _mm256_packus_epi32|llvm.x86.avx2.packusdw
signed _mm_packs_epi16|llvm.x86.sse2.packsswb.128 _mm256_packs_epi16|llvm.x86.avx2.packsswb _mm_packs_epi32|llvm.x86.sse2.packssdw.128 _mm256_packs_epi32|llvm.x86.avx2.packssdw

src/intrinsics/llvm_x86.rs Outdated Show resolved Hide resolved
This reduces the amount of duplicated code and the chance for bugs.

I validated the new code for correctness against LLVM using the
following script. It found many bugs in the implementation until I was
finally able to get it correct and passing.

```rust
//! Test for x86 pack instructions. Prints deterministic results, use it to compare backends.
use std::arch::x86_64::{self, __m128i, __m256i};
use rand::{rngs::SmallRng, Rng, SeedableRng};
fn main() {
    let rng = &mut SmallRng::seed_from_u64(123);
    for _ in 0..100_000 {
        unsafe {
            sse_test(rng);
            avx_test(rng);
        }
    }
}
unsafe fn sse_test(rng: &mut SmallRng) {
    print_sse_8(x86_64::_mm_packus_epi16(sse16(rng), sse16(rng)));
    print_sse_8(x86_64::_mm_packs_epi16(sse16(rng), sse16(rng)));
    print_sse_16(x86_64::_mm_packus_epi32(sse32(rng), sse32(rng)));
    print_sse_16(x86_64::_mm_packs_epi32(sse32(rng), sse32(rng)));
}
unsafe fn avx_test(rng: &mut SmallRng) {
    print_avx_8(x86_64::_mm256_packs_epi16(avx16(rng), avx16(rng)));
    print_avx_8(x86_64::_mm256_packs_epi16(avx16(rng), avx16(rng)));
    print_avx_16(x86_64::_mm256_packus_epi32(avx32(rng), avx32(rng)));
    print_avx_16(x86_64::_mm256_packs_epi32(avx32(rng), avx32(rng)));
}
fn print_sse_8(t: __m128i) {
    let ints = unsafe { std::mem::transmute::<_, [i8; 16]>(t) };
    println!("{ints:?}");
}
fn print_sse_16(t: __m128i) {
    let ints = unsafe { std::mem::transmute::<_, [i16; 8]>(t) };
    println!("{ints:?}");
}
fn print_avx_8(t: __m256i) {
    let ints = unsafe { std::mem::transmute::<_, [i8; 32]>(t) };
    println!("{ints:?}");
}
fn print_avx_16(t: __m256i) {
    let ints = unsafe { std::mem::transmute::<_, [i16; 16]>(t) };
    println!("{ints:?}");
}
fn sse16(rand: &mut SmallRng) -> __m128i {
    unsafe { std::mem::transmute([(); 8].map(|()| i16(rand))) }
}
fn sse32(rand: &mut SmallRng) -> __m128i {
    unsafe { std::mem::transmute([(); 4].map(|()| i32(rand))) }
}
fn avx16(rand: &mut SmallRng) -> __m256i {
    unsafe { std::mem::transmute([(); 16].map(|()| i16(rand))) }
}
fn avx32(rand: &mut SmallRng) -> __m256i {
    unsafe { std::mem::transmute([(); 8].map(|()| i32(rand))) }
}
fn i16(rand: &mut SmallRng) -> i16 {
    if rand.gen() {
        rand.gen::<i16>()
    } else {
        rand.gen::<i8>() as i16
    }
}
fn i32(rand: &mut SmallRng) -> i32 {
    if rand.gen() {
        rand.gen::<i32>()
    } else {
        rand.gen::<i16>() as i32
    }
}
```
@bjorn3 bjorn3 added the A-core-arch Area: Necessary for full core::arch support label Jan 3, 2024
@bjorn3 bjorn3 merged commit b3b36e9 into rust-lang:master Jan 3, 2024
16 of 17 checks passed
@bjorn3
Copy link
Member

bjorn3 commented Jan 3, 2024

Thanks!

@Noratrieb Noratrieb deleted the x86-signed-pack branch January 3, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core-arch Area: Necessary for full core::arch support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants