
Stable SIMD in Rust #2325

Merged
merged 10 commits into rust-lang:master
Feb 26, 2018
Conversation

@alexcrichton (Member) commented Feb 7, 2018

The purpose of this RFC is to provide a framework for SIMD to be used on stable
Rust. It proposes stabilizing x86-specific vendor intrinsics, but includes the
scaffolding for other platforms as well as a future portable SIMD design (to be
fleshed out in another RFC).

Rendered

@alexcrichton alexcrichton self-assigned this Feb 7, 2018
@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Feb 7, 2018
@aturon (Member) commented Feb 7, 2018

Awesome work, @alexcrichton! And thanks go to @BurntSushi and @gnzlbg for a ton of work as well.

One minor question: it's implied, but not really stated (that I saw) that the vendor module is meant to encompass all architecture intrinsics, not just SIMD-related ones. Is that right? I wonder if arch might be more clear.

@steveklabnik (Member) commented Feb 7, 2018

Seconded, this is extremely exciting!

I wonder if arch might be more clear.

Please excuse the possibly extremely under-informed question here: are these intrinsics inherent to the platform, or to the company? That is, are there Intel intrinsics that AMD doesn't offer, and vice-versa? If so, then vendor seems appropriate. If not, then arch does.

My current understanding is that this is vendor-specific, but I could be wrong. The above is how I'd think about it, though.

@gnzlbg (Contributor) commented Feb 7, 2018

That is, are there Intel intrinsics that AMD doesn't offer, and vice-versa?

Yes, there are. Some examples would be SSE4a, ABM, and TBM. These intrinsics are exposed by stdsimd and are available on some AMD CPUs, but not on any Intel ones.
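
To make that concrete, here is a rough sketch of what one of those AMD-only intrinsics looks like behind a target_feature gate (a minimal sketch, assuming the std::arch::x86_64 paths this RFC ends up proposing; today the same items live in the stdsimd crate):

```rust
// Sketch only: _mm_extract_si64 is the SSE4a EXTRQ intrinsic, available on
// some AMD CPUs but not on Intel ones.
#[cfg(target_arch = "x86_64")]
mod amd_only {
    use std::arch::x86_64::{__m128i, _mm_extract_si64};

    // Calling this on a CPU without SSE4a would be undefined behavior,
    // which is why the function must be `unsafe`.
    #[target_feature(enable = "sse4a")]
    pub unsafe fn extract_field(v: __m128i, descriptor: __m128i) -> __m128i {
        _mm_extract_si64(v, descriptor)
    }
}
```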


When [inspecting the assembly][asm1] you notice that rustc is making use of the
`%xmmN` registers which you've read is related to SSE on your CPU. You know,
Member:

Nit: xmm or ymm?

Member:

xmm. ymm are 256-bit registers, and in this particular example no target features are enabled, so on x86_64 it will be limited to SSE2 instructions, which use 128-bit registers (xmm).

The example below enables the avx2 feature, which permits AVX2 instructions, which use 256-bit registers (ymm).
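
As a rough illustration of that difference (the function here is made up for the example):

```rust
// Without extra features the compiler sticks to SSE2 and 128-bit xmm
// registers on x86_64; annotating the function allows AVX2 and 256-bit ymm
// registers. Calling it is only sound on CPUs that actually have AVX2,
// hence the `unsafe`.
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((a, b), out) in a.iter().zip(b).zip(out) {
        *out = *a + *b;
    }
}
```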

present on one platform may not be present on another.

The contents of the `vendor` modules are defined by, well, vendors! For example
Intel has an [intrinsics guide][intr-guide] which will serve as a guideline for
Member:

link target is missing here


For example most Intel intrinsics start with `_mm_` or `_mm256_` for 128 and
256-bit registers. While perhaps unergonomic, we'll be sticking to what Intel
says. Note that all intrinsics will also be `unsafe`, according to [RFC 2045].
Member:

link target is missing here
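
For a concrete picture of the naming and unsafety described above, a minimal sketch of calling the `_mm_`-prefixed intrinsics directly (the std::arch::x86_64 path is an assumption; the RFC leaves the final module name open):

```rust
#[cfg(target_arch = "x86_64")]
fn add_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::*;
    // SSE2 is baseline on x86_64, so no feature check is needed here, but the
    // intrinsics themselves are still `unsafe` per RFC 2045.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sum = _mm_add_epi32(va, vb);
        let mut out = [0i32; 4];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
        out
    }
}
```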


There are a number of intrinsics on x86 (and other) platforms that require their
arguments to be constants rather than decided at runtime. For example
[`_mm_insert_pi16`] requires its third argument to be a constant value where
Member:

This isn't linked; I'm not sure if it was intended to be.


Over the years quite a few iterations have happened for SIMD in Rust. This RFC
draws from as many of those as it can and attempts to strike a balance between
exposing functionality whiel still allowing us to implement everything in a
Member:

typo: while

SIMD arguments are passed across boundaries and whatnot.

Again though, note that this section is largely an implementation detail of SIMD
in Rust today, though it's enabling the usage Effortsfectively without a lot of
Member:

typo

@alexcrichton (Member Author)

@aturon and @steveklabnik excellent questions! I think @steveklabnik's point hit the nail on the head, in that we should rename to arch. To clarify, @aturon, you're correct that this module is intended to encompass all vendor intrinsics, not just those related to SIMD. (SIMD is just the banner feature to catch everyone's attention!)

@steveklabnik you're also right that, for example, our x86 and x86_64 target_arch targets actually have two vendors, Intel and AMD. As @gnzlbg pointed out there are AMD-specific intrinsics, and I'm sure there are Intel-specific intrinsics as well. The intention is that this module we're adding to libstd would contain everything (both Intel and AMD) and you'd do runtime/compile-time dispatch to figure out which ones you can call.

I think that pushes me towards arch, as I believe the entire module is conditional purely on target_arch and not actually on the vendors (there can be multiple vendors in there). @gnzlbg, what do you think?

@steveklabnik (Member)

The intention is that this module we're adding to libstd would contain everything (both Intel and AMD) and you'd do runtime/compile time dispatch to figure out which ones you can call.

arch sounds great to me, then.

[rfc2212]: https://github.com/rust-lang/rfcs/pull/2212

pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
Contributor:

Is this function still supposed to be unsafe?

@gbutler69

Why would this function be "unsafe" (as marked)?

pub unsafe fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
    // Note that this `unsafe` block is safe because we're testing
    // that the `avx2` feature is indeed available on our CPU.
    if cfg_feature_enabled!("avx2") {
        unsafe { foo_avx2(a, b, c) }
    } else {
        foo_fallback(a, b, c)
    }
}

Is it not the case that this function is "safe" because you've wrapped the call to an unsafe function in an "unsafe" block (with a check to ensure the contract is met), and that "foo_fallback" would be a "safe" function (not relying on SIMD/intrinsics, but a plain-old CPU implementation of the necessary functionality)?

@alexcrichton (Member Author)

@gbutler69 gah oops! Looks like you and @fbstj found the same error at the same time, that was just a mistake on my part! In that section the foo function is indeed safe, it's just the AVX2-enabled one that's unsafe.

@gbutler69

Also, does:

unsafe fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {
    for ((a, b), c) in a.iter().zip(b).zip(c) {
        *c = *a + *b;
    }
}

need to be marked as "unsafe"? I would think not.

@alexcrichton (Member Author)

@gbutler69 correct! That was also erroneously tagged as unsafe, but should be fixed now!
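
Putting both corrections together, the example reads roughly like this (a sketch of the fixed version; foo_avx2 is the #[target_feature(enable = "avx2")] function from the RFC text):

```rust
// Only the AVX2-specialized function stays `unsafe`; the dispatcher and the
// scalar fallback are ordinary safe functions.
pub fn foo(a: &[u8], b: &[u8], c: &mut [u8]) {
    if cfg_feature_enabled!("avx2") {
        // Safe to call here: we just verified the CPU supports AVX2.
        unsafe { foo_avx2(a, b, c) }
    } else {
        foo_fallback(a, b, c)
    }
}

fn foo_fallback(a: &[u8], b: &[u8], c: &mut [u8]) {
    for ((a, b), c) in a.iter().zip(b).zip(c) {
        *c = *a + *b;
    }
}
```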

via:

#[cfg(target_feature = "avx)]
Contributor:

missing "

// implementation that can use `avx`
}

#[cfg(not(target_feature = "avx))]
Contributor:

missing "


@alexcrichton - Good to know that I'm actually starting to understand Rust code from reading these RFCs. I really appreciate all the work that those like you put into these RFCs for those of us still learning Rust.

@bill-myers

I think there may be a learnability problem by putting these into libstd.

The issue is that users are not supposed to directly use those intrinsics since they are untyped (using __m128i instead of the proper one among u8x16, u16x8, u32x4, u64x2 and u128) and unnecessarily platform-dependent (you should not have to call an Intel-specific function just to get a vectorized reciprocal square root, for instance), which means they should instead use some crate from crates.io that provides a proper API.

However, by putting them into std::arch, people may be led to believe that they are supposed to use them directly.

So, I'd suggest adding a new library in the rustc repository called "intrinsics" instead of putting this into std::arch, documenting that the user is not supposed to use it directly, and ideally suggesting what crates to use instead.

@alexcrichton (Member Author) commented Feb 7, 2018

@bill-myers good points! It's definitely true that the main thrust of this RFC is not to empower all users of Rust to use SIMD, the apis are abysmal from that standpoint! Rather the motivation here is to empower anyone to use explicit SIMD on stable Rust. I'd expect that once we cross that threshold crates like faster are going to fill in the gap and make the experience of using SIMD much nicer. For now, though, we need to enable crates like that to build on stable!

With that mindset I think we'll definitely want to mention in the documentation quite thoroughly that these are very raw functions to use and you need to be quite careful and mindful when using them. The decision for location in the standard library, though, is done out of necessity rather than desire. All of these intrinsics are heavily coupled to a compiler backend (aka LLVM) and so there's really no way that they could be exported in a crate on crates.io where the guarantee is that such a crate would compile and work across many rustc versions. Inclusion in the standard library means that we have the freedom to continue to upgrade LLVM (and maybe even other backends one day) while providing a stable API surface area for all these intrinsics.

In other words this all leads me to the conclusions of:

  • We must enable these intrinsics on stable (but it's not required at this time to make them "nice")
  • We must place the intrinsics in the standard library (as that's "allowed to use unstable features on stable rust")

The final choice of module in the standard library I think is certainly fine to bikeshed. You're right in that something like arch is very "sweet" and sounds like you should be using it. Plus we may even want other things to go in "arch" one day! (maybe).

Do you have thoughts though on what a better name for such a module would be?

@eddyb (Member) commented Feb 7, 2018

How about std::arch::intrinsics?

EDIT: bonus: std::arch could have various constants (or even types) about the hardware platform.

Despite the CI infrastructure of the `stdsimd` crate it seems inevitable that
we'll get an intrinsic wrong at some point. What do we do in a situation like
that? This situation is somewhat analagous to the `libc` crate but there you can
fix the problem downstream (just have a corrected type/definition) for for
Contributor:

double "for"

@Lokathor (Contributor) commented Feb 7, 2018

It would, however, be an error to write this on x86 cpus: cfg_feature_enabled!("neon"); //~ ERROR: neon is an ARM feature, not x86

Compile time error, correct?

@alexcrichton (Member Author)

Thinking about it now, the current "portability speed bump" in the standard library is something like std::os::unix, where "unix" in the name lets you know that you're using something specific to a particular platform. While this may not always be around, it may be good to continue that convention in the near term (and then move later if we can). In that sense, including x86 or x86_64 in the name may be a good idea.

That may lead to either std::arch::x86 or std::intrinsics::x86, and I'd personally be fine with either.

@sfackler (Member)

I think it'd be fine to deprecate but leave those couple of stable functions in place if we want to co-opt the intrinsics module, though.

@hanna-kruppe

Known (sad) bug, I think it's been bugging @petrochenkov for years now.

While the bug is sad, in this case it doesn't really bother me. std::intrinsics is a natural module name and a natural location for these intrinsics. Putting them in std::intrinsics::rustc would be no better, arguably worse. IMO we should have stabilized the module ages ago.

@Lokathor (Contributor)

People keep linking to my post but they maybe don't seem to be using the same take-away that I intended, so I'll try to make that more clear:

  • Going by arch isn't detailed enough, but calling them all intrinsics is even less detailed than that.
  • We can name the module any dumb name at the top level, but we should then also have sub-modules for each major feature set. This is basically how the Intel Intrinsics Guide actually presents the docs to people, for example.
  • Bonus For The Future: if we ever present llvm intrinsics to the rust level we can put them under std::intrinsics::llvm51 and so forth according to the llvm version, and the [un]stable guarantee can be "this module is linked to the llvm version 5.1 intrinsics", the same way simd features are tied to doing exactly whatever Intel (or whatever manufacturer) says.

@eddyb (Member) commented Feb 13, 2018

if we ever present llvm intrinsics to the rust level

Initial reaction is "please no", but at least having the version in there seems very sensible.

@alexcrichton (Member Author)

I don't personally think we should have something like std::...::sse2 for all the various target_feature directives. I believe this isn't how C is organized (everything's in one namespace, albeit different header files) and it places too much onus on us to figure out how to organize everything. Dumping everything into one namespace is what vendors (aka Intel) currently expect, so I think it's what we should do as well.

I also agree with @eddyb, let's not go crazy and expose LLVM intrinsics, this RFC is purely for SIMD/vendor intrinsics.

@Lokathor (Contributor)

The LLVM comment at the end was pure speculation, I'm not trying to double the scope of the RFC. However, when it comes to the stability conversation I think it's best to keep in mind not just the next 6 months, but the next 60 months, and hopefully the next 600 months even.

@alexcrichton You say there's some sort of "organization onus" that's put on us if we break it up by target feature, but I don't understand what that onus would be. Every intrinsic that exists already exists in connection with a specific target feature. The entire RFC is predicated on the idea of these features. Every single intrinsic call is cfg-attribute gated so that it can't exist in a build targeting the wrong CPU and/or feature set. All of the organization is done for us by that alone.

Further, if you check the Intrinsics Guide you will note that each intrinsic feature set actually is exported by a different header. The fact that there's also a header to "grab everything" means that we just need a re-export module for folks who really do want it all dumped on them at once. However, the functionality itself would still live in different sub-modules.

I'm all for modules being "however big they need to be", but if every intrinsic now and forever lives in a single module it's just too much. The RFC itself already quotes the count at "thousands", and admits that there's so much code that we might have to relax the normal stability guarantee and fix a bug here and there, simply because we don't fully trust ourselves to get it all correct the first time out.

@alexcrichton (Member Author)

@Lokathor the organization onus is that we are the ones deciding to do so. Vendors are not designing intrinsics to be placed in modules (so it seems) but rather to get #include'd in C where everything is in one namespace. How would we assign modules to intrinsics that require multiple cpu features? What if Intel doesn't list a CPU feature for an intrinsic? Do we make our best guess?

Again the intention of this RFC is to expose anything at all on stable, not make it a 100% high quality interface for general consumption. It's expected something like that will show up on crates.io, not in libstd itself.

To that end we're looking for the lowest possible overhead to add these sorts of intrinsics to the standard library. Placing them in one module is not only easy for us to do but it also matches what vendors are expecting. This should also extend naturally to future architectures which aren't currently bound in stdsimd.

@gnzlbg (Contributor) commented Feb 14, 2018

I don't like the separation into "feature" submodules either. For x86 it is already problematic due to some intrinsics requiring multiple features. Also, whether an intrinsic is available might depend on whether you are targeting a 32-bit or 64-bit x86 platform, and some other platform characteristics might apply as well (e.g. i586 vs i686). So getting this right is a lot of work, and after doing this work we really don't have any reliable way of knowing whether what we did is correct.

And x86 is the nice platform. For ARM we have "some" level of separation in stdsimd, but with arm vs armv7 vs aarch64 vs +/-thumb-mode vs +/- hard-float vs ... I don't even know if there is a separation into submodules that actually makes sense. We have tried to do so in stdsimd to reduce the amount of cfg macros we need internally, but while in x86 this more or less works, for ARM it is still a big unknown.

@alexcrichton one thing we could do is just expose the same headers that C does in submodules, for example:

  • std::arch::immintrin: for Intel Multi Media Intrinsics, equivalent to #include <immintrin.h>
  • std::arch::ammintrin: for AMD Multi Media Intrinsics, equivalent to #include <ammintrin.h>
  • std::arch::cpuid: for x86 cpuid intrinsics
  • ... etc, we mirror the public headers exposed by gcc and clang, see here: https://github.com/llvm-mirror/clang/tree/master/lib/Headers (note that most of those headers are not public, e.g., all the avx headers can only be included via #include <immintrin.h> and not directly).

While we could do this, I find the reasons to do so weak at best. On a platform where an x86 intrinsic is not available, users will get a compiler error if they try to import the intrinsic. On a platform where the intrinsic is available, whether the intrinsic can or cannot be called depends on the machine the binary will run on. Users that want to ensure portability should be using cfg-macros anyway to do either compile-time or run-time feature detection. Something like the portability RFC is a better solution than submodules here, in particular when submodule names can be misleading.


@alexcrichton it was mentioned in the internal threads that instead of using attributes + functions, const generics, etc. to declare intrinsics that take compile-time constants as arguments, we could just be using macros. With macros 2.0 users should be able to import these macros from std::arch, and if we ever get language support for compile-time constant function arguments, functions with the same name can be added alongside the macros and the macros deprecated. This alternative should maybe be mentioned in the RFC.
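
For reference, a hypothetical sketch of that macro alternative (everything here is made up to illustrate the idea; `_mm_insert_pi16` stands in for any intrinsic with a required-constant argument):

```rust
// Hypothetical: expose a const-argument intrinsic as a macro so the immediate
// is written literally at the call site. If compile-time constant function
// arguments ever land, a function of the same name could replace the macro.
macro_rules! mm_insert_pi16 {
    ($a:expr, $d:expr, $imm:expr) => {
        // `$imm` is spliced into the expansion as-is, so the backend can
        // require it to be a constant expression.
        $crate::arch::_mm_insert_pi16($a, $d, $imm)
    };
}
```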

@alexcrichton (Member Author)

@gnzlbg yeah I think organizing by header is possible but I don't think it buys us much. Sort of like libc I don't think we should organize by header but rather dump everything into the same namespace (which is what happens in C).

It's true yeah we could use macros (that's what C does I believe), but I'd personally prefer to stick to the solution we've got so far which with enough const machinery I think could one day be stable.

@rfcbot (Collaborator) commented Feb 14, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Feb 14, 2018
The `#[cfg]` attribute and `cfg!` macro statically resolve and **do not do
runtime dispatch**. Tweaking these functions is currently done via the `-C
target-feature` flag to the compiler. This flag to the compiler accepts a
similar set of strings to the ones specified above and is already "stable".
@parched commented Feb 14, 2018:

What about #[target_feature]? Does this RFC define that it doesn't affect #[cfg] or does it leave it undecided? CC rust-lang/rust#42515

Member Author:

I believe that's going to continue to be an open bug
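
To spell out the static behavior the quoted text describes (a small sketch; the feature name is just an example):

```rust
// `cfg!` is resolved at compile time from the features the compiler was
// invoked with (e.g. `-C target-feature=+avx2`); it never inspects the CPU
// at runtime, unlike the runtime-detection macro.
fn describe_build() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "compiled assuming AVX2 is always available"
    } else {
        "compiled without assuming AVX2"
    }
}
```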

@JulianBirch commented Feb 19, 2018

This is a minor point, and I'm not even a Rust developer so this may be way off base, but I'm wondering whether you'd be better off having specific types for the "Intel-defined types". So void * is always voidstar and long long is always longlong, rather than having to remember which native type it maps to.

This was referenced Feb 24, 2018
@rfcbot (Collaborator) commented Feb 24, 2018

The final comment period is now complete.

@alexcrichton (Member Author)

Ok! Now that FCP has elapsed it looks like nothing major has come up. I think there's still an open question as to where to place this in the standard library, but I'm going to merge this and we can of course continue to bikeshed on the tracking issue!

@alexcrichton alexcrichton merged commit 27f4519 into rust-lang:master Feb 26, 2018
@Centril Centril added A-simd SIMD related proposals & ideas A-cfg Conditional compilation related proposals & ideas A-attributes Proposals relating to attributes labels Nov 23, 2018