repr(simd) is unsound #44367

Closed
gnzlbg opened this issue Sep 6, 2017 · 68 comments · Fixed by #47743
Labels
A-simd Area: SIMD (Single Instruction Multiple Data) B-unstable Blocker: Implemented in the nightly compiler and unstable. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@gnzlbg
Contributor

gnzlbg commented Sep 6, 2017

The following should be discussed as part of an RFC for supporting portable vector types (repr(simd)) but the current behavior is unsound (playground):

#![feature(repr_simd)]
#![feature(target_feature)]
#![allow(non_camel_case_types)]

// Given a SIMD vector type:
#[derive(Debug)]
#[repr(simd)]
struct f32x8(f32, f32, f32, f32, 
             f32, f32, f32, f32);

// and the following two functions:

#[target_feature = "+avx"]
fn foo() -> f32x8 { f32x8(0.,1.,2.,3.,4.,5.,6.,7.) }  // f32x8 will be a 256-bit vector

#[target_feature = "+sse3"]
fn bar(arg: f32x8) {  // f32x8 will be two 128-bit vectors
    println!("{:?} != f32x8(0, 1, 2, 3, 4, 5, 6, 7)", arg);
    // prints: f32x8(0, 1, 2, 3, 6, 0, 0, 0) != f32x8(0, 1, 2, 3, 4, 5, 6, 7)
}

// what are the semantics of the following when
// executing on a machine that supports AVX?
fn main() { bar(foo()); }

Basically, those two objects of type f32x8 have a different layout, so foo and bar have a different ABI / calling convention. This can be introduced without target_feature, by compiling two crates with different --target-cpu values and linking them, but target_feature was used here for simplicity.

@alexcrichton
Member

alexcrichton commented Sep 6, 2017

Two thoughts I've had in the past how to fix this:

  • We can forbid this in trans I think by just making it a hard error. Basically when this happens give up and say "it's still your problem to fix it"
  • The compiler could automatically generate a shim to "do the right thing". I explained this a long time ago as well but the general idea is that &T always has the same ABI regardless of target_feature, and the compiler could abuse that.

Given that #[target_feature] is unsafe though, we could also just add it to the list of contracts you have to uphold to call the function. In that it's just one more checkbox to check when working with unsafe functions, ensuring that you call them with the same target_feature set if you pass SIMD arguments.


example of unsafety in C++

example in Rust

@parched
Contributor

parched commented Sep 7, 2017

It's worth noting that, when talking about LLVM features (which Rust target features currently map directly to), this also affects floats (which are stable). I.e., on x86 with current safe Rust, if you compile one crate with -C target-feature=+soft-float and one without, you have an issue. This can also be solved as Alex mentions though.

@gnzlbg
Contributor Author

gnzlbg commented Sep 7, 2017

@alexcrichton I would prefer to start with a hard error, and if we need it, add a way to opt-in to the shim generation (*).

The only thing that concerns me about the hard error is that we will probably need to emit it during monomorphization. I think that this is not acceptable, and we should only do this if it's either temporary or there is no other way. @eddyb pointed out that currently stable Rust has no monomorphization-time errors, so this solution might block stabilization.

@alexcrichton you mentioned that this would mean that SIMD types must be then banned from FFI because we don't know the calling convention of the caller. Could you elaborate on this? As I see it, FFI is already unsafe, so it would be up-to-the-user to make sure that the callee is using the appropriate calling convention.

(*) I haven't thought this through, but I imagine getting a hard error for a particular call site, and then wanting to opt-in for that particular call site only, into the shim generation. Anyways, we don't need to think this all the way through now.

@alexcrichton
Member

I'd personally be totally ok with a hard error, but yes, I think we'd have to do it during monomorphization. It's true that we don't have many monomorphization errors today, but I don't think we have absolutely 0, and I'd personally also think that we should at least get to a workable state and try it out to evaluate before possible stabilization. I, personally again, would most likely be fine stabilizing with monomorphization errors.

@alexcrichton you mentioned that this would mean that SIMD types must be then banned from FFI because we don't know the calling convention of the caller. Could you elaborate on this?

Oh right yeah! Right now we've got a lint in the compiler for "this type is unsafe in FFI", and for example it lints about bare structs that are not #[repr(C)] and things like String. We'd just want to make sure the lint warned about all SIMD structures as well (that they're not accidentally considered allow-by-default).

@parched I don't believe we're considering a #[target_feature] attribute for soft-float, and #[target_feature] accepts a whitelisted set of features which today go straight to LLVM, but we aren't bound to always doing so. I'd imagine this would be an excellent topic of discussion were we to stabilize such a target feature!

@eddyb
Member

eddyb commented Sep 7, 2017

@alexcrichton I've recently reviewed all non-ICE errors from librustc_trans and the only one that can be hit from stable is monomorphization recursion limit. Anything else would be a bug IMO.

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 7, 2017

While +soft-float may not be considered a valid argument for #[target_feature], disabling SSE does affect the ABI of floats. That is, this call would require a shim as well:

pub fn bar() {
    foo(0.0);
}

#[target_feature = "-sse,-sse2"]
pub fn foo(x: f32) -> f32 {
    x + x
}

The above code actually crashes in LLVM on the playground, because that's x86_64 where SSE2 is always present. However, on a 32 bit x86 target, bar passes the argument in xmm0 while foo loads its argument from the stack.

@parched
Contributor

parched commented Sep 7, 2017

How about we make the target spec define the ABI regardless of what extra features are enabled or disabled by attributes or the commandline. That would avoid the need for shims and monomorphization errors. So for example on x86_64 which has 128-bit vectors by default:

  • -sse would be a hard error on the commandline or as an attribute on any function that has 128bit (and float?) parameters
  • 256-bit vectors would be passed on the stack regardless of whether +avx is enabled. I don't think there would be a performance issue with this, because anywhere it mattered should be inlined.
  • If some functions require an ABI where 256-bit registers are used (e.g., some intrinsics probably), then we add another ABI to explicitly tag functions with, e.g., extern "vector-256". Calling one of these from a function without the +avx attribute would be a hard error, regardless of whether +avx was enabled for the whole crate via the commandline.

@alexcrichton alexcrichton added A-simd Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 7, 2017
@gnzlbg
Contributor Author

gnzlbg commented Sep 8, 2017

@rkruppe Thanks for that example. Two small comments:

  • Solving this particular problem is not a blocker for allowing SIMD (and friends) on stable Rust. The target_feature RFC does not allow disabling features. To introduce this problem one would need to do so at the -C target-feature level, and link two crates, one with +sse and one with -sse (and this is a problem we already have with basically any feature).

  • The most immediate goal is allowing SIMD on stable Rust. Ideally, whatever solution we implement first will solve this problem, but a solution that does not fix it can still be good enough as long as it doesn't prevent us from fixing this in the future.

I personally wouldn't like to have to re-open this topic in the future when people start filing bugs due to random segfaults because some crate in the middle of their dependency graph decided it was a good idea to add -sse in its build.rs script... So, I think that we should try hard to come up with a solution that fixes all these ABI issues once and for all.

@gnzlbg
Contributor Author

gnzlbg commented Sep 8, 2017

@parched some x86_64 targets already have 512-bit vectors. What do we do when AVX3 or 4 are released with support for 1024-bit vectors? Where does that path end?

Ideally, if I have an SSE dynamic library that exposes some functions on its ABI for SSE...AVX2, I would like to be able to add some new AVX3/4 functions to its interface, recompile, and produce a library that is ABI compatible with the old one, so that all my old clients can continue to work as is by linking to the new library, but newer code is able to call the new AVX3/4 functions. That is, adding those new AVX3/4 functions should not break the ABI of my dynamic library as long as my global target is still sse.

@parched
Contributor

parched commented Sep 8, 2017

@parched some x86_64 targets already have 512-bit vectors. What do we do when AVX3 or 4 are released with support for 1024-bit vectors? Where does that path end?

@gnzlbg yes 512-bit and 1024-bit vectors would have to be treated the same way but I don't believe adding more would be an issue.

Ideally, if I have an SSE dynamic library that exposes some functions on its ABI for SSE...AVX2, I would like to be able to add some new AVX3/4 functions to its interface, recompile, and produce a library that is ABI compatible with the old one, so that all my old clients can continue to work as is by linking to the new library, but newer code is able to call the new AVX3/4 functions. That is, adding those new AVX3/4 functions should not break the ABI of my dynamic library as long as my global target is still sse.

For that case you would just have to make your new functions extern "vector-1024", and callers wouldn't be allowed to call them unless they had #[target_feature = "+avx4"]. (EDIT: To be clear, you would only have to do this if your new function had 1024-bit parameters and you wanted to force them to be passed in registers rather than on the stack. If you just used the default extern "Rust", 1024-bit vectors would be passed on the stack and so code would be allowed to call this function.)

@gnzlbg
Contributor Author

gnzlbg commented Sep 8, 2017

@parched I think I misunderstood your comment then.

Then we add another ABI to explicitly tag functions with e.g., extern "vector-256".

Do you think that #[target_feature = "+avx"] could do this automatically?


@alexcrichton @BurntSushi I've slept over this a bit, and I think the following is a common idiom that we need to enable:

#[target_feature = "sse"]
fn foo(v: f32x8) -> f32x8 {
  // f32x8 has SSE ABI here
  let u = if std::host_feature(AVX) { 
      foo_avx(v)  // mismatched ABI: hard error (argument)
      // mismatched ABI: hard error (return type)
  } else {
      /* SSE code */
  }
  /* do something with u */
  u
}

#[target_feature = "avx"]
fn foo_avx(arg: f32x8) -> f32x8 { ... }

Here we have some mismatching ABIs. I am still fine with making these mismatching ABIs hard errors as long as there is an opt-in way to make this idiom work. What do you think about using as to cast between ABIs ?:

#[target_feature = "sse"]
fn foo(v: f32x8) -> f32x8 {
  // f32x8 has SSE ABI here
  let u = if std::host_feature(AVX) { 
      // foo_avx(v) // ERROR: mismatched ABIs (2x arg and ret type)
      // foo_avx(v as f32x8) // ERROR: mismatched ABIs (1x ret type)
      foo_avx(v as f32x8) as f32x8 // OK
  } else {
      /* SSE code */
  }
  /* do something with u */
  u
}

That is, an as cast to f32x8 inserts the shims that @alexcrichton was proposing above to make this work. I thought about using as #[target_feature = "sse"] f32x8 or similar but that seemed unnecessarily verbose. I also thought about making it just work, but the shims do introduce a cost, so I think it is better to make this cost explicit.

Do you think we can extend this to make function pointers work?:

#[target_feature = "+sse"] fn foo(f32x8) -> f32x8;
static mut foo_ptr: fn(f32x8) -> f32x8 = foo;

unsafe {
  // foo_ptr = foo_avx; // ERROR: mismatched ABI
  foo_ptr = foo_avx as fn(f32x8) -> f32x8; // OK
}

// assert_eq!(foo_ptr, foo_avx); // ERROR: mismatched ABIs
assert_eq!(foo_ptr, foo_avx as fn(f32x8) -> f32x8); //  OK

I was thinking that in this case, foo_avx would need to be wrapped into a function that inserts the ABI conversion shims. I was worried that doing this would introduce other issues: foo_ptr = foo_avx; assert_eq!(foo_ptr, foo_avx); // FAILS, because the address stored in foo_ptr and the address of foo_avx would differ. But the uncast assignment is an ABI mismatch error that won't compile, and doing the cast compares the address of the wrapped function, which returns OK.

I think that pursuing this would require us to track the ABI of repr(simd) types alongside their type, that is, f32x8["avx"] != f32x8["sse"] but f32x8["sse"] == f32x8["sse3"] (because their ABIs are identical). Tracking the ABI of repr(simd) types alongside the type looks like a lot of implementation work. @eddyb Do you think that "something like this" could allow us to lift the monomorphization-time errors to type-checking?

I think that if we can lift these errors to type-checking:

  • that would fix this issue, because the unsound cases become type errors
  • it would allow users to opt into an extra run-time cost to gain some flexibility (passing types across ABIs)
  • it would fix @rkruppe's example because -sse (and soft-float) would effectively change the types of the floats, producing type errors
  • it allows making function pointers work for different ABIs (and function pointers are a common idiom for dispatching to different SIMD implementations).

Thoughts?


EDIT: even if we never stabilize repr(simd) and only expose some f32x8 like types in std, this is slowly turning into RFC material...


EDIT2: That is, this issue would be resolved by making the original example fail with a type error, and adding the as conversions (or similar) would need to be a subsequent step.

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 8, 2017

@gnzlbg

The target_feature RFC does not allow disabling features. To introduce this problem one would need to do so at the -C target-feature level, and link two crates one with +sse and one with -sse

For the record, that's not true, one just needs a target that doesn't have SSE enabled by default (or defaults to soft-float), such as the (tier 2) i586-* targets.

I do agree that we should find a proper solution right now, especially since the "cheap fixes" that I'm aware of (monomorphization-time error, or strongarm the ABIs into being compatible by explicitly passing problematic types on the stack) permit code that probably wouldn't work unmodified under a more principled solution. Unfortunately I don't have the time to dive into solutions right now, so I can't contribute anything but nagging at the moment :trollface:

@alexcrichton
Member

@gnzlbg specifically you think that passing arguments like f32x8 is common enough to warrant ABI compatibility in one way or another? Presumably this doesn't happen at all in C because of the same ABI problem, right?

I'm also not sure we can ever get function pointers to "truly work" unless we declare "one true ABI" for these types, otherwise we have no idea what the actual abi of the function pointer is.

(this bit about function pointers is pushing me quite a bit into the camp of "just declare everything unsafe and document why")

@gnzlbg
Contributor Author

gnzlbg commented Sep 8, 2017

Presumably this doesn't happen at all in C because of the same ABI problem, right?

Of course it happens, see this SO question, but users get warnings and undefined behavior pretty quickly and learn to work around this (that is, "don't do that", pass a float* and a size, manually write the shims you proposed using assembly, etc.).

An important point is that this can only happen when you have ABI-incompatible vector types. That is, if you are only using SSE through SSE4.2, you never run into these issues, because they are introduced by AVX, which is relatively recent, and by AVX512, which is very rare (EDIT: on ARM you only have NEON so this does not happen, and the new SVE completely works around this issue).

I'm also not sure we can ever get function pointers to "truly work" unless we declare "one true ABI" for these types, otherwise we have no idea what the actual abi of the function pointer is.

Why do we need one true ABI for these types? For example:

#[target_feature = "+sse"]
fn foo() {
  let a: fn(f32x8) -> f32x8;  // has type fn(f32x8["sse"]) -> f32x8["sse"]
}

#[target_feature = "+avx"]
fn bar() {
  let a: fn(f32x8) -> f32x8;  // has type fn(f32x8["avx"]) -> f32x8["avx"]
}

static a: fn(f32x8) -> f32x8;  // has type fn(f32x8["CRATE"]) -> f32x8["CRATE"]
// where CRATE is replaced with whatever feature the crate is compiled with

That is, two function pointers, compiled on different crates, or functions, with different features, would just be different types and generate a type error.

@gnzlbg
Contributor Author

gnzlbg commented Sep 8, 2017

@alexcrichton

Another workaround in C is to do something like this:

First we need a way to merge two 128bit registers into a 256bit register (or a "no op" in SSE):

#[target_feature = "+sse"]
fn merge_sse(x: (f32x4, f32x4)) -> f32x8;  // no op?
#[target_feature = "+avx"]
fn merge_avx(x: (f32x4, f32x4)) -> f32x8; 
// ^^^^ copy 2x128bit registers to 1x256register

then we need its inverse, that is, a function that takes a 256bit value (or two in SSE) and returns 2 128 bit registers:

#[target_feature = "+sse"]
fn split_sse(f32x8) -> (f32x4, f32x4); // no op?
#[target_feature = "+avx"]
fn split_avx(f32x8) -> (f32x4, f32x4);
// ^^^^ copy the parts of a 256bit register into 2x128bit registers

then we add some macros to communicate f32x8s from AVX to SSE and vice-versa using only 128bit registers:

macro_rules! from_sse_to_avx { ($x:expr) => (merge_avx(split_sse($x))) }
macro_rules! from_avx_to_sse { ($x:expr) => (merge_sse(split_avx($x))) }
macro_rules! from_sse_to_avx_and_back { 
    ($f:expr, $x:expr) => (from_avx_to_sse!($f(from_sse_to_avx!($x)))) 
}

and then we can safely write the code above as:

#[target_feature = "sse"]
fn foo(v: f32x8) -> f32x8 {
  // f32x8 has SSE ABI here
  let u = if std::host_feature(AVX) { 
      // foo_avx(v)  // mismatched ABI: hard error (argument)
      from_sse_to_avx_and_back!(foo_avx, v) // OK
  } else {
      /* SSE code */
  }
  /* do something with u */
  u
}

#[target_feature = "avx"]
fn foo_avx(arg: f32x8) -> f32x8 { ... }

@parched
Contributor

parched commented Sep 8, 2017

Then we add another ABI to explicitly tag functions with e.g., extern "vector-256".

Do you think that #[target_feature = "+avx"] could do this automatically?

It could, but if you did that you wouldn't be able to call that function from another without #[target_feature = "+avx"], i.e., your runtime dispatch function.

@alexcrichton
Member

@gnzlbg everything you're saying seems plausible? You're thinking that passing types like f32x8 needs to work in more scenarios, and I'd naively feel like that's not true (in that idiomatic C already doesn't do it), but I'm willing to defer to you. There's a whole smorgasbord of things we can do to get this working, but I'm still personally in camp "let's call it all unsafe and call it a day".

@gnzlbg
Contributor Author

gnzlbg commented Sep 9, 2017

@alexcrichton the target_feature RFC allows #[target_feature] on unsafe fns only.

"let's call it all unsafe and call it a day"

What exactly do you propose to call/make unsafe? (*)

After exploring all these options, I'd like to propose a path forward.

  • the current implementation is unsound and results in UB at run-time (we are here)
  • make the current implementation sound by producing hard errors for ABI mismatches at monomorphization time (I don't think this cuts it for stabilization)
  • make the implementation produce a hard error for ABI mismatches during type checking (this should cut it for stabilization)

@eddyb said above that stable Rust has zero monomorphization time errors. I don't think that introducing one will cut it for stabilization. To produce a type-checking error, we need to "somehow" propagate the target_feature ABI of repr(simd) types along with the types themselves. Don't we need to do this as well for producing monomorphization time errors?

Once we are there users that want to convert between incompatible ABIs can do so with this idiom, which can also be extended to make function pointers work. We could provide procedural macros in a crate that do this automatically, and if that becomes a pain point in practice we could re-evaluate adding language support for that (e.g. something along the lines of the as f32x8 solution).

(*) If I understand this correctly, we would need to require that all repr(simd) types (and types containing those types) can only be used from unsafe code, but I am probably misunderstanding something.

@alexcrichton
Member

@gnzlbg what I mean is that yes, #[target_feature] forces a function to be unsafe. Then if you call that function you must have some other not compiler-verified mechanism to know why it's safe. When calling a #[target_feature] function then there's simply an extra guarantee you must adhere to which is that it's ABI-safe to call the function. Namely you don't pass any arguments that change ABI based on the feature or you yourself are tagged with the right #[target_feature] to have the same ABI.

I personally think that at this point it's not worth trying to push this back into typechecking. That sounds like quite a lot of work for not necessarily a lot of gain. Additionally I'd be worried that it'd expose hidden costs and/or complexities that are very difficult to get right. For example if we had:

static mut FOO: fn(u8x32) = default;

#[target_feature = "+avx2"]
unsafe fn bar() {
    FOO = foo;
}

#[target_feature = "+avx2"]
unsafe fn foo(a: u8x32) {
}

unsafe fn default(a: u8x32) {
}

fn main() {
    bar();
    FOO(Default::default());
}

How do we rationalize that? Does the bar function generate a shim to put a different ABI into the global? Do we give errors for some compilation flags and not others?

Note that this doesn't even start to touch on trait methods. For example how do we also rationalize Default::default? Is that tagged with the necessary feature to return u8x32? Do we generate a specialized version for each invocation?

These just seem like really difficult questions to answer and I'm not personally sold on there being any real benefit to going to all this effort to statically verify these things. All of SIMD is unsafe anyway now, so trying to add static guarantees on top of something that we've already declared as fundamentally unsafe would be nice but to me doesn't seem necessary.

@gnzlbg
Contributor Author

gnzlbg commented Sep 9, 2017

I'll try to explain what I am proposing better because I think we are misunderstanding each other.

First, we need to declare the repr(simd) type somewhere, for u8x32 this looks like this:

#[repr(simd)] 
struct u8x32(u8, u8, ...);

but what I am proposing is to make repr(simd) types not concrete, but parametric over the ABI of the current context (e.g., what target features are enabled). So even though the user writes the code above, the compiler actually treats it like this:

#[repr(simd)]
struct<ABI> u8x32(u8, u8, ...);

When the user writes a repr(simd) type in type position, the compiler automatically inserts the ABI type parameter. Let's show what I mean with your example. First, the set of target features enabled for a crate sets a global type parameter CRATE_ABI like this (the user does not see any of this; it is done implicitly by rustc):

type CRATE_ABI = /* from target features of the crate */; 

Now we proceed with the example. First, the user writes:

static mut FOO: fn(u8x32) = default; // OK, compiles
fn default(a: u8x32) { }

That compiles and type checks, because implicitly, the code looks like this:

static mut FOO: fn(u8x32<CRATE_ABI>) = default; // OK, compiles
fn default(a: u8x32<CRATE_ABI>) { }

So the types unify just fine. Note that default does not need to be unsafe. Users can use u8x32 in normal Rust functions just fine. A library can implement traits for SIMD types that provide safe operations, so doing linear algebra on them can be done in safe Rust code (this is the end goal).

Now let's get a bit messier. The user writes:

#[target_feature = "+avx2"] unsafe fn foo(a: u8x32) {}
#[target_feature = "+avx2"]
unsafe fn bar() {
    FOO = foo;
}

but what this does is the following:

#[target_feature = "+avx2"] unsafe fn foo(a: u8x32<AVX2_ABI>) {}
#[target_feature = "+avx2"]
unsafe fn bar() {
    FOO = foo; // OK or Error?
}

So is this code OK, or is it an error? From the information provided, we cannot say. It depends on what the CRATE_ABI type is. Does foo: fn(u8x32<AVX2_ABI>) unify with FOO: fn(u8x32<CRATE_ABI>)? Well, if CRATE_ABI = AVX2_ABI the answer is yes, and this code will type check and be correct; but if it is SSE_ABI or something else, then probably not, and this code will produce a type error. On x86, NO_SSE, SSE, AVX, and AVX512 might be enough.

So at this point we are ready to move to trait methods. What should this do?

fn main() {
    FOO(Default::default());
}

Well, the same thing it does for any other type. It is just type checking at work. If it can unify the type parameters then everything is OK; otherwise, it does not compile. Obviously we need to nail down the ABI types so that code only compiles when it is safe, and breaks otherwise.

How do we rationalize that? Does the bar function generate a shim to put a different ABI into the global? Do we give errors for some compilation flags and not others?

I hope this has become clear, but just to be crystal clear: we never produce shims, either the ABI matches, or it doesn't. The users can manually write the shims if they need to by using this idiom.

For example how do we also rationalize Default::default?

I hope this has become clear.

All of SIMD is unsafe anyway now

I think I am misunderstanding what you mean here.

Right now, #[repr(simd)] types are safe to use, they are portable vector types after all. Some operations on them might be unsafe, but you can use them in non unsafe functions just fine.

Also, how exactly are you proposing to make repr(simd) types unsafe to use? It sounds to me like you want to make the types themselves unsafe, but Rust does not have unsafe types. This feature has been discussed before, but adding it to the language looks like an even bigger unknown to me at this point. So could you maybe expand on how you think this would work?

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 9, 2017

@gnzlbg So if I understand correctly, you would extend the type system in a way that would be visible to users in type mismatches? Would this also apply to float types?

@gnzlbg
Contributor Author

gnzlbg commented Sep 9, 2017 via email

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 9, 2017

Okay, thanks.

My impression after only very little thought is that I agree with @alexcrichton that a proper automatic solution to the ABI problems seems pretty complicated. If functions tagged with #[target_feature] are indeed always unsafe, then that seems sufficient for avoiding problems in safe code. So I'm partial to leaving it at that.

Letting unsafe code do its thing would mean that a strict solution such as the one @gnzlbg proposed can't be introduced later. However, it would still be possible to start generating shims based on the caller's and callee's set of target features.

@hanna-kruppe
Contributor

I want to add, though: ABI mismatches are potentially very subtle and annoying bugs, so there should be a warning at least for the cases that can be easily detected statically.

@alexcrichton
Member

@gnzlbg I think that makes sense, yeah, but I can't really comment on whether I'd think it's feasible or not. It sounds pretty complicated and subject to subtly nasty interactions; my worry would be that we'd spend all our effort chasing a long tail of bugs to make this airtight. Is this really a common enough idiom to warrant providing a static error instead of discovering this at runtime?

Also, how are you exactly proposing to make repr(simd) types be unsafe to use?

I haven't thought too hard about this, admittedly. If we expose APIs like Add for u8x32 then our hands may be tied in this regard, but we could also not do that and just expose unsafe fn u8x32::add(u8x32, u8x32) -> u8x32, which is less ergonomic to call but "does the right thing" in terms of ABIs.

@hanna-kruppe
Contributor

hanna-kruppe commented Sep 9, 2017

@alexcrichton Why would <u8x32 as Add>::add be an issue? It's a safe method, yes, but it wouldn't have a #[target_feature] annotation and would therefore match the crate-wide default ABI. The issues only arise if code that does use #[target_feature] uses that Add impl, but that's the fault of the unsafe function using #[target_feature]. A footgun to be sure, but we already have that problem with user-defined functions that handle types whose ABI is sensitive to target features. Unless you want to make any and all mention of those types in function signatures unsafe, the users of #[target_feature] will just have to suck it up and be careful (hopefully aided by a lint, as mentioned previously).

Edit: On second thought, since the functions that use #[target_feature] are already unsafe, they could already use other unsafe functions such as u8x32::add without error or even warning, negating any "linting benefit" the unsafety may have.

And of course, there's the issue that f32 and f64 already implement Add (and countless other traits, and are used by countless other std APIs) and their ABI is also affected by #[target_feature]. That makes it pretty much impossible to make usage of target_feature-sensitive types unsafe.

@gnzlbg
Contributor Author

gnzlbg commented Dec 18, 2017

FYI the shims are required to make Into::into() work properly:

#[target_feature = "avx2"] {
  let x = f64x4::splat(...);
  let y : i64x4 = x.into();  // shims required
}

Here, since Into::into is not an avx2 function, we need to insert shims to convert from the AVX2 ABI to the SSE2 ABI, execute into which just calls From::from, and then convert back from the SSE2 ABI to the AVX2 ABI.

@gnzlbg
Contributor Author

gnzlbg commented Dec 18, 2017

SIMD code tends to use .into() a lot, so the answer to the question: "how often do we need shims in practice" is probably "all the time".

@glaebhoerl
Copy link
Contributor

Won't into and from get inlined basically always though? (Also not clear if the significant aspect of shims is execution time, compile time, binary size, ...?)

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 18, 2017

@glaebhoerl Into::into is #[inline] and (at least without shims) very short, so the intent is that with optimizations enabled it should be inlined when profitable.

Whether From::from is inlined depends on the particular from implementation. The ones in stdsimd are #[inline(always)] but #[inline] would probably be enough once we have shims.

Also not clear if the significant aspect of shims is execution time, compile time, binary size, ...?

Adding shims adds more code, which increases compile time; the question is by how much?

Most applications I know of have only a tiny fraction of explicit SIMD code, so this might not even be measurable. Also, all applications working properly today probably won't need shims (otherwise they wouldn't be working correctly), so I wouldn't worry about this initially beyond checking that compile times don't explode for the crates currently using SIMD.

Whether execution times will be affected is very hard to tell. Applications have little explicit SIMD code, but that code is typically in a hot spot.

As a thought experiment, we can consider the worst case: Into::into, which just calls From::from. If from is just a mem::transmute, then it costs 0 cycles. But we need to shim the argument to from and its result, so that goes from 0 to at least 2 cycles. That's an infinite blow up :D So here, if Into::into isn't inlined, or if it is but the optimizer does not remove the shims, then this would be a performance bug that would need to be fixed.

Beyond the worst case things get better though: if an application is just executing a single SIMD instruction in its hot loop, e.g. one taking two arguments, and the shims aren't removed, then it might go from 1 cycle to 4 cycles. And if it is doing something more complex then the cost of the shims quickly becomes irrelevant.

In all of these cases:

  • the code would be broken without the shims, so the users could not write it in the first place, and
  • if the shims turn out to be a performance problem in some applications users can change their code such that the shims are not generated (e.g. calling From::from instead of Into::into, refactoring their hot loop/functions to not use vector types with incompatible ABIs in their APIs, etc.).

If binary size/execution speed/compile time turns out to be a problem for debug builds I'd say let's worry about that when we get there.

@alexcrichton
Copy link
Member

I've been thinking about this again recently with an eye towards hoping to push SIMD over the finish line towards stabilization. Historically I've been a proponent of adding "shims" to solve this problem at compile time. These shims would cause any ABI mismatch to get resolved by transferring arguments through memory instead of registers.

As I think more and more about the shims, however, I'm coming round to the conclusion that they're overly difficult (if possible at all) to implement. Dealing with function pointers in particular is where I feel like things get super tricky. Along those lines I've been reconsidering another implementation strategy, which is to always pass arguments via memory instead of by value.

In other words, let's say you write:

fn foo(a: u8x32) { ... }

Today we'd generate something along the lines of (LLVM-wise)

define void @foo(<32 x i8>) {
  ...
}

whereas instead what I think we should generate is:

define void @foo(<32 x i8>*) { ; note the *
  ...
}

Or in other words, SIMD values are unconditionally passed through memory between all functions. This would, I think, be much easier to implement and also jive much more nicely with the implementation of everything else in rustc today. I've historically been opposed to this approach thinking that it would be bad for performance, but when thinking about it I actually don't think there's going to be that much impact.

In general I'm under the impression that SIMD code is primarily about optimizing hot loops, and in those situations a literal function call is already killing performance anyway. In that sense we're already relying on inlining everything enough to remove the layer of indirection from storing values on the stack. If that's true, then leaving a call around that happens to take its arguments through memory shouldn't actually be much of a performance loss!

AFAIK the main trickiness around this would be that Rust functions would pass all the vector types via memory, but we'd need a way to pass them by value to various intrinsic functions in LLVM.

In general though, what do others think about an always-memory approach?

@eddyb
Copy link
Member

eddyb commented Jan 25, 2018

AFAIK the main trickiness around this would be that Rust functions would pass all the vector types via memory, but we'd need a way to pass them by value to various intrinsic functions in LLVM.

The intrinsics don't have the "Rust" ABI and we'd still be passing vector types by value to C, right?

I think this approach is the easiest out of all possible ones, since all you need to change is:

match arg.layout.abi {
    layout::Abi::Aggregate { .. } => {}
    _ => return
}

There are two ways to do it:

  • add | layout::Abi::Vector { .. } after layout::Abi::Aggregate { .. }
    • this will cast the value to an integer if it's the size of a pointer or smaller
  • add a layout::Abi::Vector { .. } => { arg.make_indirect(); return; } arm

@alexcrichton
Copy link
Member

Thanks for the tip @eddyb, that was perfect! I've opened #47743 to close this issue.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Jan 25, 2018
This commit changes the ABI of SIMD types in the "Rust" ABI to unconditionally
be passed via pointers instead of being passed as immediates. This should fix a
longstanding issue, rust-lang#44367, where SIMD-using programs ended up showing very odd
behavior at runtime because the ABI between functions was mismatched.

As a bit of a recap, this is sort of an LLVM bug and sort of an LLVM feature
(today's behavior). LLVM will generate code for a function solely looking at the
function it's generating, including calls to other functions. Let's then say
you've got something that looks like:

```llvm
define void @foo() { ; no target features enabled
  call void @bar(<4 x i64> zeroinitializer)
  ret void
}

define void @bar(<4 x i64>) #0 { ; enables the AVX feature
  ...
}
```

LLVM will codegen the call to `bar` *without* using AVX registers because `foo`
doesn't have access to these registers. Instead it's generated with emulation
that uses two 128-bit registers. The `bar` function, on the other hand, will
expect its argument in an AVX register (as it has AVX enabled). This means we've
got a codegen problem!

Comments on rust-lang#44367 have some more contextual information but the crux of the
issue is that if we want SIMD to work in general we'll need to ensure that
whenever a function calls another the ABI of the arguments being passed is in
agreement.

One possible solution to this would be to insert "shim functions" where whenever
a `target_feature` mismatch is detected the compiler inserts a shim function
where you pass arguments via memory to the shim and then the shim loads the
values and calls the target function (where the shim and the target have the
same target features enabled). This unfortunately is quite nontrivial to
implement in rustc today (especially when accounting for function pointers and
such).

This commit takes a different solution, *always* passing SIMD arguments through
memory instead of passing as immediates. This strategy solves the problem at the
LLVM layer because the ABI between two functions never uses SIMD registers. This
also shouldn't be a hit to performance because SIMD performance is thought to
often rely on inlining anyway, where a `call` instruction, even if using SIMD
registers, would be disastrous to performance regardless. LLVM should then be
more than capable of fixing all our memory usage to use registers instead after
enough inlining has been performed.

Note that there's a few caveats to this commit though:

* The "platform intrinsic" ABI is omitted from "always pass via memory". This
  ABI is used to define intrinsics like `simd_shuffle4` where LLVM and rustc
  need to have the arguments as an immediate.

* Additionally this commit does *not* fix the `extern` ("C") ABI. This means
  that the bug in rust-lang#44367 can still happen when using non-Rust-ABI functions. My
  hope is that before stabilization we can ban and/or warn about SIMD types in
  these functions (as AFAIK there's not much motivation for them to be there anyway),
  but I'll leave that for a later commit and if this is merged I'll file a
  follow-up issue.

All in all this...

Closes rust-lang#44367
alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 14, 2018
The issue of passing around SIMD types as values between functions has
seen [quite a lot] of [discussion], and although we thought [we fixed
it][quite a lot] it [wasn't]! This PR is a change to rustc to, again,
try to fix this issue.

The fundamental problem here remains the same, if a SIMD vector argument
is passed by-value in LLVM's function type, then if the caller and
callee disagree on target features a miscompile happens. We solve this
by never passing SIMD vectors by-value, but LLVM will still thwart us
with its argument promotion pass to promote by-ref SIMD arguments to
by-val SIMD arguments.

This commit is an attempt to thwart LLVM thwarting us. We, just before
codegen, will take yet another look at the LLVM module and demote any
by-value SIMD arguments we see. This is a very manual attempt by us to
ensure the codegen for a module keeps working, and it unfortunately is
likely producing suboptimal code, even in release mode. The saving grace
for this, in theory, is that if SIMD types are passed by-value across
a boundary in release mode it's pretty unlikely to be performance
sensitive (as it's already doing a load/store, and otherwise
perf-sensitive bits should be inlined).

The implementation here is basically a big wad of C++. It was largely
copied from LLVM's own argument promotion pass, only doing the reverse.
In local testing this...

Closes rust-lang#50154
Closes rust-lang#52636
Closes rust-lang#54583
Closes rust-lang#55059

[quite a lot]: rust-lang#47743
[discussion]: rust-lang#44367
[wasn't]: rust-lang#50154
bors added a commit that referenced this issue Oct 14, 2018
rustc: Fix (again) simd vectors by-val in ABI
alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 16, 2018
kennytm added a commit to kennytm/rust that referenced this issue Oct 18, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 19, 2018
Manishearth added a commit to Manishearth/rust that referenced this issue Oct 20, 2018