Tracking where we rely on LLVM giving more guarantees than C #292

Open
RalfJung opened this issue Jun 27, 2021 · 7 comments

@RalfJung
Member

RalfJung commented Jun 27, 2021

There are some places where we rely on LLVM having less UB than C does. It seems good to have a list of those, and to keep a careful eye on what LLVM does in that space, since they might not consider Rust when they adjust their rules here.

  • We rely on LLVM not doing TBAA. This is easy since clang turns type-based alias information into explicit annotations that we just do not emit, so I am not at all worried about this one.
  • We rely on inttoptr being always safe. In C, this is UB for out-of-bounds pointers (except if they are one-past-the-end).
  • We rely on == and <= (and the other comparison operators) on pointers to be always safe, and we'd probably be in trouble even if those were just giving non-deterministic results. Essentially we need them to just compare the underlying integer address and ignore provenance.
  • We rely on LLVM's memcpy/memmove/memset to allow size 0 for all pointers. (This is explicitly allowed by the LangRef: "If <len> is 0, it is no-op modulo the behavior of attributes attached to the arguments.")
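To make the first and last bullets concrete, here is a minimal Rust sketch. The function names `store_int_then_float` and `zero_len_ops` are made up for illustration, and the code assumes `u32` and `f32` have the same size and alignment (checked at compile time below). The analogous C code using `int*` and `float*` would violate strict aliasing; in Rust the two raw pointers are allowed to alias, which only works because no TBAA annotations are emitted.

```rust
use std::ptr::{self, NonNull};

// Assumption for this sketch: u32 and f32 agree in size and alignment.
const _: () = assert!(std::mem::size_of::<u32>() == std::mem::size_of::<f32>());
const _: () = assert!(std::mem::align_of::<u32>() == std::mem::align_of::<f32>());

/// Stores through a `*mut u32` and then through an aliasing `*mut f32`.
/// A TBAA-based optimizer could assume the float store cannot affect `*i`
/// and return 1; without TBAA the final load must observe the float store.
fn store_int_then_float(x: &mut u32) -> u32 {
    let i: *mut u32 = x;
    let f: *mut f32 = i.cast();
    unsafe {
        *i = 1;
        *f = 0.0; // 0.0f32 is the all-zero bit pattern
        *i
    }
}

/// Zero-length copy and fill on a dangling (but non-null, aligned) pointer.
/// `ptr::copy` has memmove semantics and `ptr::write_bytes` has memset semantics.
fn zero_len_ops() {
    let dangling: *mut u8 = NonNull::<u8>::dangling().as_ptr();
    unsafe {
        ptr::copy(dangling as *const u8, dangling, 0);
        ptr::write_bytes(dangling, 0xAA, 0);
    }
}

fn main() {
    let mut x = 7u32;
    assert_eq!(store_int_then_float(&mut x), 0);
    zero_len_ops();
}
```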

If you know of anything else, please let me know so I can add it to the list. :)

@mcy

mcy commented Jun 28, 2021

I'll definitely post here if anything pops into my head, but I think function pointers, especially function pointer equality, are generally a little sketchy even in the absence of things like -Wl,-icf=all.

I know there's a separate issue about uninit-ness of padding bits (e.g. when is the padding in (u8, u16) uninit), which will probably have LLVM-level considerations (does storing an aggregate to an otherwise initialized alloca make the padding bits undef? I think the answer is no, but it's something to keep an eye out for).
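For illustration, a small Rust sketch of where the padding in (u8, u16) comes from. The helper name `padding_bytes` is made up, and the expected value assumes `u16` has alignment 2, as on mainstream targets. It only shows that a padding byte exists; it says nothing about whether that byte is uninit after a store.

```rust
use std::mem::size_of;

/// Number of padding bytes in `(u8, u16)`: total size minus the field sizes.
fn padding_bytes() -> usize {
    size_of::<(u8, u16)>() - size_of::<u8>() - size_of::<u16>()
}

fn main() {
    // With `u16` aligned to 2, the tuple occupies 4 bytes: 3 bytes of data, 1 of padding.
    println!(
        "size = {}, padding = {}",
        size_of::<(u8, u16)>(),
        padding_bytes()
    );
}
```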

Also, when you say TBAA, you probably also mean at LTO time, right? It may be worth noting that cross-language LTO may need to be done without strict aliasing. I actually have no idea if TBAA survives into embedded bitcode and if LLVM is entitled to use it across modules.

@thomcc
Member

thomcc commented Jun 29, 2021

These are probably not only relevant to LLVM, but to the new GCC backend (and gcc-rs too).

(Of course, they also apply to cranelift, but it doesn't do much optimizing, so it seems unlikely to cause many problems.)

Still, it's probably worth ensuring we have tests as some sort of "canary" that would help indicate if these rules are violated... It's somewhat likely we have them already, admittedly.

@RalfJung
Member Author

> I'll definitely post here if anything pops into my head, but I think function pointers, especially function pointer equality, are generally a little sketchy even in the absence of things like -Wl,-icf=all.

Fn ptrs are mostly sketchy because Rust tells LLVM their address does not matter.
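A hedged sketch of what this means in practice (the names `first`, `second`, and `addr` are made up for illustration). The two functions have identical bodies, so a toolchain that treats function addresses as insignificant may merge them; whether their addresses compare equal should not be relied on either way.

```rust
fn first() -> u32 {
    1
}

fn second() -> u32 {
    1
}

/// Address of a function pointer as an integer.
fn addr(f: fn() -> u32) -> usize {
    f as *const () as usize
}

fn main() {
    // May print `true` or `false` depending on compiler, optimization level,
    // and linker settings; code should not depend on the answer.
    println!("same address: {}", addr(first) == addr(second));
}
```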

> I know there's a separate issue about uninit-ness of padding bits (e.g. when is the padding in (u8, u16) uninit), which will probably have LLVM-level considerations (does storing an aggregate to an otherwise initialized alloca make the padding bits undef? I think the answer is no, but it's something to keep an eye out for).

Here the main problem is that it is rather unclear what the exact rules in C even are (that's the entire indeterminate / unspecified value / trap representation debacle). So I have no idea if we rely on more than what C guarantees, since C doesn't really say in clear terms what it guarantees.^^

> Also, when you say TBAA, you probably also mean at LTO time, right? It may be worth noting that cross-language LTO may need to be done without strict aliasing. I actually have no idea if TBAA survives into embedded bitcode and if LLVM is entitled to use it across modules.

Yes, I also mean at LTO time. I would assume that LLVM correctly handles situations where some modules have TBAA info and others do not.

Of course, when Rust code calls C code, the C UB rules still apply: a C function taking int* and float* (that actually loads both pointers at their expected type) is UB to call with aliasing pointers from Rust. But that is a different discussion.

> These are probably not only relevant to LLVM, but to the new GCC backend (and gcc-rs too).

Ah, you mean because GCC treats TBAA more implicitly? Yes, that could be a problem for the GCC backend. I hope the people building it are aware. :)

@thomcc
Member

thomcc commented Jun 29, 2021

> Ah, you mean because GCC treats TBAA more implicitly?

I'm not certain about this. I've heard that it does, but I don't know for sure either way.

I think @antoyo is the person building the GCC backend, and so might plausibly know if this is the case (and would be good to loop in here either way).

@antoyo

antoyo commented Sep 25, 2022

I'm not exactly sure what the question was here, but GCC enables strict-aliasing by default.

@RalfJung
Member Author

That means the GCC Rust backends need to find a way to disable strict-aliasing, since otherwise they are unsound.

(And same for the other cases where Rust has less UB than C does, such as wrapping_offset pointer arithmetic, int2ptr casts, pointer comparison. Also there are some subtle questions around zero-sized accesses which do not exist in C.)
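The three operations named here can be sketched in Rust as follows. The helper names `round_trip`, `from_addr`, and `compare` are made up for illustration, and the address 0x1000 is arbitrary. Each of these is allowed in Rust, while the closest C equivalents (out-of-bounds pointer arithmetic, relational comparison across objects) are UB or unspecified.

```rust
use std::cmp::Ordering;

/// Leaves the allocation with wrapping arithmetic and comes back.
/// The intermediate pointer is out of bounds, which is fine as long as
/// it is not dereferenced there.
fn round_trip(p: *const u8, n: usize) -> *const u8 {
    p.wrapping_add(n).wrapping_sub(n)
}

/// Builds a pointer from an integer address. Creating it is allowed;
/// dereferencing it would require it to point to live memory.
fn from_addr(addr: usize) -> *const u8 {
    addr as *const u8
}

/// Compares two pointers that may belong to different allocations.
fn compare(a: *const u8, b: *const u8) -> Ordering {
    a.cmp(&b)
}

fn main() {
    let x = Box::new(1u8);
    let y = Box::new(2u8);
    let (px, py) = (&*x as *const u8, &*y as *const u8);

    // Back inside the original allocation, so the read is valid.
    let back = round_trip(px, 1 << 20);
    assert_eq!(unsafe { *back }, 1);

    // Never dereferenced.
    let p = from_addr(0x1000);
    assert_eq!(p as usize, 0x1000);

    // The comparison is expected to agree with comparing the integer addresses.
    assert_eq!(compare(px, py), (px as usize).cmp(&(py as usize)));
}
```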

@antoyo

antoyo commented Jan 23, 2023

I'm not exactly sure about these because I'm not familiar enough with LLVM, but maybe these two are things where LLVM gives more guarantees than C:

  • Signed integer overflow.
  • Integer promotion.
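For reference, this is how the two items look from the Rust side; a small sketch with made-up helper names (`wrapping_inc`, `mul_u16`, `add_u8`). In C, signed overflow is UB, and operands narrower than `int` are promoted to `int` before arithmetic, so multiplying two `unsigned short` values can overflow a signed `int`. Rust's wrapping operations are defined at the operand's own width, which assumes a backend that offers wrapping arithmetic without implicit promotion.

```rust
/// Signed increment that wraps around instead of being UB on overflow.
fn wrapping_inc(x: i32) -> i32 {
    x.wrapping_add(1)
}

/// 16-bit multiplication performed at 16 bits, with no promotion to a wider signed type.
fn mul_u16(a: u16, b: u16) -> u16 {
    a.wrapping_mul(b)
}

/// 8-bit addition that wraps modulo 256.
fn add_u8(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

fn main() {
    assert_eq!(wrapping_inc(i32::MAX), i32::MIN);
    // 0xFFFF * 0xFFFF = 0xFFFE_0001, which truncates to 0x0001 in 16 bits.
    assert_eq!(mul_u16(0xFFFF, 0xFFFF), 1);
    // 200 + 100 = 300, and 300 - 256 = 44.
    assert_eq!(add_u8(200, 100), 44);
}
```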
