Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: Allow safe access of shareable static muts #14862

Closed
wants to merge 3 commits into from

Conversation

alexcrichton
Copy link
Member

Taking a shareable loan of a static mut is safe if the type contained in the
location is ascribes to Share. This commit removes the need to have an
unsafe block in these situations.

Unsafe blocks are still required to write to or take mutable loans out of
statics. An example of code that no longer requires unsafe code is:

use std::sync::atomics;

static mut CNT: atomics::AtomicUint = atomics::INIT_ATOMIC_UINT;

let cnt = CNT.fetch_add(1, atomics::SeqCst);

As a consequence of this change, this code is now permitted:

static mut FOO: uint = 30;

println!("{}", FOO);

The type uint is Share, and the static only has a shareable loan taken out
on it, so the access does not require an unsafe block. Writes to the static are
still unsafe, however, as they can invoke undefined behavior if not properly
synchronized.

Closes #13232

@huonw
Copy link
Member

huonw commented Jun 12, 2014

As a consequence of this change, this code is not permitted:

now permitted?

@alexcrichton
Copy link
Member Author

Indeed!

// run. It could also read the updated value of LOG_LEVEL in which case it
// reads the correct value.
//
// Also note that the log level stored is the real log level minus 1 so the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/minus/plus/? (If it's being stored as minus one, why is there an additional - 1 below?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, the comment should be plus, but the minus below is to return the real log level as opposed to the internally stored one. As in, if you specify RUST_LOG=3, the stored log level will be 4 but any callers of log_level() should see 3.

@alexcrichton
Copy link
Member Author

r? @pnkfelix

@huonw
Copy link
Member

huonw commented Jul 7, 2014

Ping, r? @pnkfelix

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

Is this actually a safe approach?

My impression was that static mut FOO: uint intentionally required unsafe {} for reading because reading a static mut is an inherently unsafe operation. No amount of synchronization on the write side will make reading safe, unless you use synchronization on the reading side as well.

I feel like a better approach would be to allow statics to have an Unsafe interior if the type is also Share. That way you can still get mutability (via Unsafe) but the entire static value is safe to use as a whole. The downside of this approach is that a static is no longer guaranteed to be unchanging, but at least it's safe.

Note that types like uint are Share because the assumption is made that any construct that provides a &-pointer to a shared uint will prevent any concurrent &mut pointers from existing (or being usable).

However, a static mut does not make that guarantee. In fact, the entire point of static mut is for unrestricted mutable access to global shared state. Because of this, you cannot rely on Share to make reads safe.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

I think it's safe to say that a static mut is roughly the equivalent of a *mut pointer. The static mut must be unsafe for both reads and writes for the same reason that dereferencing a *mut pointer must be unsafe for both reading and mutation (with the sole difference that accessing a static mut is guaranteed to not result in dereferencing unallocated memory, although you can trivially do so if the static mut contains any form of pointer/reference itself).

Meanwhile a static is roughly equivalent to a & reference. This also explains why a static cannot contain any Unsafe data that is not Share; because it is effectively a & reference, the value must be Share for the & reference to be valid.

And this also explains why Unsafe values that are also Share should be allowed, because the Share indicates that it's safe.

@huonw
Copy link
Member

huonw commented Jul 9, 2014

It's still not possible to have a program with UB without using unsafe somewhere.

It seems perfectly reasonable to regard accessing a Share static mut as perfectly safe, with the edge case of overwriting it as the problem. That is, consider the safe Rust language to include Share + static muts with & access (having only that functionality is 100% ok), and then unsafe enables some extra dangerous functionality. As an example of this behaviour already in Rust, & can be transmuted to &mut and then modified in unsafe code, which would possibly break a read of the &, but we still don't regard every access of a & as unsafe.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

@huonw

It's still not possible to have a program with UB without using unsafe somewhere.

Entirely irrelevant. The existence of unsafe does not whatsoever mean that safe code should be able to invoke UB. Safe code should only be able to invoke UB as a result of bugs in the unsafe code. But making reads for static mut not require unsafe makes it literally impossible to write bug-free unsafe code that writes to the value. There is no way whatsoever to contain the unsafety when writing to the static.

The entire point of unsafe {} blocks are to contain unsafety. If the unsafety leaks outside of the unsafe {} block, then it's a bug. But since it is impossible to contain the unsafety when writing to a static mut with this PR, that leads to the inevitable conclusion that any writes to a static mut (that is modified by this PR to consider reads to be safe) requires the entire program to be contained within an unsafe {} block. And not only is that completely preposterous, but it also means you may as well let reads remain unsafe anyway.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

@huonw

& can be transmuted to &mut and then modified in unsafe code

No it can't.

I mean, yes you can write that code, but that would be incorrect code.

The point of unsafe is not that you can write buggy code that makes safe operations invoke UB, it's that you can write bug-free code that preserves the safety of safe operations. But it is impossible to write bug-free code that writes to a static mut Foo: uint as long as reading from the static does not require unsafe {}.

@huonw
Copy link
Member

huonw commented Jul 9, 2014

I mean, yes you can write that code, but that would be incorrect code.

As is overwriting a static mut without synchronisation; I don't see how this differs.

But it is impossible to write bug-free code that writes to a static mut Foo: uint as long as reading from the static does not require unsafe {}.

No, it's clearly not. We can currently write bug-free code that writes to a static mut, and we will still be able to do it after this lands. Reading that is no longer unsafe does not make writing bug-free code impossible. You are being overly hyperbolic.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

@huonw It differs because you can write safe wrappers for unsafe operations. Again, that is the entire point of unsafe. If you can't write a safe wrapper at some level for some unsafe operation, then the operation should not be legal.

We can currently write bug-free code that writes to a static mut,

Correct, but it's only bug-free because reading is unsafe and therefore requires using a safe wrapper that knows how to coordinate with the writing side to manage any synchronization.

and we will still be able to do it after this lands.

Incorrect. You can still write a safe wrapper for reading that handles synchronization, yes, but reading without that wrapper is also considered safe. Except it can't be safe. The only way to make reading safe is to go through the wrapper. Again, the mere existence of code that writes to a static mut makes the act of reading from the static mut inherently unsafe. And that's precisely why reading from a static mut is considered an unsafe operation today. This is completely irrespective of whether the type of the static conforms to Share, because that's not what Share means.

Instead of accusing me of hyperbole, you should consider that perhaps you're actually wrong about this. I am not being hyperbolic in the slightest. I am 100% serious when I use the phrase "literally impossible".

@alexcrichton
Copy link
Member Author

Instead of accusing me of hyperbole, you should consider that perhaps you're actually wrong about this

Let's please calm down before commenting further. Heated arguments largely make little progress.

@alexcrichton
Copy link
Member Author

We can currently write bug-free code that writes to a static mut,

Correct, but it's only bug-free because reading is unsafe and therefore requires using a safe wrapper that knows how to coordinate with the writing side to manage any synchronization

Have you considered code such as this?

static mut FOO: uint = 3;
static mut LOCK: NativeMutex = NATIVE_MUTEX_INIT;

fn write() {
    unsafe {
        let _l = LOCK.lock();
        FOO = 4;
    }
}


fn read() -> uint {
    unsafe {
        let _l = LOCK.lock();
        FOO
    }
}

All reads/writes made to FOO in this scenario are safe, which seems to contradict your claim that this is "literally impossible".

This is not the only case where this is safe, there are many many other safe and defined situations. Concurrent reads and writes only invoke undefined behavior if there is at least one unsynchronized operation occurring concurrently with any write operation. Note that this does not include all possible behavior, as it seems you are claiming. One could have externally synchronized reads/writes which allow unsynchronized reads/writes to have defined behavior.

Also note that the precise definition of Share is:

The precise definition is: a type T is Share if &T is thread-safe. In other words, there is no possibility of data races when passing &T references between tasks.

As implemented, this is precisely allowing exactly that. It is allowed to take a shared reference, &T, to any static mut which has a type T which ascribes to Share.

Other operations, such as writing to a global, may invoke undefined behavior, so they are unsafe.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

@alexcrichton

All existing read/writes in your code are safe, yes. But with your PR, I can trivially write

fn foo() -> uint {
    FOO
}

and this is unsafe. My ability to write unsafe code consisting of nothing but safe operations means that you have violated the contract of the unsafe {} block in your write() function. Because I can create a data race using nothing but safe operations, your write() function is buggy and itself needs to be marked as unsafe. And so does any code that uses it, ad infinitum. Because it is impossible to safely contain the unsafety in your write() function, the end result is that the entire program must be unsafe {}.

Basically, you are making the same mistake that @huonw is, which is confusing the code you have written with the code that is legal to write. It doesn't matter whether or not you actually write an unsynchronized read, the fact that you can is what makes the write() function buggy.

Regarding Share, you are grossly misapplying its definition. Yes, if a type T is Share, then &T is thread-safe. The problem is you are interpreting that to mean that constructing a &T is a safe operation, and that is not true. In fact, the entire reason that Share works is precisely because it is illegal to construct a &T that aliases with a mutable value (either &mut T or a mutable slot). This PR explicitly makes it legal to construct a &T that aliases with a subset of mutable slots (namely, those defined by static mut), but that's not a safe thing to do.

@huonw
Copy link
Member

huonw commented Jul 9, 2014

You could adjust Vec to include pub fn increase_len(&mut self) { self.len += 1 } , it's not safe, but there's no unsafe.

This seems very similar: if you are wrapping unsafe in a safe interface, you need to consider the whole interface, even parts that don't have unsafe {} blocks.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

@huonw As I said before to @alexcrichton, you are confusing the code you have written with the code that is legal to write. Your function there is buggy, and will cause other unsafe {} code to likely crash. But it's the unsafe code that exhibits the bad behavior, not the safe code.

This PR is introducing the opposite problem, where it's the safe code that behaves wrong. You may feel like arguing that you can trivially write unsafe {} code that causes safe code to behave wrong, but you would have to admit that your unsafe {} code is therefore buggy and needs to be fixed.

The problem with this PR is that you cannot fix the unsafe {} code. There is no way to write your unsafe {} mutation that causes a read operation to be safe. And that inability to make the read operation safe is why reads on static mut values must be considered unsafe. Because it is unsafe. You're just trying to get rid of the unsafe keyword without actually getting rid of the unsafe behavior.

@lilyball
Copy link
Contributor

lilyball commented Jul 9, 2014

It occurs to me there's another alternative to making static able to contain Unsafe + Share values, and that's to allow for declaring methods as fn foo(*mut self) (along with the corresponding autoref from a mut T to a *mut T). If you can declare a method like that, you can then allow for calling methods on static mut values without unsafe {}, which I gather is the motivating reason for this PR. The method itself, because it takes a *mut, requires unsafe {} in order to dereference the pointer and access the mutable static value.

The downside to this approach is that it's still not particularly safe, because you can create a *mut T pointer to some invalid value (e.g. NULL) and invoke the method on it without unsafe {}. This suggests that instead of *mut T we should have a new reference type &unsafe T, which is a mutable pointer that requires unsafe to dereference (or to borrow as &T). This would basically be a more constrained form of *mut T, one that is non-nullable and can only be constructed from a valid value of type T (whereas you can construct a *mut T from pretty much anything).

Of course, introducing yet another reference type kind of sucks. So I think allowing static to containUnsafe + Share is still the best solution.

@pnkfelix
Copy link
Member

(wow there has been a lot of heat on this thread since the last review ping...)

@emberian
Copy link
Member

I'm fairly certain @kballard is right here. If you write FOO we have no mechanism of calling user-defined code before the read, meaning another thread could have written one field but not the other before being descheduled, leaving an inconsistent state, with the writer having no way at all to prevent this.

@emberian
Copy link
Member

I suppose it is possible to work around that by having your static mut be private and only allowing access through functions which guarantee the necessary invariants.

@emberian
Copy link
Member

(Also I would discourage any attempts at allowing static data to be mutable, as that removes our one fool-proof no-gotcha way of having rodata)

@nikomatsakis
Copy link
Contributor

This is an interesting debate. I confess I am torn.

On the one hand, I agree that the safety requirement is subtle, and hence if you can write to a static mut (unsafely), it is best to say that all reads are also unsafe, since both of them must be synchronized together for the code to be data-race free.

On the other hand, it would be very nice to have global locks and atomic integers that one can expose safely. (Of course it is possible to create fn accessors instead.)

I wonder if we can skin this cat another way. Currently, we have rules that say: "static variables cannot have Unsafe<T> embedded within them". The reason for this is that we want to put them in immutable memory, and interior mutability messes with that. Perhaps we can instead say "static variables with Unsafe<T> embedded within them must be Share and will be not be placed into immutable memory". I feel like we danced around this idea the last time this came up, and I can't recall what it's downsides were, besides being kind of complicated (and interacting with significant addresses to boot).

@lilyball
Copy link
Contributor

I'm a fan of "static Unsafe vars must be Share and will be placed into mutable memory". I too feel that it's a bit strange, that a static could actually be mutable, but that seems to me to be a consequence of the existence of Unsafe. What we have today, where you can't put Unsafe into a static, has shown to be an undesirable limitation (e.g. because actually-safe constructs end up requiring static mut and unsafe {})

@alexcrichton
Copy link
Member Author

Perhaps we can instead say "static variables with Unsafe embedded within them must be Share and will be not be placed into immutable memory".

I remember this coming up at the last work week, but it has the hazard of making code a little odd:

use std::sync::atomics;

let i = atomics::INIT_ATOMIC_UINT.fetch_add(1, atomics::SeqCst);
let j = atomics::INIT_ATOMIC_UINT; // assuming #13233
println!("{}", j.load(atomics::SeqCst)); // prints 1, not 0

@nikomatsakis
Copy link
Contributor

@alexcrichton ah yes, that was the thing we didn't like.

@nikomatsakis
Copy link
Contributor

I pondered this some more. It is certainly true that one must use privacy when writing to static muts, or else you cannot create a safe interface (since you must control reads and writes). However, this is almost always true when writing unsafe code. In fact, that's kind of the whole point of the unsafe keyword-- is to indicate code that is not safe on its own because it may interact with other code, so it is your responsibility to know what interacts exist and control them. It seems clearly worse to me to be able to modify the INIT_ATOMIC_UINT. Naturally we could have three kinds of statics but this seems like overkill to me. In short, I think our prior decision was the right one.

@nikomatsakis
Copy link
Contributor

Also, the supposed problem concerning static muts already exists for the Unsafe type: it has a public field, but creates unsafe *mut pointers to that field. The compiler permits safe access to that field even though unsafe aliases may exist. Failure to control access to a Unsafe<T> yields undefined behavior in the way as static mut. The reason for this, iirc, has to do with static initializers and privacy.

If we wanted to make static mut more secure, one option would be to simply forbid writes and mutable borrows of static mut values. In that case, if you wanted "free" writes (rather than using Atomic), you would simply make the type of your static mut be Unsafe<Foo> rather than Foo. You can then use the methods on Unsafe<Foo> to get a *mut Foo pointer. Now reads/writes are both unsafe (modulo the problem in first paragraph).

@lilyball
Copy link
Contributor

@nikomatsakis We could introduce static const as a way of bringing back the restriction on Unsafe that we have today with static, e.g. that you can't take a borrow on the value. This way INIT_ATOMIC_UINT can be static const, and therefore you have to initialize your own atomic value from it before you can do anything.

@nikomatsakis
Copy link
Contributor

@kballard yes, I consider three types of static to be overkill and inconsistent.

@lilyball
Copy link
Contributor

It is certainly true that one must use privacy when writing to static muts, or else you cannot create a safe interface

That's true. The problem is we have safe interfaces for values that want to live in statics (such as sync::Once). These safe interfaces should be usable without the unsafe keyword. In fact, that's the whole point of Unsafe, to allow you to wrap this kind of unsafe mutation in a safe interface for other people to use.

I do agree that 3 kinds of statics is not ideal, but I'm not sure it's worse than what we have today, with the requirement for unsafe to wrap perfectly safe operations.

@pnkfelix
Copy link
Member

Would a lint that flags pub static mut resolve the problem here?

In other words, would that satisfy the requirement that there be a fixed set of code (that must then be reasoned about in isolation), rather than allowing arbitrary new accessors from other mod's to be mixed in?

@pnkfelix
Copy link
Member

Or an alternative take on my previous suggestion: Make reads of pub static mut require unsafe, while reads of non-pub static mut (whose type ascribes to Share) do not require unsafe.

@lilyball
Copy link
Contributor

You can't get rid of the unsafe {}. Rust has never adopted a policy of unsafe {} only being required for public things. And it doesn't make any sense to start; with a static mut that doesn't require unsafe {} you can violate memory safety with supposedly-safe code.

@pnkfelix
Copy link
Member

I suppose my suggestion of flagging pub static mut fails to satisfy @nikomatsakis 's desire for "global locks and atomic integers that one can expose safely".

I worry that there is some fundamental philosophical divide here that strikes me as analogous to the debate that raged on rust-lang/rfcs#80 , and for some reason that divide is leading to very strongly worded comments from both sides on this ticket.

@nikomatsakis
Copy link
Contributor

@pnkfelix I agree that this is analagous -- which seems to offer precedent for the approach we planned in the RFC. (At some later point, perhaps, we could add a notion of an unsafe static (just as one might add an unsafe field).)

@zwarich
Copy link

zwarich commented Jul 22, 2014

@kballard Your objection about safe code being able to violate the invariants of unsafe code is already true today. For example, consider an example like this:

mod M {
    use std::ty::Unsafe;

    pub struct A<T> {
        a: Unsafe<T>,
        // Other fields.
    }

    impl<T> A<T> {
        // Functions that provide a safe interface.
    }
}

I can add a new safe function to the module and most likely break invariants preserved by the existing code:

mod M {
    use std::ty::Unsafe;

    struct A<T> {
        a: Unsafe<T>,
        // Other fields.
    }

    impl<T> A<T> {
        // Functions that provide a safe interface.
        fn backdoor<'a>(&'a self) -> &'a T {
            &self.a.value
        }
    }
}

I think it would be interesting to pursue a model where adding safe code doesn't break invariants provided by unsafe code, but it's not what we have today.

@alexcrichton
Copy link
Member Author

I have created an RFC for this PR: rust-lang/rfcs#177

@lilyball
Copy link
Contributor

@zwarich Your code snippet has actually convinced me that the fact that the value field of Unsafe<T> is public is a safety hole. Accessing that value should require unsafe {}. This may actually be the compelling reason to add unsafe fields (as proposed in RFC PR #80). Alternatively the value field could be made private, but I assume it is public for a reason (and I assume that reason is so static values can be constructed).

Taking a shareable loan of a `static mut` is safe if the type contained in the
location is ascribes to `Share`. This commit removes the need to have an
`unsafe` block in these situations.

Unsafe blocks are still required to write to or take mutable loans out of
statics. An example of code that no longer requires unsafe code is:

    use std::sync::atomics;

    static mut CNT: atomics::AtomicUint = atomics::INIT_ATOMIC_UINT;

    let cnt = CNT.fetch_add(1, atomics::SeqCst);

As a consequence of this change, this code is not permitted:

    static mut FOO: uint = 30;

    println!("{}", FOO);

The type `uint` is `Share`, and the static only has a shareable loan taken out
on it, so the access does not require an unsafe block. Writes to the static are
still unsafe, however, as they can invoke undefined behavior if not properly
synchronized.

Closes rust-lang#13232
As I have recently learned from aturon, it is undefined behavior to have any
unsynchronized access exected in parallel with an unsynchronized write. This
sort of behavior was possibly invoked by liblog during initialization. The log
level constant has been altered to become an atomic variable which prevents
undefined behavior.
@alexcrichton
Copy link
Member Author

Closing pending the resolution of rust-lang/rfcs#177

@alexcrichton alexcrichton deleted the safe-static-mut branch October 11, 2014 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Addressing a Share static mut should be safe
7 participants