
Reusing allocation addresses should imply a happens-before relationship #3450

Closed
joboet opened this issue Apr 4, 2024 · 21 comments · Fixed by #3475
Labels: A-concurrency (Area: affects our concurrency (multi-thread) support), I-false-UB (Impact: makes Miri falsely report UB, i.e., a false positive (with default settings))

Comments

joboet (Contributor) commented Apr 4, 2024

The following code fails under miri:

#![feature(reentrant_lock)]

use std::thread;
use std::sync::ReentrantLock;

fn main() {
    let lock = ReentrantLock::new(());
    let acquire = || drop(lock.lock());

    thread::scope(|s| {
        s.spawn(acquire);
        s.spawn(acquire);
    });
}

Commit: e38a828bb8c1faa95be23d1fa902913d218de848
Command: ./miri run -Zmiri-seed=706 -Zmiri-track-weak-memory-loads -Zmiri-preemption-rate=0 race.rs
Result:

note: tracking was triggered
   --> /Users/Jonas/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/reentrant_lock.rs:187:16
    |
187 |             if self.owner.load(Relaxed) == this_thread {
    |                ^^^^^^^^^^^^^^^^^^^^^^^^ weak memory emulation: outdated value returned from load
    |
    = note: BACKTRACE on thread `unnamed-2`:
    = note: inside `std::sync::ReentrantLock::<()>::lock` at /Users/Jonas/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/reentrant_lock.rs:187:16: 187:40
note: inside closure
   --> race.rs:8:27
    |
8   |     let acquire = || drop(lock.lock());
    |                           ^^^^^^^^^^^

error: Undefined Behavior: Data race detected between (1) non-atomic write on thread `unnamed-1` and (2) non-atomic read on thread `unnamed-2` at alloc1432+0x10. (2) just happened here
   --> /Users/Jonas/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/reentrant_lock.rs:244:34
    |
244 |         *self.lock_count.get() = (*self.lock_count.get()).checked_add(1)?;
    |                                  ^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) non-atomic write on thread `unnamed-1` and (2) non-atomic read on thread `unnamed-2` at alloc1432+0x10. (2) just happened here
    |
help: and (1) occurred earlier here
   --> race.rs:8:22
    |
8   |     let acquire = || drop(lock.lock());
    |                      ^^^^^^^^^^^^^^^^^
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
    = note: BACKTRACE (of the first span) on thread `unnamed-2`:
    = note: inside `std::sync::ReentrantLock::<()>::increment_lock_count` at /Users/Jonas/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/reentrant_lock.rs:244:34: 244:58
    = note: inside `std::sync::ReentrantLock::<()>::lock` at /Users/Jonas/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/reentrant_lock.rs:188:17: 188:44
note: inside closure
   --> race.rs:8:27
    |
8   |     let acquire = || drop(lock.lock());
    |                           ^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

Error: command exited with non-zero code `cargo +miri --quiet run --manifest-path /Users/Jonas/src/miri/Cargo.toml -- -Zmiri-seed=706 -Zmiri-track-weak-memory-loads -Zmiri-preemption-rate=0 race.rs --edition=2021`: 1

ReentrantLock uses the address of a TLS variable to check whether the current thread already owns the lock. As -Zmiri-track-weak-memory-loads shows, the second thread reads an outdated owner value, and because with this specific seed its TLS allocation happens to land at the same address, it concludes it already owns the lock, which triggers the UB report inside Miri.

However, I do not believe this to be an issue with ReentrantLock. In any allocator, reallocating an address requires first observing that the address has been deallocated, so there must be a happens-before relationship between the deallocation and the subsequent allocation. Miri currently does not establish such a relationship, which leads to the issue above. With that happens-before edge, the outdated read cannot occur, because after allocating its TLS variable the second thread must observe that the owner has been reset and the lock released.
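
For illustration, here is a heavily simplified sketch of the pattern in question (the names OWNER, current_thread_addr and holds_lock_already are invented for this sketch; it is not the actual std implementation): the lock remembers its owner by storing the address of a thread-local, and the ownership check compares that stored value against the current thread's own address.

use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

// The address of a thread-local is unique among live threads, so it can serve
// as a cheap per-thread identifier; the lock stores it to remember its owner.
static OWNER: AtomicUsize = AtomicUsize::new(0);

fn current_thread_addr() -> usize {
    thread_local!(static KEY: Cell<u8> = Cell::new(0));
    KEY.with(|key| key as *const Cell<u8> as usize)
}

fn holds_lock_already() -> bool {
    // If a thread exits and its TLS slot's address is later reused by a new
    // thread, a stale (weak-memory) read of OWNER can match here spuriously.
    // If address reuse implies a happens-before edge, that stale value cannot
    // be returned: this load must observe that the previous owner reset OWNER
    // before its TLS slot was freed.
    OWNER.load(Relaxed) == current_thread_addr()
}

fn main() {
    OWNER.store(current_thread_addr(), Relaxed);
    assert!(holds_lock_already());
}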

RalfJung (Member) commented Apr 4, 2024

Hm, I've never heard that the allocator would be required to establish happens-before relationships when doing address reuse.

The compiler already can't move things that access this memory down below the free or up across the malloc, so it may be sufficient for the allocator to use Relaxed access to check that the address has been freed, and not actually incur happens-before? Not sure.

Would it help to make these owner accesses in the ReentrantLock check be release/acquire?

@Amanieu @m-ou-se I'd be curious what your take on this is.

joboet (Contributor, Author) commented Apr 4, 2024

Would it help to make these owner accesses in the ReentrantLock check be release/acquire?

No, because those could still load an old value.

The compiler already can't move things that access this memory down below the free or up across the malloc, so it may be sufficient for the allocator to use Relaxed access to check that the address has been freed, and not actually incur happens-before? Not sure.

If there is no happens-before, then writes from before the free could be ordered after the malloc and cause data races.
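
For illustration, a deliberately under-synchronized toy (the names RECYCLED, writer and reuser are invented; this is not any real allocator code path): if the "free" publishes a block with only Relaxed ordering and the "malloc" on another thread takes it with Relaxed, the plain write before the free and the plain read after the malloc are unordered, i.e. a data race in the abstract memory model, and something Miri's data race detector would report.

use std::sync::atomic::{AtomicPtr, Ordering::Relaxed};
use std::{ptr, thread};

// One-slot "recycler" standing in for an allocator's reuse path.
static RECYCLED: AtomicPtr<u8> = AtomicPtr::new(ptr::null_mut());

fn main() {
    let writer = thread::spawn(|| {
        let p = Box::into_raw(Box::new(0u8));
        unsafe { p.write(42) };     // plain write "before the free"
        RECYCLED.store(p, Relaxed); // the "free": Relaxed only, no Release
    });
    let reuser = thread::spawn(|| loop {
        // The "malloc": take over the block, again with Relaxed only.
        let p = RECYCLED.swap(ptr::null_mut(), Relaxed);
        if !p.is_null() {
            // No happens-before edge links the write of 42 to this read, so
            // the two accesses race in the abstract memory model.
            let _value = unsafe { p.read() };
            unsafe { drop(Box::from_raw(p)) };
            break;
        }
    });
    writer.join().unwrap();
    reuser.join().unwrap();
}

Replacing the Relaxed store/swap pair with Release/Acquire is exactly the happens-before edge being asked for here.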

RalfJung (Member) commented Apr 4, 2024

As -Zmiri-track-weak-memory-loads shows, the second thread reads an outdated owner value, and because with this specific seed its TLS allocation happens to land at the same address, it concludes it already owns the lock, which triggers the UB report inside Miri.

I am not sure I fully understand what happens. So we're reading an outdated owner which happens to have the right value but is actually a different thread? And the proper, latest value of owner would be 0? So if the owner == this_thread arm had a debug_assert to ensure that the lock count is currently non-zero, that would catch this particular execution (though likely it would not fix the issue)?

joboet (Contributor, Author) commented Apr 4, 2024

I am not sure I fully understand what happens. So we're reading an outdated owner which happens to have the right value but is actually a different thread?

Yes, exactly.

So if the owner == this_thread arm had a debug_assert to ensure that the lock count is currently non-zero, that would catch this particular execution (though likely it would not fix the issue)?

No, because that read would still race with the write of the count in the other thread (or at least miri would think so).

RalfJung (Member) commented Apr 4, 2024

Is it possible to reproduce this without involving ReentrantLock? I thought something like this would do it but there doesn't seem to be a race here.

use std::thread;

// Just a type with a rare size to make it more likely that we will
// reuse the right allocation.
type T = [u8; 7];

fn thread() {
    for _ in 0..100 {
        let b = Box::new(T::default());
        drop(b);
        thread::yield_now();
    }
}

fn main() {
    let j1 = thread::spawn(thread);
    let j2 = thread::spawn(thread);
    j1.join().unwrap();
    j2.join().unwrap();
}

RalfJung (Member) commented Apr 4, 2024

Ah I guess it's not enough to just reuse the memory, as the data race detector will consider the old and new allocation to have nothing to do with each other. It's about using (abusing?) the address as a signal for synchronization on another shared location.

I am honestly not convinced that your reasoning is sound here; it sounds extremely sketchy to me.

RalfJung (Member) commented Apr 4, 2024

Cc @rust-lang/opsem any opinions on this?

I feel like if we're changing Miri to accommodate this, we at least need to also document this as a requirement for our allocator traits. I don't know enough about allocators to judge whether there's a risk of a real-world allocator violating this property.

RalfJung (Member) commented Apr 4, 2024

In this case it's a thread-local, which wouldn't even go through a regular allocator, right? I have no idea how the memory backing thread-locals is managed; some sort of page-table magic as part of thread creation/destruction or so? Where would the happens-before come from there?

bjorn3 (Member) commented Apr 4, 2024

musl libc either puts TLS on the (mmapped) stack or uses malloc to allocate it.

bjorn3 (Member) commented Apr 4, 2024

The kernel would have a happens-before between the munmap and the mmap, right? It will only reuse an mmap region after it is certain that all threads have flushed the TLB entries for the old mapping (which requires synchronization). The same goes for a memory allocator, which won't reuse a memory location until it has synchronized with the free call.

joboet (Contributor, Author) commented Apr 4, 2024

Yes, this invariant holds for all correct memory allocators. In terms of hardware: if the free call and its internal writes do not act as a barrier, they may be reordered before the thread's earlier reads of the allocation, and those reads could then observe data written after the malloc by the new owner.

tmiasko (Contributor) commented Apr 4, 2024

Separately, doesn't address reuse mean that ReentrantLock ownership is transferred to a different thread when a thread exits without releasing the lock? That doesn't seem right either.

joboet (Contributor, Author) commented Apr 4, 2024

It's the same thing: as long as there is a happens-before relationship between the deallocation and allocation, that is totally sound.

tmiasko (Contributor) commented Apr 4, 2024

Sure, it might be sound, but is it safe? It seems wrong that you can have multiple threads locking the same lock without unlocking it first (EDIT: i.e., it is inconsistent with the guarantees documented in ReentrantLock::lock).

Amanieu (Member) commented Apr 4, 2024

  • If free makes the freed memory available to other threads, it must use a Release ordering. Otherwise the CPU could reorder writes to the freed memory after it has been made available to other threads.
  • If alloc uses memory that comes from other threads, it must use an Acquire ordering, for the same reason.
  • (Acquire/release is not needed for a thread-local allocator, but in that case it cannot lead to a memory address being re-used across threads)
  • ReentrantLock checking the address of a TLS variable is sufficient for safety: even if the address is reused from another thread, this would mean that this other thread has already exited. The Sync invariants are maintained since there is no concurrent shared access from multiple threads.
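
A minimal sketch of the first two points (SLOT, recycle and obtain are invented names, and a single-slot recycler stands in for a real allocator): free publishes the block with Release and alloc takes it with Acquire, so everything done before the free happens-before everything the new owner does after the alloc.

use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering::{Acquire, Release}};

static SLOT: AtomicPtr<u8> = AtomicPtr::new(ptr::null_mut());

// "free": make the block available to any thread. The Release store ensures
// that all prior writes to the block happen-before a later reuse.
fn recycle(p: *mut u8) {
    SLOT.store(p, Release);
}

// "alloc": reuse a block freed by any thread (the Acquire pairs with the
// Release above), or fall back to a fresh allocation.
fn obtain() -> *mut u8 {
    let p = SLOT.swap(ptr::null_mut(), Acquire);
    if p.is_null() { Box::into_raw(Box::new(0u8)) } else { p }
}

fn main() {
    // Single-threaded smoke test; the interesting case is recycle() and
    // obtain() running on different threads, where the Release/Acquire pair
    // provides exactly the happens-before edge discussed above.
    let p = obtain();
    unsafe { p.write(1) }; // ordinary write by the current "owner"
    recycle(p);
    let q = obtain();
    unsafe { assert_eq!(q.read(), 1) };
    unsafe { drop(Box::from_raw(q)) };
}

A purely thread-local cache could skip the Release/Acquire pair, which matches the third point.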

RalfJung (Member) commented Apr 4, 2024

@joboet is there a way to extract this into a self-contained testcase that does not rely on ReentrantLock?

joboet (Contributor, Author) commented Apr 4, 2024

Yes! This one works for any seed and will only succeed if the happens-before relationship is formed:

#![feature(strict_provenance)]
#![feature(sync_unsafe_cell)]

use std::cell::SyncUnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

static ADDR: AtomicUsize = AtomicUsize::new(0);
static VAL: SyncUnsafeCell<i32> = SyncUnsafeCell::new(0);

fn addr() -> usize {
    let alloc = Box::new(42);
    <*const i32>::addr(&*alloc)
}

fn thread1() {
    unsafe {
        VAL.get().write(42);
    }
    let alloc = addr();
    ADDR.store(alloc, Relaxed);
}

fn thread2() -> bool {
    // We try to get an allocation at the same address. If we fail too often,
    // the caller will retry with a fresh allocation, so that the target
    // address is still in Miri's reuse cache.
    for _ in 0..16 {
        let alloc = addr();
        let addr = ADDR.load(Relaxed);
        if alloc == addr {
            // If the new allocation is at the same address as the old one,
            // there must be a happens-before relationship between them.
            // Therefore, we can read VAL without racing and must observe
            // the write above.
            let val = unsafe { VAL.get().read() };
            assert_eq!(val, 42);
            return true;
        }
    }

    false
}

fn main() {
    let mut success = false;
    while !success {
        let t1 = thread::spawn(thread1);
        let t2 = thread::spawn(thread2);
        t1.join().unwrap();
        success = t2.join().unwrap();
        
        // Reset everything.
        ADDR.store(0, Relaxed);
        unsafe { VAL.get().write(0); }
    }
}

CAD97 commented Apr 4, 2024

Since C11, per cppreference:

A previous call to free or realloc that deallocates a region of memory synchronizes-with a call to malloc that allocates the same or a part of the same region of memory. This synchronization occurs after any access to the memory by the deallocating function and before any access to the memory by malloc. There is a single total order of all allocation and deallocation functions operating on each particular region of memory.

synchronizes-with is the relation from a release store to an acquire load; the malloc family is mandated to provide (at least) AcqRel synchronization. $A$ synchronizes-with $B$ implies $A$ happens-before $B$.

This doesn't mean that __rust_alloc is required to provide this guarantee, but it does imply LLVM could assume it does (although I don't think it'd ever make a derivation that isn't already justified by disjoint provenance) and that code authors are likely to assume it holds.

However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live to have the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).

RalfJung (Member) commented Apr 4, 2024

Since C11, per cppreference:

Thanks! That's useful.

So yeah Miri should implement that then. But also IMO our allocator traits should document this.

However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live having the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).

Ah indeed, this entire approach that ReentrantLock takes is kind of incompatible with LLVM optimizations that reduce memory usage. That's a separate problem, but still a challenging one. Would people rather be able to optimize away malloc (and other allocations) that turn out not to be needed, or use the uniqueness of addresses as a correctness argument like in ReentrantLock? We can't have both... well, I guess we can if we annotate the allocations whose uniqueness is being relied on as "must not be optimized away", and then use that as a clue to make sure two such allocations never overlap. (They can still arbitrarily overlap with other allocations.)

digama0 commented Apr 5, 2024

However, this also ties into provenance and allocation planes — this reasoning relies on it being impossible for two disjoint allocated objects that are simultaneously live having the same address. It's not necessarily guaranteed that this is true (see discussions around resource exhaustion and optimizing out allocations).

Ah indeed, this entire approach that ReentrantLock takes is kind of incompatible with LLVM optimizations that reduce memory usage. That's a separate problem, but still a challenging one. Would people rather be able to optimize away malloc (and other allocations) that turn out not to be needed, or use the uniqueness of addresses as a correctness argument like in ReentrantLock? We can't have both... well, I guess we can if we annotate the allocations whose uniqueness is being relied on as "must not be optimized away", and then use that as a clue to make sure two such allocations never overlap.

I believe we were already considering something like this in the model: some kind of "plane 0" used for external allocations and such, whose use we force once the user starts doing difficult-to-analyze things with pointers. (We called it "exposed", but this meaning is overloaded, I think... I don't think it's necessarily the same as the strict-provenance "expose", but there are some relationships between them.) My guess is that ReentrantLock is either already doing something that would imply the use of this allocation plane, or could be modified to do so once we have enough details hammered out to suggest actual code changes.

RalfJung (Member) commented Apr 5, 2024

FWIW my interpretation was that it is exactly the same "exposed".

But anyway, let's continue in rust-lang/unsafe-code-guidelines#328.

bors added a commit that referenced this issue Apr 16, 2024
when an address gets reused, establish a happens-before link in the data race model

Fixes #3450
bors added a commit that referenced this issue Apr 18, 2024
Address reuse improvements and fixes

- when an address gets reused, establish a happens-before link in the data race model
- do not reuse stack addresses, and make the reuse rate configurable

Fixes #3450
RalfJung added the I-false-UB and A-concurrency labels on Apr 19, 2024
bors closed this as completed in 9b36914 on Apr 19, 2024
RalfJung pushed a commit to RalfJung/rust that referenced this issue Apr 20, 2024
Address reuse improvements and fixes

- when an address gets reused, establish a happens-before link in the data race model
- do not reuse stack addresses, and make the reuse rate configurable

Fixes rust-lang/miri#3450