Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for no-gil/freethreaded work #4265

Open
6 of 12 tasks
alex opened this issue Jun 20, 2024 · 49 comments
Open
6 of 12 tasks

Tracking issue for no-gil/freethreaded work #4265

alex opened this issue Jun 20, 2024 · 49 comments

Comments

@alex
Copy link
Contributor

alex commented Jun 20, 2024

We didn't have a dedicated issue for this, so now there's one.

TODO:

  • Add a cfg for no-gil, but only allowed behind an experimental feature
  • ffi-check passing with a no-GIL build
  • Adopt new owned-reference-friendly C APIs
    • PyDict_GetItemRef
    • PyList_GetItemRef
    • PyDict_Next
    • PyWeakref_GetRef
    • PyImport_AddModuleRef
  • Identify places that assume a Python<'_> indicates only a single thread is executing:
    • pyo3::sync::GILOnceCell
    • PyClassBorrowChecker
    • GILProtected
    • ...
  • A way for extensions to declare that the Py_mod_gil slot should be set
@ngoldbaum
Copy link
Contributor

As a tiny piece of this and to try to learn the library better, I'm working on adding wrappers for the new GetItemRef C API functions in the 3.13 stable API. These are needed to be fully safe for free-threaded python and are nice to have anyway on older versions because strong references are easier to reason about.

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Jul 30, 2024

Just to update the current state of things: pyo3 builds against the free-threaded build if you do:

UNSAFE_PYO3_BUILD_FREE_THREADED=1 cargo build

If you use pyenv, you'll also need to locally delete or modify the .python-version file.

This very quicky crashes inside of mimalloc internals, ultimately inside of Py_InitializeEx:

  * frame #0: 0x000000010135af60 libpython3.13t.dylib`chacha_block + 448
    frame #1: 0x000000010134f7e8 libpython3.13t.dylib`_mi_os_get_aligned_hint + 172
    frame #2: 0x000000010135cd68 libpython3.13t.dylib`unix_mmap_prim + 136
    frame #3: 0x0000000101355e80 libpython3.13t.dylib`_mi_prim_alloc + 220
    frame #4: 0x000000010134fb30 libpython3.13t.dylib`mi_os_prim_alloc + 68
    frame #5: 0x0000000101348560 libpython3.13t.dylib`_mi_os_alloc_aligned + 352
    frame #6: 0x0000000101349a9c libpython3.13t.dylib`mi_reserve_os_memory_ex + 80
    frame #7: 0x0000000101347ee8 libpython3.13t.dylib`_mi_arena_alloc_aligned + 392
    frame #8: 0x000000010135bd00 libpython3.13t.dylib`mi_segment_alloc + 468
    frame #9: 0x0000000101354950 libpython3.13t.dylib`mi_segments_page_alloc + 1468
    frame #10: 0x000000010135ab94 libpython3.13t.dylib`mi_page_fresh_alloc + 56
    frame #11: 0x0000000101351f5c libpython3.13t.dylib`mi_find_page + 528
    frame #12: 0x0000000101344070 libpython3.13t.dylib`_mi_malloc_generic + 208
    frame #13: 0x000000010144e3d8 libpython3.13t.dylib`gc_alloc + 284
    frame #14: 0x000000010144e268 libpython3.13t.dylib`_PyObject_GC_New + 96
    frame #15: 0x000000010132042c libpython3.13t.dylib`PyDict_New + 84
    frame #16: 0x00000001013b3d24 libpython3.13t.dylib`_PyUnicode_InitGlobalObjects + 236
    frame #17: 0x000000010147e4dc libpython3.13t.dylib`pycore_interp_init + 72
    frame #18: 0x000000010147bb88 libpython3.13t.dylib`Py_InitializeFromConfig + 1360
    frame #19: 0x000000010147bc9c libpython3.13t.dylib`Py_InitializeEx + 144
    frame #20: 0x000000010006e5f0 pyo3-1eb544a7db3e1a47`pyo3::gil::prepare_freethreaded_python::_$u7b$$u7b$closure$u7d$$u7d$::h922d8fd5db1fd90c((null)={closure_env#0} @ 0x00000001710665c7, (null)=0x0000000171066640) at gil.rs:69:13
    frame #21: 0x0000000100047a4c pyo3-1eb544a7db3e1a47`std::sync::once::Once::call_once_force::_$u7b$$u7b$closure$u7d$$u7d$::h15baf6dd1f7316ea(p=0x0000000171066640) at once.rs:208:40
    frame #22: 0x000000010031a770 pyo3-1eb544a7db3e1a47`std::sys::sync::once::queue::Once::call::heacc08786c6d7dfa at queue.rs:183:21 [opt]
    frame #23: 0x00000001000478b4 pyo3-1eb544a7db3e1a47`std::sync::once::Once::call_once_force::h7b8eb88c3a02f292(self=0x00000001004dce70, f={closure_env#0} @ 0x000000017106671f) at once.rs:208:9
    frame #24: 0x00000001001d68b4 pyo3-1eb544a7db3e1a47`pyo3::gil::prepare_freethreaded_python::h316cd04b406e24c0 at gil.rs:66:5
    frame #25: 0x00000001001d6924 pyo3-1eb544a7db3e1a47`pyo3::gil::GILGuard::acquire::h2127069d9988a593 at gil.rs:174:21
    frame #26: 0x0000000100053000 pyo3-1eb544a7db3e1a47`pyo3::marker::Python::with_gil::h38979cd5e69873c3(f={closure_env#0} @ 0x000000017106679f) at marker.rs:403:21
    frame #27: 0x00000001001c6548 pyo3-1eb544a7db3e1a47`pyo3::conversions::std::array::tests::test_extract_non_iterable_to_array::h3c220ef1fe379cdf at array.rs:226:9
    frame #28: 0x000000010004a1b4 pyo3-1eb544a7db3e1a47`pyo3::conversions::std::array::tests::test_extract_non_iterable_to_array::_$u7b$$u7b$closure$u7d$$u7d$::h7de4fa3687a88518((null)=0x00000001710667fe) at array.rs:225:44

Just to make sure all of this is reproducible and we have some feedback on CI, I think I'm going to add a free-threaded CI job marked with continue-on-error with a test run that crashes like this.

@davidhewitt
Copy link
Member

That sounds great to me, thanks!

ngoldbaum added a commit to ngoldbaum/pyo3 that referenced this issue Jul 31, 2024
ngoldbaum added a commit to ngoldbaum/pyo3 that referenced this issue Aug 1, 2024
github-merge-queue bot pushed a commit that referenced this issue Aug 1, 2024
* Update dict.get_item binding to use PyDict_GetItemRef

Refs #4265

* test: add test for dict.get_item error path

* test: add test for dict.get_item error path

* test: add test for dict.get_item error path

* fix: fix logic error in dict.get_item bindings

* update: apply david's review suggestions for dict.get_item bindings

* update: create ffi::compat to store compatibility shims

* update: move PyDict_GetItemRef bindings to spot in order from dictobject.h

* build: fix build warning with --no-default-features

* doc: expand release note fragments

* fix: fix clippy warnings

* respond to review comments

* Apply suggestion from @mejrs

* refactor so cfg is applied to functions

* properly set cfgs

* fix clippy lints

* Apply @davidhewitt's suggestion

* deal with upstream deprecation of new_bound
@alex
Copy link
Contributor Author

alex commented Aug 2, 2024

I added a new checkbox for " Adopt new owned-reference-friendly C APIs". If we have a list of all the ones we need, I can make those sub-checkboxes.

@ngoldbaum
Copy link
Contributor

If we have a list of all the ones we need, I can make those sub-checkboxes.

I think PyDict_GetItemRef and PyList_GetItemRef are the most important ones. There'a a listing of the remaining ones in the HOWOTO guide for free-threading in the CPython docs: https://docs.python.org/3.13/howto/free-threading-extensions.html#borrowed-references

I also had a chat with @davidhewitt today and in addition to GilOnceCell, he pointed to GILProtected and PyCell as spots that make strong assumptions about the GIL.

Our first idea is to make GILProtected a no-op on Py_GIL_DISABLED builds (although we'll need to see if that has major fallout on user code) and as a first pass PyCell needs atomic increments and decrements to avoid data races in the free-threaded build.

In addition we need to use pyo3_ffi_check to update the assumptions the FFI bindings make about the free-threaded ABI. Doing this should hopefully fix some of the most egregious build issues. I am planning to work on that step next week.

I looked at adding a failing CI job, but that won't work right now because of if you run the tests on a free-threaded build with --no-fail-fast the tests will eventually deadlock. At least as far as I can see there's no option in cargo to automatically kill hung tests that run longer than a configurable timeout. You can do it manually pretty easily with a macro but I'd prefer not to do that and instead hold off on adding CI until the tests are runnable without deadlocks. Hopefully that won't be too long :)

@alex
Copy link
Contributor Author

alex commented Aug 2, 2024

Ok, updated the tracking list.

PyCell no longer exists, should that be something else?

@ngoldbaum
Copy link
Contributor

I'm still learning the library and it shows...

I think David meant Bound in our discussion and he just got mixed up with the old API after a long day. I'll let him clarify.

@alex
Copy link
Contributor Author

alex commented Aug 2, 2024

My guess is it's a reference to PyClassBorrowChecker, which manages the various borrow flags. But I'll let David say for sure.

@davidhewitt
Copy link
Member

My mistake, yes we removed the PyCell name with the Gil refs API 👍

@ngoldbaum
Copy link
Contributor

See #4421 which updates the FFI bindings for the free-threaded build. That's enough to get the tests to pass without deadlocking, so I added a CI config as well.

@alex
Copy link
Contributor Author

alex commented Aug 6, 2024

Added a checkbox for ffi-check being green.

@davidhewitt
Copy link
Member

Right, that makes a lot of sense, thanks. I'd missed the risk of deadlock on unsuspending a stack if the whole lot were to be relocked.

The important question to me is: how much does it matter if multiple threads race to build the initial value? If it doesn't matter at all, then you probably don't need to bother with critical sections. If it does matter -- because it's currently somewhat unlikely -- than critical sections might be useful as a way to ensure that the init function are mostly executed only once without introducing deadlock risk.

I think in practice it's helpful if other threads are blocked to avoid wasted work. I can't remember exactly why we chose to allow races with the original GILOnceCell (possibly because it was unlikely and simpler not to try).

I think I've sufficiently convinced myself now that GILOnceCell should keep the racy semantics and we should add PyOnceLock which implements blocking semantics. I think it should be possible to create blocking semantics on the gil enabled builds too, with a bit of careful thought to avoid deadlocking the GIL.

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Aug 30, 2024

@alex can you add a checkbox for adding a way for extensions created by PyO3 to set the Py_mod_gil slot? I think I might work on that next week.

@alex
Copy link
Contributor Author

alex commented Aug 30, 2024

Done.

FWIW, it seems to me like that should be one of the last steps, or at the very least it needs to be unsafe until pyo3 itself is no-gil ready.

davidhewitt pushed a commit that referenced this issue Sep 3, 2024
* Update dict.get_item binding to use PyDict_GetItemRef

Refs #4265

* test: add test for dict.get_item error path

* test: add test for dict.get_item error path

* test: add test for dict.get_item error path

* fix: fix logic error in dict.get_item bindings

* update: apply david's review suggestions for dict.get_item bindings

* update: create ffi::compat to store compatibility shims

* update: move PyDict_GetItemRef bindings to spot in order from dictobject.h

* build: fix build warning with --no-default-features

* doc: expand release note fragments

* fix: fix clippy warnings

* respond to review comments

* Apply suggestion from @mejrs

* refactor so cfg is applied to functions

* properly set cfgs

* fix clippy lints

* Apply @davidhewitt's suggestion

* deal with upstream deprecation of new_bound
@ngoldbaum
Copy link
Contributor

I see PyClassBorrowChecker has a check box above -- is it actually unsafe? It does store the data in a Cell, but since Cell isn't Sync, wouldn't PyO3 users see a compiler error if they tried to share it between threads?

Some more updates for the check boxes:

I also just opened a PR for BoundListIterator, since it still uses borrowed reference APIs.

@alex
Copy link
Contributor Author

alex commented Sep 9, 2024

PyClassBorrowChecker is the thing that's responsible for actually implementing the thread checking logic. pyo3 shoves them into PyObject *, so it can end up in mulitple threads simultaneously.

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Sep 10, 2024

I spent some time today trying to understand this more and now I have a test case I think that demonstrates the issue.

Rust code:

#[pyclass]
#[derive(Default)]
struct PyClassThreadIter {
    count: usize,
}

#[pymethods]
impl PyClassThreadIter {
    #[new]
    pub fn new() -> Self {
        Default::default()
    }

    fn __next__(&mut self) -> PyResult<usize> {
        self.count += 1;
        Ok(self.count)
    }
}

and corresponding Python test code:

def run_threaded(func, iters):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1000) as tpe:
        futures = [tpe.submit(func) for _ in range(iters)]
        for f in futures:
            f.result()


def test_parallel_iter():
    i = pyclasses.PyClassThreadIter()

    def func():
        next(i)

    run_threaded(func, 500000)

    assert next(i) == 500001

Running this on my M3 Mac with PYTHON_GIL=0 set eventually triggers an exception:

    def func():
>       next(i)
E       RuntimeError: Already borrowed

To answer my question above, this doesn't really have anything to do with the use of Cell in PyClassBorrowChecker. That isn't thread-safe, but even if I replaced the Cell with a std::sync::Mutex, that would only synchronize the borrow checking, it wouldn't prevent creating two simultaneous borrowed references in the first place.

Instead if we want to prevent that, I think we would need to refactor the pyclass internals so accessing the instance state is protected by e.g. a std::sync::RWLock.

@alex
Copy link
Contributor Author

alex commented Sep 10, 2024 via email

@ngoldbaum
Copy link
Contributor

So instead of struct BorrowFlag(usize), you'd have struct BorrowFlag(AtomicUsize). If we read the borrow flag and it's HAS_MUTABLE_BORROW, we enter a spinlock and block until the thread that owns the mutable reference releases it. Otherwise everything else is the same, except increment and decrement are atomic operations.

@alex
Copy link
Contributor Author

alex commented Sep 10, 2024 via email

@ngoldbaum
Copy link
Contributor

Ah well that's how it works right now! Except we're not tracking the references in a thread-safe manner.

It's not clear to me if it would be easier in the long run for the ecosystem of extensions written using PyO3 to need to add locking to avoid creating multiple simultaneous mutable borrows coming from Python code or to put the locking in PyO3 itself. That's where my suggestion to add locking comes from.

Maybe instead we could add a way to optionally add locking around acquiring simultaneous mutable references to pyclass state, to ease porting for extensions that are implicitly relying on the GIL today.

@alex
Copy link
Contributor Author

alex commented Sep 10, 2024 via email

@mejrs
Copy link
Member

mejrs commented Sep 12, 2024

Not spin lock, just raise an exception on an attempted concurrent borrow

I don't agree with this, if we implement this it's basically impossible to write a pyclass using this. The problem, as I see it, is that trying to lock something that is already locked (on a different thread) is expected behavior, but doing so with a refcell is a logic error and thus a bug. In the latter case panicking or returning an error is fine, in the former we should probably just wait.

@alex
Copy link
Contributor Author

alex commented Sep 12, 2024 via email

@davidhewitt
Copy link
Member

Sorry to be slow to engage with this part of the discussion; I've been travelling with family and then sick 🤮. Better now!

I think the new factor here is that previously if you had a &mut self method that didn't release the GIL (e.g. by never calling Python code), previously it was impossible to hit the error condition of competing mutable borrows. With freethreaded code, if we make concurrent attempts to mutably borrow an error, then all such cases become sources of extremely-hard-to-identify concurrency bugs. I lean towards @mejrs' opinion that having the exception makes it very hard for users to make correct #[pyclass] uses - even if their test suites don't trigger aliasing mutable borrows, their downstream code might!

The RwLock behaviour introduces performance bottlenecks and deadlock risks 😱, but it is at least correct. To mitigate deadlocks we could look at critical sections but I think they might be difficult to use correctly in the general case (e.g. inside PyRef<T>, due to the fact that PyRef<T> isn't a closure-style API at present). I think critical sections might be a future API change to explore rather than our primary option here.

Overall, I think we are forced to push a pretty tough change on users here. My instinct is that this is a good time to make #[pyclass(frozen)] the default, and to introduce #[pyclass(mutable)] as the opt-in:

  1. Require #[pyclass(mutable)] opt-in and deprecate #[pyclass(frozen)] opt-out. I agree with @alex that

    users can always accomplish shared behavior by taking a shared reference and using interior mutability

    and I think we should encourage that in docs (e.g. with atomics), and document #[pyclass(mutable)] as the more extreme option.

  2. Potentially we can have #[pyclass(mutable = "concurrent-error")] and #[pyclass(mutable = "rwlock")] as separate choices, but overall I'm not convinced the concurrent-error is a good option for users. The frustrating thing is that it works (relatively) well on current GIL builds. I think the "rwlock" design can work the same on GIL and nogil builds as long as we release the GIL before blocking on any #[pyclass] internal locks.

  3. Separate to the other design choices: aliasing shared references implies #[pyclass] types must be Sync. (This is actually already a small source of unsoundness in PyO3 due to ability to release GIL.) To make this change gentler for users we could have #[pyclass(unsync)] which errors on accesses from other threads (like #[pyclass(unsendable)]. Maybe better we could consider #[pyclass(mutable = "mutex")] which would be the most blocking lock form but also would protect against concurrent shared references and thus remove Sync requirement.

I want to start looking into the PyClass: Sync requirement asap.

@alex
Copy link
Contributor Author

alex commented Sep 13, 2024

Defaulting to frozen, and then requiring users to pick a strategy for mutability seems good to me.

In my experience, a lot of types don't have particularly useful concurrent semantics, and while a lock can make them safe, it can't make it sensible.

FWIW my concern is much less that locks have overhead, it's that rw-locks are full of performance cliffs and priority inversion issues (https://blog.nelhage.com/post/rwlock-contention/).

@davidhewitt davidhewitt mentioned this issue Sep 13, 2024
3 tasks
davidhewitt pushed a commit that referenced this issue Sep 15, 2024
* Update dict.get_item binding to use PyDict_GetItemRef

Refs #4265

* test: add test for dict.get_item error path

* test: add test for dict.get_item error path

* test: add test for dict.get_item error path

* fix: fix logic error in dict.get_item bindings

* update: apply david's review suggestions for dict.get_item bindings

* update: create ffi::compat to store compatibility shims

* update: move PyDict_GetItemRef bindings to spot in order from dictobject.h

* build: fix build warning with --no-default-features

* doc: expand release note fragments

* fix: fix clippy warnings

* respond to review comments

* Apply suggestion from @mejrs

* refactor so cfg is applied to functions

* properly set cfgs

* fix clippy lints

* Apply @davidhewitt's suggestion

* deal with upstream deprecation of new_bound
@davidhewitt
Copy link
Member

davidhewitt commented Sep 18, 2024

As I commented in #4265, I want to suggest that a realistic goal here for the 0.23 release is to transition from "unusable and unsound" to "unusable but sound for testing".

Given that we have breaking changes already landing in 0.23, I think much better to delay the breaking changes needed above in #4265 (comment) to 0.23 (this is primarily immutable by default with the opt-in).

Thus for 0.23 I'd like to merge the PRs which make PyO3 sound under free threading but don't change existing semantics, even if those semantics are unhelpful under free threading. This is (as far as I'm aware)

I think that would be good enough to unblock the downstream ecosystem to start testing their own projects under free threading. The focus of the 0.24 release can then be the breaking changes needed to make the semantics of PyO3 actually sensible under free threading.

@mejrs
Copy link
Member

mejrs commented Sep 18, 2024

That makes sense to me, given that this is clearly documented of course. How are other libraries like Boost and Pybind11 handling this? (If they are even handling nogil at all...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants