Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99113: Make Sure the GIL is Acquired at the Right Places #104208

Merged
merged 7 commits into from
May 6, 2023

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented May 5, 2023

This is a pre-requisite for a per-interpreter GIL. Without it this change isn't strictly necessary. However, there is no downside otherwise.

(This change is broken out from gh-99114.)

@Eclips4
Copy link
Member

Eclips4 commented May 5, 2023

There's a failure, I can reproduce it on Windows. (originally taken from test_xxsubinterpreters/test_in_thread)
Minimal reproducible example:

import threading
import _xxsubinterpreters as interpreters

def foo():
    t = threading.Thread(target=interpreters.create)
    t.start()
    t.join()

foo()
foo()

Traceback:

C:\Users\KIRILL-1\CLionProjects\cpython> ./python example.py
Running Debug|x64 interpreter...
Fatal Python error: drop_gil: drop_gil: GIL is not locked
Python runtime state: initialized

Current thread 0x00001900 (most recent call first):
  <no Python frame>

@ericsnowcurrently
Copy link
Member Author

@Eclips4, do you get the same failure with #99114?

@ericsnowcurrently
Copy link
Member Author

@Eclips4, I've rebased this branch. Do you still get the crash?

@ericsnowcurrently ericsnowcurrently marked this pull request as ready for review May 5, 2023 22:03
@Eclips4
Copy link
Member

Eclips4 commented May 6, 2023

Hello Eric, sorry for the wait
Tried on commit 2404310dea7c14a59fba9a7ffae4f203005bbf4f.
Same error:

PS C:\Users\KIRILL-1\CLionProjects\cpython> ./python example.py
Running Debug|x64 interpreter...
Fatal Python error: drop_gil: drop_gil: GIL is not locked
Python runtime state: initialized

Current thread 0x00003a74 (most recent call first):
  <no Python frame>

@Eclips4
Copy link
Member

Eclips4 commented May 6, 2023

It's kinda interesting why it fails only on Windows. Maybe try build with buildbots?

@ericsnowcurrently
Copy link
Member Author

Ah, looks like I was releasing the GIL unnecessarily in new_interpreter() (in pystate.c).

@ericsnowcurrently
Copy link
Member Author

FTR, there's the stack trace before my fix:

(expand)
 	KernelBase.dll!00007ff911127b02()	Unknown
>	python312_d.dll!fatal_error_exit(int status) Line 2677	C
 	python312_d.dll!fatal_error(int fd, int header, const char * prefix, const char * msg, int status) Line 2859	C
 	python312_d.dll!_Py_FatalErrorFunc(const char * func, const char * msg) Line 2875	C
 	python312_d.dll!drop_gil(_ceval_state * ceval, _ts * tstate) Line 287	C
 	python312_d.dll!_PyEval_ReleaseLock(_ts * tstate) Line 625	C
 	python312_d.dll!_PyThreadState_Swap(pyruntimestate * runtime, _ts * newts) Line 1912	C
 	python312_d.dll!_extensions_cache_set(_object * filename, _object * name, PyModuleDef * def) Line 947	C
 	python312_d.dll!fix_up_extension(_object * mod, _object * name, _object * filename) Line 1177	C
 	python312_d.dll!_PyImport_FixupBuiltin(_object * mod, const char * name, _object * modules) Line 1310	C
 	python312_d.dll!_PySys_Create(_ts * tstate, _object * * sysmod_p) Line 3471	C
 	python312_d.dll!pycore_interp_init(_ts * tstate) Line 852	C
 	python312_d.dll!new_interpreter(_ts * * tstate_p, const PyInterpreterConfig * config) Line 2066	C
 	python312_d.dll!Py_NewInterpreterFromConfig(_ts * * tstate_p, const PyInterpreterConfig * config) Line 2101	C
 	python312_d.dll!interp_create(_object * self, _object * args, _object * kwds) Line 522	C
 	python312_d.dll!cfunction_call(_object * func, _object * args, _object * kwargs) Line 537	C
 	python312_d.dll!_PyObject_Call(_ts * tstate, _object * callable, _object * args, _object * kwargs) Line 367	C
 	python312_d.dll!PyObject_Call(_object * callable, _object * args, _object * kwargs) Line 380	C
 	python312_d.dll!_PyEval_EvalFrameDefault(_ts * tstate, _PyInterpreterFrame * frame, int throwflag) Line 3125	C
 	python312_d.dll!_PyEval_Vector(_ts * tstate, PyFunctionObject * func, _object * locals, _object * const * args, unsigned __int64 argcount, _object * kwnames) Line 1576	C
 	python312_d.dll!_PyFunction_Vectorcall(_object * func, _object * const * stack, unsigned __int64 nargsf, _object * kwnames) Line 419	C
 	python312_d.dll!_PyObject_VectorcallTstate(_ts * tstate, _object * callable, _object * const * args, unsigned __int64 nargsf, _object * kwnames) Line 92	C
 	python312_d.dll!method_vectorcall(_object * method, _object * const * args, unsigned __int64 nargsf, _object * kwnames) Line 67	C
 	python312_d.dll!_PyVectorcall_Call(_ts * tstate, _object *(*)(_object *, _object * const *, unsigned __int64, _object *) func, _object * callable, _object * tuple, _object * kwargs) Line 271	C
 	python312_d.dll!_PyObject_Call(_ts * tstate, _object * callable, _object * args, _object * kwargs) Line 354	C
 	python312_d.dll!PyObject_Call(_object * callable, _object * args, _object * kwargs) Line 380	C
 	python312_d.dll!thread_run(void * boot_raw) Line 1081	C
 	python312_d.dll!bootstrap(void * call) Line 182	C
 	[External Code]	

@Eclips4
Copy link
Member

Eclips4 commented May 6, 2023

Yeah, after last commit error will go away.

@ericsnowcurrently
Copy link
Member Author

ericsnowcurrently commented May 6, 2023

It's kinda interesting why it fails only on Windows. Maybe try build with buildbots?

That is interesting. I figured it would be worth exploring why that unexpected situation happened. Below is my analysis.

tl;dr in addition to dropping that _PyEval_ReleaseThread() call, there was a bug in _PyEval_InitGIL() that needs fixing.

(So thanks for pointing this out.)

Analysis

(expand)

The immediate problem was that we were trying to release the GIL when it wasn't held (but did already exist). This was happening, specifically, in _PyThreadState_Swap() via the pycore_interp_init() call right after init_interp_create_gil() call in new_interpreter() (in pystate.c). I expected that, when init_interp_create_gil() returned, the current thread would hold the GIL, regardless of whether or not the GIL was released earlier (which apparently we were doing).

This may mean I broke that expectation somewhere with one of my relatively recent commits (or in this PR). Or it might be a long-standing bug that I exposed with some new branch in the code.

Regardless, the failure implies broken expectations somewhere. Here are the relevant expectations I thought of, particularly relative to the failure and leading up to the call to _PySys_Create():

  1. ✅the "current" thread state when new_interpreter() (in pylifecycle.c) gets called is consistent (whether NULL or not) across platforms
  2. ✅in new_interpreter(), the "current" thread state should not be change anywhere before the _PyThreadState_SwapNoGIL()
  3. _PyThreadState_SwapNoGIL() sets the "current" thread state correctly and returns the previous one
  4. ✅after the first _PyThreadState_SwapNoGIL() call, save_tstate is consistent across platforms
  5. ✅the current thread state does not change again in new_interpreter() (except at the end if it fails)
  6. _PyEval_ReleaseLock() always releases the GIL if held (and crashes otherwise)
  7. the call to _PyConfig_Copy() does not need (or touch) the GIL
  8. ✅the call to init_interp_settings() does not need (or touch) the GIL
  9. ✅the init_interp_create_gil() call correctly sets interp->ceval.gil
  10. ✅it makes sure the GIL is held
  11. ✅when pycore_interp_init() is called, the GIL is held
  12. pycore_init_global_objects() does not release the GIL
  13. _PyGC_Init() does not release the GIL
  14. _Py_Deepfreeze_Init() does not release the GIL
  15. pycore_init_types() does not release the GIL
  16. _PyWarnings_InitState() does not release the GIL
  17. _PyAtExit_Init() does not release the GIL

I checked each of these in turn, comparing on linux (where I develop) and Windows (where the failure happened), and the expectations were valid. However, in two places the GIL changed state unexpectedly (noted above without a check mark). This only happened on Windows.

I'm pretty sure that the following happened:

  • during the _PyConfig_Copy() call, the main thread re-acquired the GIL (since we had just released it in the sub-thread)
  • during the _Py_Deepfreeze_Init() call, the main interpreter released the GIL again when it started up the second sub-thread

Why did this only happen on Windows. I expect (but have not verified) that the semantics of releasing and re-acquiring the GIL in the eval loop are different on Windows and linux.

Conclusion

All that demonstrates two issues (one of which I've already addressed):

  1. the _PyEval_ReleaseLock() call wasn't necessary (but did reveal the actual problem)
  2. in _PyEval_InitGIL(), we skip acquiring the GIL if it's already held, but we don't make sure it's held by the current thread

The solution for _PyEval_InitGIL() is to check main_interp->ceval.gil->last_holder. I'll do that.

@ericsnowcurrently ericsnowcurrently merged commit 92d8bff into python:main May 6, 2023
@ericsnowcurrently ericsnowcurrently deleted the fix-acquiring-gil branch May 6, 2023 21:59
jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this pull request May 8, 2023
…thongh-104208)

This is a pre-requisite for a per-interpreter GIL.  Without it this change isn't strictly necessary.  However, there is no real downside otherwise.
@asottile
Copy link
Contributor

@ericsnowcurrently a bit of a long shot and way late to the party -- but I've recently bisected uwsgi (which deadlocks on python 3.12 only after a reload) and the bisection claims this commit as the culprit -- you can find more analysis in unbit/uwsgi#2659

my guess without debugging much yet is last_holder isn't forksafe? (need to poke around in gdb some more though)

@asottile
Copy link
Contributor

I believe there's an oversight in this patch which changes the behaviour of PyThreadState_Swap -- previously it did not attempt to acquire the GIL at the end but now it does.

this patch "fixes" my problem:

diff --git a/Python/pystate.c b/Python/pystate.c
index f14934361da..218c08a8528 100644
--- a/Python/pystate.c
+++ b/Python/pystate.c
@@ -1919,7 +1919,7 @@ _PyThreadState_Swap(_PyRuntimeState *runtime, PyThreadState *newts)
 PyThreadState *
 PyThreadState_Swap(PyThreadState *newts)
 {
-    return _PyThreadState_Swap(&_PyRuntime, newts);
+    return _PyThreadState_SwapNoGIL(newts);
 }
 
 

asottile added a commit to asottile/cpython that referenced this pull request Aug 16, 2024
in python#104208 it was inadvertently (?) changed to acquire the GIL
@asottile
Copy link
Contributor

I opened #123079 with the potential fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants