Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-105699: Use a Linked List for the PyModuleDef Cache #106899

Closed

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented Jul 19, 2023

This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? <wink/>

We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a linked list, using the "raw" allocator to avoid tying it to the main interpreter. We assume that there will be few enough legacy extension modules loaded that the O(n) operations won't cause too much overhead.

We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.

Copy link
Member

@Yhg1s Yhg1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the assumption that a linked list is good enough because the number of extensions is small is actually warranted, and I think we should definitely double-check the performance on systems with non-trivial numbers of modules. Not using a dict for the cache avoids some problems, but replacing it with a linear search is very likely to have a bad performance hit.

Since the problem being fixed only manifests itself with isolated subinterpreters, I would rather live with the race in 3.12 than rush in a big performance regression, even if it's just for imports of extension modules.

_PyInterpreterState_Main());
/* The lock is initialized directly with the general runtime state. */
assert(EXTENSIONS.mutex != NULL);
//assert(EXTENSIONS.head == NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover line I presume.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of. I had expected the assert to work (and it would in a simpler world). I was going to circle back to it but, having slept on it, don't see the need. It can be dropped.

@@ -919,167 +919,134 @@ extensions_lock_release(void)
static void
_extensions_cache_init(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this function at all, now that it's just a single assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I figured I'd end up finding a better solution than a linked list that would revive the need for this function, so I kept it around. The point is a bit moot now though. 😄

goto finally;
}
strcpy(cached->filename, PyUnicode_AsUTF8(filename));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use strncpy to avoid the deprecation in some compilers (macOS IIRC?)

}
res = 0;

/* Destroy the cache data. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this DECREF found->def ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@ericsnowcurrently
Copy link
Member Author

Understood. I was looking for a quick fix and mostly played a hunch that the performance impact would be minimal. I don't think I'm wrong but agree there isn't enough time to properly validate my hypothesis. Thanks for speaking up.

I do plan on finding an alternative that doesn't rely on a Python object, to address the crasher.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants