Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - crash on c++ exception when importing 2 pybind submodules built from different MSVC versions #2898

Open
Nir-Az opened this issue Mar 11, 2021 · 14 comments

Comments

@Nir-Az
Copy link

Nir-Az commented Mar 11, 2021

Issue description

Update: After diving deep in the pybind11 comments inside the code (Thanks God for it!) I am rephrasing the issue.

It looks like different MSVC versions that build modules with pybind11 interfere with each other (due to use of common resources between python modules).
The PYBIND11_INTERNALS_ID string is not constructed from a _MSC_VER and when I miss match modules build with MSVC 14.0
and MSVC >= 14.2 the ID is the same and I get a crash

Here is the crash stack

image

image

The "registered_exception_translators" list is not empty, but once you try to access the translator you get access violation.

I saw the "PYBIND11_COMPILER_TYPE" definition can be override,
I guess what I see here is that mismatch between MSVC versions also cause problems and perhaps should be considered as a different ABI.

Question:
What are the benefits of using the same "Internals" between modules? is it memory consumption?
It seems like if I force override the ABI version it works (add_definitions(-DPYBIND11_COMPILER_TYPE="_testing")

This is my workaround for now, I can suggest a PR that integrate MSC version as part of the ABI_ID..

Thanks

Thanks
Nir

@Nir-Az Nir-Az changed the title [BUG] [Possible Issue] - exception not shown or catched when importing 2 pybind submodules Mar 11, 2021
@Nir-Az Nir-Az changed the title [Possible Issue] - exception not shown or catched when importing 2 pybind submodules [Possible Issue] - exception not shown or catched when importing 2 pybind submodules build from different MSVC versions Mar 12, 2021
@Nir-Az Nir-Az changed the title [Possible Issue] - exception not shown or catched when importing 2 pybind submodules build from different MSVC versions [Possible Issue] - crash on c++ exception when importing 2 pybind submodules build from different MSVC versions Mar 12, 2021
@Nir-Az Nir-Az changed the title [Possible Issue] - crash on c++ exception when importing 2 pybind submodules build from different MSVC versions [Possible Issue] - crash on c++ exception when importing 2 pybind submodules built from different MSVC versions Mar 14, 2021
@Nir-Az Nir-Az changed the title [Possible Issue] - crash on c++ exception when importing 2 pybind submodules built from different MSVC versions [BUG] - crash on c++ exception when importing 2 pybind submodules built from different MSVC versions Mar 14, 2021
@ibell
Copy link
Contributor

ibell commented Mar 30, 2021

Maybe this is related: #1562 ?

@Nir-Az
Copy link
Author

Nir-Az commented Feb 13, 2023

Anything new regarding it?

I currently solve it by adding -DPYBIND11_COMPILER_TYPE="<unique_abi_string>" to every module I add.
BTW I get the same issue with working with 2 modules of the same MSVC.
They share resources and the objects multi_map is not thread_safe, so the modules interfere with each other.

@pwuertz
Copy link
Contributor

pwuertz commented Jul 25, 2023

Also got hit by this bug. The involved pybind11 modules were TensorRT, which NVidia built with MSVC 2017, and other modules built via GitHub Actions, currently on MSVC 2022. All modules end up accessing the same global state with PYBIND11_INTERNALS_ID = "__pybind11_internals_v4_msvc__", while being ABI incompatible.

This is really bad, and should be fixed ASAP or we'll keep running into ABI mismatch induced memory corruptions and crashes for the foreseeable future.

Is there a good reason why pybind11 is doing state sharing by default? I could imagine someone designing a multi-module-project with some common types, but state sharing might as well be an opt-in-feature for this specific case, not a default that is prone to all sorts of ABI pitfalls.

What about @Nir-Az's request of adding _MSC_VER to the PYBIND11_INTERNALS_ID? What's the hold-up here?

@pwuertz
Copy link
Contributor

pwuertz commented Jul 26, 2023

Could we kindly ask for some feedback from the authors of the PYBIND11_INTERNALS_ID mechanism? @henryiii @rwgk @wjakob

@Nir-Az
Copy link
Author

Nir-Az commented Jul 26, 2023

This is a very hard bug IMO.
We had to do an ugly patch to resolve it..

@rwgk
Copy link
Collaborator

rwgk commented Jul 26, 2023

I can suggest a PR that integrate MSC version as part of the ABI_ID.

That would be great. I'm guessing this is a really messy problem to get right. It needs understanding MSVC ABI compatibilities. Another problem is that it is very difficult to test.

Do you know if MSVC has something similar to __GXX_ABI_VERSION__? That defines the gcc (and clang I think) ABI version.

@Nir-Az
Copy link
Author

Nir-Az commented Jul 26, 2023

It's been 2.5 years since my comment :)
Why not just don't share resources between libraries?
Or default it to off and let user choose if he wants it and understand the affects?
I was surprised to see this feature.
It will save lots of pain :)

@rwgk
Copy link
Collaborator

rwgk commented Jul 26, 2023

To the best of my incomplete understanding (please correct me if I'm wrong):

  • PyPI wheels don't have a concept of ABI versions (i.e. providers of pybind11-based extensions cannot upload multiple wheels).

  • There will be a significant fraction of use cases that will stop working. Is that better or worse? Who knows?

The current implementation is unlikely to get changed unless someone brave wants to throw themselves into that most likely thankless battle, that will most likely drag out for years until everybody is either happy or adapted themselves enough.

@pwuertz
Copy link
Contributor

pwuertz commented Jul 27, 2023

Do you know if MSVC has something similar to GXX_ABI_VERSION? That defines the gcc (and clang I think) ABI version.

When talking about ABI compatibility at the DLL-boundary, not really. C++ binary compatibility between Visual Studio versions mentions that it is possible to mix outputs from different compilers at pre-linker-stage, but strict version matching down to the minor number is required at the post-linker-stage. So _MSC_VER looks like an appropriate guard for the job.

But also, even with a matching toolchain the C++ ABI may break for other reasons as well, e.g. compiler flags. So when mixing build products from multiple, unknown sources, the only real solution AFAIK is to not use C++ across ABI boundaries at all (which Msft also states in Portability at ABI boundaries).

That being said, empirically, people compiling against pre-built DLLs often seem to get away with matching just the MSVC-year (first three _MSC_VER digits).

PyPI wheels don't have a concept of ABI versions (i.e. providers of pybind11-based extensions cannot upload multiple wheels)

They do have a concept of ABI versions, namely C-ABI & CPython version & architecture, which is luckily a solved problem in PyPI & CPython nowadays. What we as module authors are doing behind the CPython-interface is of no concern to PyPI & CPython. PyPI can't help us with issues stemming from private implementation details like C++-ABI, Rust-ABI, CUDA version, device-driver-ABIs, etc.

PyPI is not a C++ package manager like Conan or Vcpkg. And even those will often do full library rebuilds rather than using pre-built binaries for slight changes in toolchain or flags.

There will be a significant fraction of use cases that will stop working. Is that better or worse? Who knows?

Well, we know that 100% of use cases have been knowingly exposed to memory corruption and crashes. So yes, not having done that by default clearly would have been the better option.

The current implementation is unlikely to get changed unless someone brave wants to throw themselves into that most likely thankless battle, that will most likely drag out for years until everybody is either happy or adapted themselves enough.

Sure, I guess we understand that what's done is done and that pybind11 doesn't want to take away other peoples' candy, as unhealthy as it may be. So the best we can do in this situation is to add _MSC_VER to PYBIND11_INTERNALS_ID.

@rwgk
Copy link
Collaborator

rwgk commented Jul 27, 2023

Sure, I guess we understand that what's done is done and that pybind11 doesn't want to take away other peoples' candy, as unhealthy as it may be.

Thanks a lot Peter for the very careful explanation!

Eye opening for me (I'm not very familiar with the MSVC specifics): I thought we're only a little bit our of the safe zone, but now I'm thinking we're out by a lot. I'm actually surprised now that there weren't more complains.

So the best we can do in this situation is to add _MSC_VER to PYBIND11_INTERNALS_ID.

Agreed! That sounds like a good change to include in pybind11 v2.12.

@rwgk
Copy link
Collaborator

rwgk commented Aug 3, 2023

@pwuertz Looking at this again, I realize I didn't say directly: Could you please send a PR? (I believe realistically that's the only way something will change here: someone with a vested interest & relevant expertise taking action.)

@pwuertz
Copy link
Contributor

pwuertz commented Aug 4, 2023

@rwgk Right, PR proposal is out: #4779.

As a simple test I rebuilt one of my pybind11 modules with this PR on GitHub-Actions. The pybind11-internals-key now yields "__pybind11_internals_v4_msvc_mscver1935__".

@rwgk
Copy link
Collaborator

rwgk commented Aug 4, 2023

Thanks a lot! Just to point out, the internals key is also shown in the pybind11 pytest output, e.g. (from a #4779 GHA log):

============================= test session starts =============================
 platform win32 -- Python 3.6.8, pytest-7.0.0, pluggy-1.0.0
 C++ Info: MSVC 192930151 C++14 __pybind11_internals_v4_msvc_mscver1929_debug__ PYBIND11_SIMPLE_GIL_MANAGEMENT=True

@isuruf
Copy link
Contributor

isuruf commented Nov 28, 2023

@pwuertz, TensorRT was using /MT (which is a strange choice IMO) whereas all the others are using /MD. I've sent #4953 to fix this.

mhochsteger added a commit to NGSolve/pybind11 that referenced this issue Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants