Concerns about static vs dynamic TLS in libgomp #1551

Open
h-vetinari opened this issue Nov 12, 2021 · 2 comments

@h-vetinari
Member

I just ran into some very strange errors in conda-forge/staged-recipes#16888:

tensorflow_addons/metrics/tests/matthews_correlation_coefficient_test.py:21: in <module>
    from sklearn.metrics import matthews_corrcoef as sklearn_matthew
../[...]/lib/python3.8/site-packages/sklearn/__init__.py:83: in <module>
    from .utils._show_versions import show_versions
../[...]/lib/python3.8/site-packages/sklearn/utils/_show_versions.py:12: in <module>
    from ._openmp_helpers import _openmp_parallelism_enabled
E   ImportError: dlopen: cannot load any more object with static TLS

After some googling, it seems this problem plagues many users (especially with PyTorch/TensorFlow). I'm admittedly out of my depth here, but the following explanation seemed pretty good:

As long as PyTorch has a dependency on libgomp.so with static TLS, there is literally nothing we can do if some of our users decide to import a bunch of third-party libraries that have dynamic TLS, without importing libgomp. They'll gobble up all of the DTV space and libgomp will fail. Note that we exacerbate the problem by depending on libraries ourselves which have dynamic TLS, so that the ceiling is lower, but if the user imports enough libraries they will hit this problem, no matter how much or little TLS we use.
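Whether a shared library "has static TLS" in the sense of the quote can be read from its ELF dynamic section: when an object uses the static (initial-exec) TLS model, the linker sets the `DF_STATIC_TLS` bit in its `DT_FLAGS` entry, which `readelf -d` prints as `FLAGS STATIC_TLS`. Below is a minimal sketch for checking that from Python. It assumes 64-bit little-endian ELF files only (e.g. x86_64 or aarch64 Linux), and the function name `has_static_tls` and script name `check_static_tls.py` are hypothetical, not part of any existing tool.

```python
import struct
import sys

# ELF constants (values from the ELF / System V gABI specification).
PT_DYNAMIC = 2        # program header type of the dynamic segment
DT_NULL = 0           # tag that terminates the dynamic array
DT_FLAGS = 30         # tag whose value holds DF_* flag bits
DF_STATIC_TLS = 0x10  # set by the linker if the object uses the static TLS model


def has_static_tls(data: bytes) -> bool:
    """Return True if an ELF image has DF_STATIC_TLS set in DT_FLAGS.

    Assumption: only 64-bit little-endian ELF files are handled.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS != ELFCLASS64 or EI_DATA != ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # Elf64_Ehdr: only e_phoff, e_phentsize and e_phnum are needed here.
    fields = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, ...
        p_type, _, p_offset, _, _, p_filesz, _, _ = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type != PT_DYNAMIC:
            continue
        # Walk the Elf64_Dyn entries (d_tag, d_val; 16 bytes each) until DT_NULL.
        for off in range(p_offset, p_offset + p_filesz - 15, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, off)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_FLAGS:
                return bool(d_val & DF_STATIC_TLS)
    return False  # no DT_FLAGS entry, or the bit is not set


if __name__ == "__main__":
    # Usage: python check_static_tls.py /path/to/lib1.so /path/to/lib2.so ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            flag = has_static_tls(f.read())
        print(f"{path}: {'STATIC_TLS' if flag else 'no STATIC_TLS flag'}")
```

Pointing it at `libgomp.so.1` and the other `.so` files in an environment's `lib/` directory would show which of them require static TLS; the paths are illustrative. It only reports the flag and says nothing about how much surplus TLS space is left at runtime.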

This has unfortunate side effects: for example, changing the order in which libraries are imported can make the error appear or go away. Such accidents lead to a proliferation of unreliable advice (it works or not more or less at random), e.g. to uninstall a conda package and reinstall it from pip.

There's apparently been a glibc fix for this since 2015 (glibc 2.22). Unfortunately, not even moving to CentOS 7 (#1436) would help with that, since CentOS 7 ships glibc 2.17. So, coming back to the quote above, I wanted to ask:

Is it possible for conda-forge to consistently enforce dynamic TLS in libgomp?

I'm not sure whether that's possible or even a good idea, but I wanted to raise this issue so that conda-forge users don't run into such cryptic problems, if there's a way to avoid them.
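Whether a given system is on the right side of the glibc 2.22 line comes down to a version comparison. Below is a minimal sketch using `platform.libc_ver()` from the Python standard library. The 2.22 threshold is taken from the text above (which only says the fix is "apparently" there) and should be verified. The helper names `parse_glibc_version` and `glibc_at_least` are hypothetical.

```python
import platform
from typing import Tuple

# Threshold taken from the issue text; treat it as an assumption to verify.
GLIBC_FIX_VERSION = (2, 22)


def parse_glibc_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '2.17' into a comparable tuple (2, 17)."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(text: str, minimum: Tuple[int, ...]) -> bool:
    """Compare a glibc version string against a minimum version tuple."""
    return parse_glibc_version(text) >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("not running on glibc (or version unknown)")
    else:
        status = "at or above" if glibc_at_least(version, GLIBC_FIX_VERSION) else "below"
        print(f"glibc {version}: {status} 2.22")
```

Comparing tuples rather than strings avoids the classic pitfall where `"2.9" > "2.22"` as strings.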

@h-vetinari
Member Author

Now cvxpy on aarch64 is failing broadly due to this, so I'm disabling the aarch64 tests there. Comments or input welcome.

@h-vetinari
Member Author

h-vetinari commented Jan 1, 2022

Actually, I had already raised an issue for this at the time... There's also a bit of discussion in conda-forge/cvxopt-feedstock#55, including a handy reference from Isuru.
