
missing rpath for libmkl_intel_thread? #6423

Closed
kalefranz opened this issue Oct 5, 2017 · 33 comments

@kalefranz

From @stevengj on October 4, 2017 18:23

In Julia, which loads libpython as a shared library (installed via conda by default), we are suddenly (with a recent conda upgrade) getting MKL failures on MacOS because it can't find libmkl_intel_thread.dylib. Manually setting LD_LIBRARY_PATH to Conda's usr/lib directory (where that library is found) fixes the problem, but that didn't use to be required, and it isn't required if I run the python executable directly.

My guess is that there is a missing shared-library dependency somewhere — something in numpy should have explicitly linked -lmkl_intel_thread but didn't do so? Or maybe a missing -rpath linker argument?

For example, we get this when plotting via matplotlib (JuliaPy/PyPlot.jl#315), but also when calling any numpy linear-algebra function:

julia> using PyCall

julia> pyimport("numpy.linalg")["inv"](rand(100,100))
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.

(This just calls numpy.linalg.inv on a random 100x100 numpy matrix. Under the hood, it is calling PyImport_ImportModule("numpy.linalg") in libpython, etcetera, using Python's standard C embedding API.)

Copied from original issue: conda/conda#6074

@kalefranz
Author

From @stevengj on October 4, 2017 18:40

otool -L usr/pkgs/numpy-1.11.1-py27_0/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so gives

	@rpath/libmkl_intel_lp64.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libmkl_intel_thread.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libmkl_core.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version

so it looks like libmkl_intel_thread.dylib is explicitly linked.
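Checking this kind of linkage can be done mechanically by filtering `otool -L` output for `@rpath` install names. The following is a minimal Python sketch. It assumes the output text has already been captured, for example with `subprocess.run(["otool", "-L", path], capture_output=True, text=True).stdout`. The function name `rpath_deps` is hypothetical.

```python
import re
from typing import List

def rpath_deps(otool_output: str) -> List[str]:
    """Return the library names referenced via @rpath in `otool -L` output."""
    deps = []
    for line in otool_output.splitlines():
        # Dependency lines are indented, e.g.
        #   @rpath/libmkl_core.dylib (compatibility version 0.0.0, ...)
        # The header line ("/path/to/file.so:") is not indented, so it is skipped.
        m = re.match(r"\s+@rpath/(\S+)", line)
        if m:
            deps.append(m.group(1))
    return deps
```

For a `.dylib`, the first indented entry of `otool -L` is the library's own install name, so it will show up in the result as well.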

If I run otool -l on either the python executable or on libpython2.7.dylib, I get two LC_RPATH commands to add paths to the library search path:

Load command 15
          cmd LC_RPATH
      cmdsize 272
         path /Users/stevenj/.julia/v0.6/Conda/deps/usr/lib (offset 12)
Load command 18
          cmd LC_RPATH
      cmdsize 40
         path @loader_path/../lib/ (offset 12)

Notice that it has both the absolute path of my conda /usr/lib directory and also a path relative to @loader_path. Both of these seem correct.
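The LC_RPATH entries can be pulled out of `otool -l` output, and the relative entry can be expanded by hand. In that entry, `@loader_path` stands for the directory of the binary that carries the LC_RPATH command. This is a sketch that relies only on the `cmd`/`path` lines shown above. The helper names are hypothetical, and the example binary path assumes python lives in Conda's `usr/bin`.

```python
import os
import re
from typing import List

def lc_rpaths(otool_l_output: str) -> List[str]:
    """Collect the `path` fields of LC_RPATH load commands from `otool -l` output."""
    paths = []
    in_rpath = False
    for line in otool_l_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("cmd "):
            # A new load command starts; remember whether it is an LC_RPATH.
            in_rpath = stripped.split()[1] == "LC_RPATH"
        elif in_rpath and stripped.startswith("path "):
            # e.g. "path @loader_path/../lib/ (offset 12)"
            m = re.match(r"path (.*) \(offset \d+\)$", stripped)
            if m:
                paths.append(m.group(1))
            in_rpath = False
    return paths

def expand_loader_path(rpath: str, binary: str) -> str:
    """Replace a leading @loader_path with the directory containing `binary`."""
    prefix = "@loader_path"
    if rpath.startswith(prefix):
        rpath = os.path.dirname(binary) + rpath[len(prefix):]
    return os.path.normpath(rpath)
```

For `.../usr/bin/python` and for `.../usr/lib/libpython2.7.dylib` alike, `@loader_path/../lib/` expands to the same Conda `usr/lib` directory as the absolute entry. This is consistent with both entries looking correct.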

Clearly, there's something I don't understand about how MacOS resolves shared-library search paths. Something is not getting set correctly when we open libpython dynamically via dlopen.

But what has changed recently in Conda's NumPy package that would cause this to suddenly appear? Are you linking differently than you used to? Did you only recently start using @rpath?

@kalefranz
Author

From @stevengj on October 5, 2017 15:15

It seems like the problem is specific to libmkl_intel_thread(?), since if I manually dlopen that library (and the libiomp5 OpenMP library it requires), then NumPy works fine even without setting LD_LIBRARY_PATH:

julia> Libdl.dlopen("/Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libiomp5.dylib")
       Libdl.dlopen("/Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libmkl_intel_thread.dylib")

julia> .... call numpy.linalg.inv, it works ....

(In particular, there is no problem with libmkl_core.dylib, which NumPy also links.)
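The same preloading workaround can be expressed from Python with `ctypes`, which may help when embedding Python in another host. This is a sketch under stated assumptions. It loads the libraries in the same order as the Julia calls, with libiomp5 first because libmkl_intel_thread references it. It opens them with `RTLD_GLOBAL`, which is this sketch's choice rather than something taken from the Julia snippet. The name `preload_mkl_threading` is hypothetical, and the `loader` parameter exists only so the function can be exercised without MKL present.

```python
import ctypes
import os
from typing import Callable, List, Sequence

# Order matters: libmkl_intel_thread.dylib references @rpath/libiomp5.dylib,
# so the OpenMP runtime is loaded first.
DEFAULT_LIBS = ("libiomp5.dylib", "libmkl_intel_thread.dylib")

def preload_mkl_threading(libdir: str,
                          names: Sequence[str] = DEFAULT_LIBS,
                          loader: Callable = ctypes.CDLL) -> List[object]:
    """dlopen each library in `libdir`, in order, with RTLD_GLOBAL."""
    handles = []
    for name in names:
        path = os.path.join(libdir, name)
        handles.append(loader(path, mode=ctypes.RTLD_GLOBAL))
    return handles
```

Under these assumptions, it would be called before NumPy is imported, with `libdir` pointing at the Conda environment's `lib` directory.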

@kalefranz
Author

From @stevengj on October 5, 2017 15:21

cc @asmeurer, who has worked on rpath issues in the past (conda/conda-build#312). (Should I have reported this in conda/conda-build instead? Sorry!)

@msarahan
Contributor

msarahan commented Oct 5, 2017

Seems related to #6401

@iamed2

iamed2 commented Oct 5, 2017

@msarahan That seems to be related to a Python 3-only change, while this problem occurs with (at least) Python 2.7.

@stevengj

stevengj commented Oct 5, 2017

@msarahan, note also that the solution in #6401 was to dlopen libpython with RTLD_GLOBAL, but we already use RTLD_LAZY|RTLD_DEEPBIND|RTLD_GLOBAL. (And it was working fine until very recently, with both Python 2 and 3.)
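dlopen flags are bit masks ORed together, and their numeric values are platform-specific. `RTLD_DEEPBIND` is a glibc extension with no macOS counterpart. In this sketch, the values are hard-coded from macOS's `<dlfcn.h>`; they differ from Linux's. This makes it possible to decode a raw mode such as the `0x00000009` that dyld traces print for `dlopen` calls. `decode_dlopen_mode` is a hypothetical helper.

```python
from typing import List

# Flag values from macOS <dlfcn.h>; glibc on Linux uses different numbers.
MACOS_RTLD_FLAGS = [
    ("RTLD_LAZY", 0x1),
    ("RTLD_NOW", 0x2),
    ("RTLD_LOCAL", 0x4),
    ("RTLD_GLOBAL", 0x8),
    ("RTLD_NOLOAD", 0x10),
    ("RTLD_NODELETE", 0x80),
    ("RTLD_FIRST", 0x100),
]

def decode_dlopen_mode(mode: int) -> List[str]:
    """Return the names of the macOS RTLD_* flags set in `mode`."""
    names = [name for name, bit in MACOS_RTLD_FLAGS if mode & bit]
    known = 0
    for _, bit in MACOS_RTLD_FLAGS:
        known |= bit
    if mode & ~known:
        # Report any bits this table does not know about as a hex value.
        names.append(hex(mode & ~known))
    return names

print(decode_dlopen_mode(0x00000009))  # ['RTLD_LAZY', 'RTLD_GLOBAL']
```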

@kalefranz
Author

@stevengj Could you provide the output of

otool -L /Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libiomp5.dylib
otool -L /Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libmkl_intel_thread.dylib

@msarahan
Contributor

msarahan commented Oct 5, 2017

Older builds used this patch: https://github.com/AnacondaRecipes/numpy-recipe/blob/master/recipe/dlopenflags.patch

Based on advice from Intel, we dropped that patch for the latest packages. Maybe that's related?

Our 2018 MKL packages come more directly from Intel. We take their packages and apply metadata to them to indicate that we repackage them.

Additionally, on Intel's advice, we have switched site.cfg from explicit specification:

[mkl]
library_dirs = @PREFIX@/lib
include_dirs = @PREFIX@/include
lapack_libs = mkl_lapack95_lp64
mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5

to their central runtime, which is controllable via environment variables:

[mkl]
mkl_libs = mkl_rt
lapack_libs = mkl_rt
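The practical difference between the two `[mkl]` sections is which libraries NumPy's extension modules end up linked against. The following sketch parses both sections with `configparser` and lists the corresponding dylib names. The config text is copied from above. The helper name `linked_libs` and the `lib<name>.dylib` naming assumption are illustrative only.

```python
import configparser
from typing import List

OLD_SITE_CFG = """
[mkl]
library_dirs = @PREFIX@/lib
include_dirs = @PREFIX@/include
lapack_libs = mkl_lapack95_lp64
mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5
"""

NEW_SITE_CFG = """
[mkl]
mkl_libs = mkl_rt
lapack_libs = mkl_rt
"""

def linked_libs(cfg_text: str, key: str = "mkl_libs") -> List[str]:
    """Return the comma-separated library list for `key` in the [mkl] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return [lib.strip() for lib in parser["mkl"][key].split(",") if lib.strip()]

for label, cfg in (("old", OLD_SITE_CFG), ("new", NEW_SITE_CFG)):
    # Assumes each name maps to a lib<name>.dylib on macOS.
    print(label, ["lib%s.dylib" % name for name in linked_libs(cfg)])
```

With the old list, the threading layer and the OpenMP runtime appear directly as dependencies of NumPy's extension modules. With `mkl_rt`, only `libmkl_rt.dylib` does.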

@stevengj

stevengj commented Oct 5, 2017

@kalefranz,

$ otool -L /Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libiomp5.dylib
/Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libiomp5.dylib:
	@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1225.1.1)

$ otool -L /Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libmkl_intel_thread.dylib
/Users/stevenj/.julia/v0.6/Conda//deps/usr/lib/libmkl_intel_thread.dylib:
	@rpath/libmkl_intel_thread.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1213.0.0)

@montyvesselinov

This bug also impacts pyplot performance:

https://travis-ci.org/madsjulia/Mads.jl/jobs/281775921

@oleksandr-pavlyk

@msarahan I'm not sure if you're quoting the [mkl] section of the new site.cfg in full, but I would like to point out that library_dirs and include_dirs need to be there.

I'll try to see locally if the observed errors may be due to removal of dlopenflags.patch. I'll report back with my findings.

@msarahan
Contributor

msarahan commented Oct 6, 2017

Thanks @oleksandr-pavlyk - library_dirs and include_dirs are there, but they're added later in our script. Don't pay too much attention to that.

The options I see here, depending on @oleksandr-pavlyk's findings, are:

  1. reinstate that patch
  2. switch back to explicit link-line advisor linking instead of mkl_rt.

We do not have time to explore these options. If anyone in the julia world would like to explore these options, our recipes for these are at:

https://github.com/AnacondaRecipes/aggregate/tree/master/intel_repack
https://github.com/AnacondaRecipes/numpy-feedstock/tree/master/recipe

We're definitely open to changes that would make things work better for everyone.

@oleksandr-pavlyk

oleksandr-pavlyk commented Oct 6, 2017

I tried to reproduce the issue on Mac, using Julia 0.6.0 DMG file from https://julialang.org/downloads/

I am unable to even install "PyCall" seeing

julia> Pkg.init()
INFO: Initializing package repository /nfs/site/home/opavlyk/.julia/v0.6
INFO: Cloning METADATA from https://github.com/JuliaLang/METADATA.jl
ERROR: GitError(Code:ERROR, Class:Zlib, error reading from the zlib stream)

I am able to clone METADATA from the command line though.

Since I am unable to reproduce the issue, I can only suggest trying the call with the Intel Distribution for Python, which you can install into your existing Miniconda installation using

conda create -n idp -c intel numpy

then use /path/to/miniconda/envs/idp/bin/python in "PyCall".

The point of this experiment is to confirm whether removal of dlopenflags.patch is the culprit behind this. This patch is also absent from IDP.

Alternatively, you could edit site-packages/numpy/__init__.py in the miniconda environment you are currently using and add

import ctypes
import sys
_old_rtld = sys.getdlopenflags()
sys.setdlopenflags(_old_rtld | ctypes.RTLD_GLOBAL)

at the top of the file, before the from . import core line (line 156).

Please let us know if this fixes your issue.
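Editing NumPy's `__init__.py` changes the flags for every later extension-module import. A narrower variant of the same idea wraps only the NumPy import and restores the previous flags afterwards. This variant is an assumption of this sketch, not what was suggested above. `sys.getdlopenflags`/`sys.setdlopenflags` are Unix-only, and `global_dlopen` is a hypothetical name.

```python
import contextlib
import os
import sys

@contextlib.contextmanager
def global_dlopen():
    """Temporarily add RTLD_GLOBAL to the flags Python uses to dlopen extension modules."""
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the original flags even if the import raises.
        sys.setdlopenflags(old_flags)
```

Usage would be `with global_dlopen(): import numpy`. As the reply below reports, adding RTLD_GLOBAL did not fix this particular problem.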

@stevengj

stevengj commented Oct 6, 2017

@oleksandr-pavlyk, sorry to hear that you ran into an error with Pkg.init; it's a bit bizarre, I've installed Julia on dozens of MacOS systems and have never seen an error like that one, but maybe someone else will chime in.

I can confirm that the Intel Distribution for Python (via the command you suggested) eliminates the problem.

Editing site-packages/numpy/__init__.py to add RTLD_GLOBAL as you suggest does not fix the problem, however.

@oleksandr-pavlyk

@stevengj In the interest of reproducing the issue, could you post the output of conda list --explicit for the environment where you are experiencing it? This helps pin down the versions of NumPy, Python, MKL, etc.

I created an environment with conda create -n gh6423 numpy, activated it with source activate gh6423, and ran

find $CONDA_PREFIX/lib/python3.6/site-packages/numpy/linalg -name "*.so" -exec otool -L {} \;

with the following output

/scratch/miniconda3/envs/gh6423/lib/python3.6/site-packages/numpy/linalg/_umath_linalg.cpython-36m-darwin.so:
        @rpath/libmkl_rt.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
/scratch/miniconda3/envs/gh6423/lib/python3.6/site-packages/numpy/linalg/lapack_lite.cpython-36m-darwin.so:
        @rpath/libmkl_rt.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)

which shows that the recent version of NumPy's linear algebra code is correctly linked against mkl_rt library.

It would be best if the issue were reproducible, though.
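The `find ... -name "*.so" -exec otool -L {} \;` pipeline can also be reproduced in Python, for example to collect the linkage of every extension module in one environment. This is a sketch: `otool` exists only on macOS, so the per-file runner is a parameter, and all three function names are hypothetical.

```python
import os
import subprocess
from typing import Callable, Dict, List

def find_extensions(root: str, suffix: str = ".so") -> List[str]:
    """Like `find ROOT -name "*.so"`: sorted paths of matching files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

def otool_L(path: str) -> str:
    """Run `otool -L` on one file (macOS only) and return its stdout."""
    return subprocess.run(["otool", "-L", path], check=True,
                          capture_output=True, text=True).stdout

def linkage_report(root: str, runner: Callable[[str], str] = otool_L) -> Dict[str, str]:
    """Map each extension module under `root` to its `otool -L` output."""
    return {path: runner(path) for path in find_extensions(root)}
```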

@stevengj

stevengj commented Oct 10, 2017

My output is:

stevenj$ ~/.julia/v0.6/Conda/deps/usr/bin/conda list --explicit
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
@EXPLICIT
https://repo.continuum.io/pkgs/main/osx-64/appnope-0.1.0-py27hb466136_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/asn1crypto-0.22.0-py27h61af4a7_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/backports-1.0-py27hb4f9756_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/backports.shutil_get_terminal_size-1.0.0-py27hc9115de_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/backports_abc-0.5-py27h6972548_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/bleach-2.0.0-py27ha7d1710_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ca-certificates-2017.08.26-ha1e5d58_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/certifi-2017.7.27.1-py27h482ffc0_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/cffi-1.10.0-py27haac214c_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/chardet-3.0.4-py27h2842e91_1.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/conda-4.3.27-py27h94ab009_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/conda-env-2.6.0-h36134e3_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/configparser-3.5.0-py27hc7edf1b_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/cryptography-2.0.3-py27hab69567_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/cycler-0.10.0-py27hfc73c78_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/dbus-1.10.22-h50d9ad6_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/decorator-4.1.2-py27h9f877ea_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/entrypoints-0.2.3-py27hd680fb1_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/enum34-1.1.6-py27hf475452_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/expat-2.2.4-h8f26bf8_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/freetype-2.8-h143eb01_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/functools32-3.2.3.2-py27h8ceab06_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/get_terminal_size-1.0.0-h7520d66_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/gettext-0.19.8.1-hb0f4f8b_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/glib-2.53.6-ha08cb78_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/html5lib-0.999999999-py27hec7e2bc_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/icu-58.2-hea21ae5_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/idna-2.6-py27hedea723_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/intel-openmp-2018.0.0-hdd0ccc9_7.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ipaddress-1.0.18-py27h5b9a5b9_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ipykernel-4.6.1-py27h1e70a78_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ipython-5.4.1-py27h2b3d779_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ipython_genutils-0.2.0-py27h8b9a179_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ipywidgets-7.0.0-py27h3e52029_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jinja2-2.9.6-py27h92590e2_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jpeg-9b-haccd157_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jsonschema-2.6.0-py27hd9b497e_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jupyter-1.0.0-py27hec63c99_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jupyter_client-5.1.0-py27hfaf569a_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jupyter_console-5.2.0-py27h9702a86_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/jupyter_core-4.3.0-py27hd5161ba_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libcxx-4.0.1-h579ed51_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libcxxabi-4.0.1-hebd6815_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libedit-3.1-hb4e282d_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libffi-3.2.1-hd939716_3.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libgfortran-3.0.1-h93005f0_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libiconv-1.15-h99df5da_5.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libpng-1.6.32-hce72d48_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/libsodium-1.0.13-hba5e272_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/markupsafe-1.0-py27hd3c86fa_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/matplotlib-2.0.2-py27h2e09848_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/mistune-0.7.4-py27h1658d75_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/mkl-2018.0.0-hc285769_4.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/nbconvert-5.3.1-py27h6455e4c_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/nbformat-4.4.0-py27hddc86d0_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ncurses-6.0-ha932d30_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/notebook-5.0.0-py27h5f5981d_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/numpy-1.13.1-py27hd567e90_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/openssl-1.0.2l-h57f3a61_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/packaging-16.8-py27h24b219a_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pandoc-1.19.2.1-ha5e8f32_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pandocfilters-1.4.2-py27hed78c4e_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/path.py-10.3.1-py27h5e25276_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pathlib2-2.3.0-py27he09da1e_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pcre-8.41-h29eefc5_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pexpect-4.2.1-py27hc4e4961_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pickleshare-0.7.4-py27h37e3d41_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pip-9.0.1-py27h61def0c_3.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/prompt_toolkit-1.0.15-py27h4a7b9c2_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ptyprocess-0.5.2-py27h70f6364_0.tar.bz2
https://repo.continuum.io/pkgs/free/osx-64/pyasn1-0.2.3-py27_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pycosat-0.6.2-py27h085d4cc_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pycparser-2.18-py27h0d28d88_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pycrypto-2.6.1-py27h4efa152_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pygments-2.2.0-py27h1a556bb_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pyopenssl-17.2.0-py27h732fe57_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pyparsing-2.2.0-py27h5bb6aaf_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pyqt-5.6.0-py27hf21fe59_6.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pysocks-1.6.7-py27h1cff6a6_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/python-2.7.14-hed931fe_16.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/python-dateutil-2.6.1-py27hd56c96b_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/python.app-2-py27h48d88ae_5.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pytz-2017.2-py27hb891d23_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pyyaml-3.12-py27ha7932d0_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/pyzmq-16.0.2-py27he61c07e_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/qt-5.6.2-h9975529_14.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/qtconsole-4.3.1-py27hdc90b4f_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/readline-7.0-h81b24a6_3.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/requests-2.18.4-py27h9b2b37c_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ruamel_yaml-0.11.14-py27h31666c4_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/scandir-1.5-py27h77d1c80_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/scipy-0.19.1-py27hf01dd8f_3.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/setuptools-36.5.0-py27h2a45cec_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/simplegeneric-0.8.1-py27h6db5e31_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/singledispatch-3.4.0.3-py27he22c18d_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/sip-4.18.1-py27h6300f65_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/six-1.10.0-py27h47fc262_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/sqlite-3.20.1-h900c3b0_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/ssl_match_hostname-3.5.0.1-py27h8780752_2.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/subprocess32-3.2.7-py27h24b2887_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/terminado-0.6-py27he40bf16_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/testpath-0.3.1-py27h72d81a5_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/tk-8.6.7-hcdce994_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/tornado-4.5.2-py27h29aec9e_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/traitlets-4.3.2-py27hcf08151_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/urllib3-1.22-py27hc3787e9_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/wcwidth-0.1.7-py27h817c265_0.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/webencodings-0.5.1-py27h19a9f58_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/wheel-0.29.0-py27h84bd1c0_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/widgetsnbextension-3.0.2-py27h56f70de_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/yaml-0.1.7-hff548bb_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/zeromq-4.2.2-h131e0f7_1.tar.bz2
https://repo.continuum.io/pkgs/main/osx-64/zlib-1.2.11-h60db283_1.tar.bz2

@stevengj

stevengj commented Oct 10, 2017

The issue seems to be quite reproducible — multiple people using Anaconda via PyCall almost simultaneously reported the issue after a recent conda upgrade, and I immediately got the same issue on my MacOS machine by updating conda.

The issue is not that NumPy isn't linked to mkl_rt. That looks fine for me too:

stevenj$ find ~/.julia/v0.6/Conda/deps/usr/lib/python2.7/site-packages/numpy/linalg -name "*.so" -exec otool -L {} \;
/Users/stevenj/.julia/v0.6/Conda/deps/usr/lib/python2.7/site-packages/numpy/linalg/_umath_linalg.so:
	@rpath/libmkl_rt.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
/Users/stevenj/.julia/v0.6/Conda/deps/usr/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so:
	@rpath/libmkl_rt.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)

It must be something else in how the shared libraries are set up in Anaconda (compared to the Intel distribution).

@mingwandroid

mingwandroid commented Oct 10, 2017

I know what the problem is. It's because of how we link to mkl (the single dynamic library) and how mkl then loads its own libraries from C/C++ code (dlopen), while those libraries use LC_LOAD_DYLIB with @rpath to find their own dependencies.

Here are two workarounds while we work out what to do. You need Xcode or Apple's Command Line Tools installed, or the conda package cctools, in which case use x86_64-apple-darwin13.4.0-install_name_tool:

Horrible (but will survive updates and reinstalls of mkl but not of julia):

install_name_tool -add_rpath ${HOME}/.julia/v0.6/Conda/deps/usr/lib /Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia

Much less horrible (but will not survive updates to mkl):

install_name_tool -change @rpath/libiomp5.dylib @loader_path/libiomp5.dylib ${HOME}/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib

Adjust paths to taste.
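Because the paths need adjusting per installation, the two alternative commands can be generated from a Conda prefix and a Julia binary path. This sketch only builds the argument lists, and the function names and example paths are assumptions. When using the cctools conda package, pass `tool="x86_64-apple-darwin13.4.0-install_name_tool"`.

```python
import os
import subprocess
from typing import List

def workaround_commands(conda_prefix: str, julia_bin: str,
                        tool: str = "install_name_tool") -> List[List[str]]:
    """Return argv lists for the two alternative install_name_tool workarounds."""
    libdir = os.path.join(conda_prefix, "lib")
    return [
        # Alternative 1: add Conda's lib dir as an LC_RPATH of the julia executable.
        [tool, "-add_rpath", libdir, julia_bin],
        # Alternative 2: make libmkl_intel_thread look for libiomp5 next to itself.
        [tool, "-change", "@rpath/libiomp5.dylib", "@loader_path/libiomp5.dylib",
         os.path.join(libdir, "libmkl_intel_thread.dylib")],
    ]

def run(cmd: List[str]) -> None:
    """Run one command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Only one alternative is needed; for example, `run(workaround_commands(prefix, julia)[1])` applies the second one.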

Cheers!

@stevengj

@mingwandroid, thanks! Is there any chance of some form of the second solution being incorporated into Anaconda's build? What is the Intel Distribution for Python doing differently?

@mingwandroid

mingwandroid commented Oct 10, 2017

The Intel Distribution for Python is not linking dynamically to a single shared library. They 'bake' in all the choices about e.g. openmp vs not openmp at numpy's build-time. This is not the recommended approach. My 2nd solution is (probably) something we can roll out into an update to our MKL package(s) but I need to verify it as OK with Intel first (probably).

@mingwandroid

@stevengj, if you want, I can give you a detailed analysis.

@stevengj

stevengj commented Oct 10, 2017

I would love to hear the details of what is going on, though of course updates to the MKL packages are mainly up to you guys. (Is it possible to disable the use of OpenMP in NumPy at runtime, then?)

@stevengj

I can verify that the install_name_tool -change workaround solves the Julia problem, thanks!

@mingwandroid

mingwandroid commented Oct 11, 2017

Thank you for the feedback @stevengj. Here is my analysis:

  1. Install Julia:
curl -SLO https://julialang-s3.julialang.org/bin/osx/x64/0.6/julia-0.6.0-osx10.7+.dmg
hdiutil attach julia-0.6.0-osx10.7+.dmg
cp -rf /Volumes/Julia-0.6.0/Julia-0.6.app /Applications
  2. Clear out any cached Julia packages:
rm -rf ${HOME}/.julia
  3. Install Conda.jl and PyCall and verify the problem:
/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -e 'Pkg.add("Conda"); Pkg.add("PyCall"); using PyCall; pyimport("numpy.linalg")["inv"](rand(100,100))'

.. should output:

INFO: Initializing package repository /Users/rdonnelly/.julia/v0.6
INFO: Cloning METADATA from https://github.com/JuliaLang/METADATA.jl
INFO: Cloning cache of BinDeps from https://github.com/JuliaLang/BinDeps.jl.git
INFO: Cloning cache of Compat from https://github.com/JuliaLang/Compat.jl.git
INFO: Cloning cache of Conda from https://github.com/JuliaPy/Conda.jl.git
INFO: Cloning cache of JSON from https://github.com/JuliaIO/JSON.jl.git
INFO: Cloning cache of SHA from https://github.com/staticfloat/SHA.jl.git
INFO: Cloning cache of URIParser from https://github.com/JuliaWeb/URIParser.jl.git
INFO: Installing BinDeps v0.7.0
INFO: Installing Compat v0.33.0
INFO: Installing Conda v0.7.0
INFO: Installing JSON v0.15.1
INFO: Installing SHA v0.5.1
INFO: Installing URIParser v0.2.0
INFO: Building Conda
INFO: Package database updated
INFO: Cloning cache of MacroTools from https://github.com/MikeInnes/MacroTools.jl.git
INFO: Cloning cache of PyCall from https://github.com/JuliaPy/PyCall.jl.git
INFO: Installing MacroTools v0.3.7
INFO: Installing PyCall v1.15.0
INFO: Building Conda
INFO: Building PyCall
INFO: Using the Python distribution in the Conda package by default.
To use a different Python version, set ENV["PYTHON"]="pythoncommand" and re-run Pkg.build("PyCall").
INFO: Downloading miniconda installer ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21.3M  100 21.3M    0     0  6566k      0  0:00:03  0:00:03 --:--:-- 6564k
INFO: Installing miniconda ...
Python 2.7.13 :: Continuum Analytics, Inc.
Warning: 'defaults' already in 'channels' list, moving to the top
INFO: PyCall is using /Users/rdonnelly/.julia/v0.6/Conda/deps/usr/bin/python (Python 2.7.13) at /Users/rdonnelly/.julia/v0.6/Conda/deps/usr/bin/python, libpython = /Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libpython2.7
INFO: /Users/rdonnelly/.julia/v0.6/PyCall/deps/deps.jl has been updated
INFO: /Users/rdonnelly/.julia/v0.6/PyCall/deps/PYTHON has been updated
INFO: Package database updated
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.

Ok, great, we've been able to reproduce this.

  4. Investigate which Miniconda and conda packages Conda.jl installed:
${HOME}/.julia/v0.6/Conda/deps/usr/bin/conda list --show-channel-urls
# packages in environment at /Users/rdonnelly/.julia/v0.6/Conda/deps/usr:
#
asn1crypto                0.22.0                   py27_0    defaults
cffi                      1.10.0                   py27_0    defaults
conda                     4.3.27           py27h94ab009_0    defaults
conda-env                 2.6.0                h36134e3_0    defaults
cryptography              1.8.1                    py27_0    defaults
enum34                    1.1.6                    py27_0    defaults
idna                      2.5                      py27_0    defaults
intel-openmp              2018.0.0             hdd0ccc9_7    defaults
ipaddress                 1.0.18                   py27_0    defaults
libgfortran               3.0.1                h93005f0_2    defaults
mkl                       2018.0.0             hc285769_4    defaults
numpy                     1.13.3           py27h62f9060_0    defaults
openssl                   1.0.2l                        0    defaults
packaging                 16.8                     py27_0    defaults
pip                       9.0.1                    py27_1    defaults
pycosat                   0.6.2                    py27_0    defaults
pycparser                 2.17                     py27_0    defaults
pyopenssl                 17.0.0                   py27_0    defaults
pyparsing                 2.1.4                    py27_0    defaults
python                    2.7.13                        0    defaults
readline                  6.2                           2    defaults
requests                  2.14.2                   py27_0    defaults
ruamel_yaml               0.11.14                  py27_1    defaults
setuptools                27.2.0                   py27_0    defaults
six                       1.10.0                   py27_0    defaults
sqlite                    3.13.0                        0    defaults
tk                        8.5.18                        0    defaults
wheel                     0.29.0                   py27_0    defaults
yaml                      0.1.6                         0    defaults
zlib                      1.2.8                         3    defaults

(Observe the mix of old and new software. This is not the cause of this problem, but it is sub-optimal. AFAICT, Conda.jl downloads an old Miniconda, then updates conda, then installs some packages. I'd feel more comfortable if it either downloaded the new Miniconda or else ran conda update --all before installing anything. Anyway, that's tangential to this issue.)

  5. Use the fairly good tools that Apple provides to debug this:
DYLD_PRINT_LIBRARIES=1 \
DYLD_PRINT_BINDINGS=1 \
DYLD_PREBIND_DEBUG=1 \
DYLD_PRINT_INITIALIZERS=1 \
DYLD_PRINT_APIS=1 \
/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -e 'using PyCall; pyimport("numpy.linalg")["inv"](rand(100,100))'

.. and here, near the end of the roughly 7500 lines of output, we see the problem:

dyld: bind: libmkl_core.dylib:0x122D52288 = libdyld.dylib:dyld_stub_binder, *0x122D52288 = 0x7FFF967D2168
dyld_image_path_containing_address(0x1214f9000)
  dlopen(/Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libmkl_core.dylib) ==> 0x7f826f393eb0
dladdr(0x7fff9684f180, 0x7fff5bdb8ab0)
dlopen(/Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib, 0x00000009)
dyld: loaded: /Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib
dlclose(), found unused image 0x7f826c918b30 libmkl_intel_thread.dylib
dlclose(), deleting 0x7f826c918b30 libmkl_intel_thread.dylib
dyld: unloaded: /Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib
  dlopen() failed, error: 'dlopen(/Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib, 9): Library not loaded: @rpath/libiomp5.dylib
  Referenced from: /Users/rdonnelly/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib
  Reason: image not found'
dlerror()
dlopen(/Applications/Julia-0.6.app/Contents/Resources/julia/bin/libmkl_intel_thread.dylib, 0x00000009)
  dlopen() failed, error: 'dlopen(/Applications/Julia-0.6.app/Contents/Resources/julia/bin/libmkl_intel_thread.dylib, 9): image not found'
dlerror()
dlopen(libmkl_intel_thread.dylib, 0x00000009)
  dlopen() failed, error: 'dlopen(libmkl_intel_thread.dylib, 9): image not found'
dlerror()
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.

So Apple's dynamic loader failed to load libiomp5.dylib from @rpath/libiomp5.dylib.

  6. Why does Intel's version work while ours fails? I pored over the logs carefully looking for differences (OK, I didn't; I threw two logs into Beyond Compare instead) and found the culprit:

.. our numpy/core/multiarray.so links to the single dynamic mkl_rt:

otool -l ${HOME}/.julia/v0.6/Conda/deps/usr/lib/python2.7/site-packages/numpy/core/multiarray.so
    Load command 9
              cmd LC_LOAD_DYLIB
          cmdsize 48
             name @rpath/libmkl_rt.dylib (offset 24)
       time stamp 2 Thu Jan  1 01:00:02 1970
          current version 0.0.0
    compatibility version 0.0.0
    Load command 10

.. Intel's numpy/core/multiarray.so links to the multiple dynamic library set instead:

otool -l /opt/intel/intelpython3/envs/conda_jl/lib/python2.7/site-packages/numpy/core/multiarray.so
Load command 8
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name @rpath/libmkl_intel_lp64.dylib (offset 24)
   time stamp 2 Thu Jan  1 01:00:02 1970
      current version 0.0.0
compatibility version 0.0.0
Load command 9
          cmd LC_LOAD_DYLIB
      cmdsize 64
         name @rpath/libmkl_intel_thread.dylib (offset 24)
   time stamp 2 Thu Jan  1 01:00:02 1970
      current version 0.0.0
compatibility version 0.0.0
Load command 10
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name @rpath/libmkl_core.dylib (offset 24)
   time stamp 2 Thu Jan  1 01:00:02 1970
      current version 0.0.0
compatibility version 0.0.0
Load command 11
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libiomp5.dylib (offset 24)
   time stamp 2 Thu Jan  1 01:00:02 1970
      current version 5.0.0
compatibility version 5.0.0
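Rather than comparing the two otool -l listings above by eye (or with a diff tool), the LC_LOAD_DYLIB install names can be pulled out and diffed directly. This is a minimal sketch that assumes the otool -l layout shown above: a `cmd LC_LOAD_DYLIB` line followed, within the same load command, by a `name ... (offset N)` line. The helper `load_dylibs` is hypothetical, and it deliberately ignores other dylib commands such as LC_LOAD_WEAK_DYLIB and LC_ID_DYLIB.

```python
import re
from typing import List

# Matches e.g. "name @rpath/libmkl_rt.dylib (offset 24)"
NAME_RE = re.compile(r"name (.+?) \(offset \d+\)$")

def load_dylibs(otool_output: str) -> List[str]:
    """Return the install names of LC_LOAD_DYLIB commands in `otool -l` text."""
    names: List[str] = []
    in_dylib = False
    for line in otool_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Load command"):
            in_dylib = False  # a new load command starts; reset state
        elif stripped.startswith("cmd "):
            # "cmd LC_LOAD_DYLIB"; note that "cmdsize 48" does not match "cmd "
            in_dylib = stripped.split()[1] == "LC_LOAD_DYLIB"
        elif in_dylib:
            match = NAME_RE.match(stripped)
            if match:
                names.append(match.group(1))
    return names

# Trimmed excerpts of the two listings above
ours = """Load command 9
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libmkl_rt.dylib (offset 24)
"""
intel = """Load command 8
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name @rpath/libmkl_intel_lp64.dylib (offset 24)
Load command 9
          cmd LC_LOAD_DYLIB
      cmdsize 64
         name @rpath/libmkl_intel_thread.dylib (offset 24)
"""

# Dependencies present in Intel's build but not in ours
print(sorted(set(load_dylibs(intel)) - set(load_dylibs(ours))))
# ['@rpath/libmkl_intel_lp64.dylib', '@rpath/libmkl_intel_thread.dylib']
```

Feeding in the full output of both otool -l commands makes the mkl_rt-versus-split-libraries difference show up immediately.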
  1. Analysis:

What has happened here is that we have followed the recommended (and, I agree, the best) advice on how to link numpy and picked the 'single dynamic library' route. Unfortunately, this means that Apple's dyld is no longer responsible for managing (updating and passing along) the @rpath variable through the load hierarchy of shared libraries that it discovers by parsing LC_LOAD_DYLIB and LC_RPATH commands. Instead, Intel's code in libmkl_rt.dylib calls dlopen directly. This means that the only program that gets a chance to set @rpath as seen by libmkl_intel_thread.dylib is the julia executable itself. We can test this theory by adding the missing @rpath to it:

install_name_tool -add_rpath ${HOME}/.julia/v0.6/Conda/deps/usr/lib /Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia

And here is what I suspect is the correct (or at least the best) fix:

install_name_tool -change @rpath/libiomp5.dylib @loader_path/libiomp5.dylib ${HOME}/.julia/v0.6/Conda/deps/usr/lib/libmkl_intel_thread.dylib
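Why both commands help can be seen with a toy model of install-name resolution. This is a simplified sketch, not a reimplementation of dyld, and it assumes: `@loader_path` expands to the directory of the image containing the load command; `@rpath` is tried against each LC_RPATH entry in order; and, per the analysis above, only the julia executable's rpaths are in scope once libmkl_rt has called dlopen. Real dyld also handles rpath entries that themselves contain `@loader_path`/`@executable_path`, plus fallback paths, none of which are modelled. The helper `resolve`, the `$HOME` of `/Users/alice` and julia's rpath entry are all hypothetical.

```python
import posixpath
from typing import Iterable, Optional, Set

def resolve(install_name: str, loader_dir: str, rpaths: Iterable[str],
            existing: Set[str]) -> Optional[str]:
    """Return the first existing file an install name maps to, or None.

    Simplified model: @loader_path -> directory of the image holding the
    load command; @rpath -> each (already absolute) rpath entry in order.
    """
    if install_name.startswith("@loader_path/"):
        leaf = install_name[len("@loader_path/"):]
        candidates = [posixpath.join(loader_dir, leaf)]
    elif install_name.startswith("@rpath/"):
        leaf = install_name[len("@rpath/"):]
        candidates = [posixpath.join(r, leaf) for r in rpaths]
    else:
        candidates = [install_name]  # absolute path: used as-is
    for path in candidates:
        if path in existing:
            return path
    return None

conda_lib = "/Users/alice/.julia/v0.6/Conda/deps/usr/lib"  # hypothetical $HOME
julia_rpaths = ["/Applications/Julia-0.6.app/Contents/Resources/julia/lib"]  # hypothetical entry
files = {conda_lib + "/libiomp5.dylib", conda_lib + "/libmkl_intel_thread.dylib"}

# As shipped: libmkl_intel_thread.dylib asks for @rpath/libiomp5.dylib
print(resolve("@rpath/libiomp5.dylib", conda_lib, julia_rpaths, files))
# None
# Theory test: conda's lib dir added to julia's rpaths (install_name_tool -add_rpath)
print(resolve("@rpath/libiomp5.dylib", conda_lib, julia_rpaths + [conda_lib], files))
# /Users/alice/.julia/v0.6/Conda/deps/usr/lib/libiomp5.dylib
# Fix: dependency rewritten to @loader_path (install_name_tool -change)
print(resolve("@loader_path/libiomp5.dylib", conda_lib, julia_rpaths, files))
# /Users/alice/.julia/v0.6/Conda/deps/usr/lib/libiomp5.dylib
```

In this model the `@loader_path` form resolves no matter which executable (python or julia) hosts the process, which matches the intuition that it is the more robust of the two changes.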

@tomashek

@mingwandroid, in which version of Intel Python did you find a multiarray.so that links explicitly to threading implementations rather than to mkl_rt? Here is what I get after installing intelpython27-2018.0.018 (the same happens with intelpython35-2018.0.018):

$ otool -l pkgs/numpy-1.13.1-py27_intel_15/lib/python2.7/site-packages/numpy/core/multiarray.so
Load command 12
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libmkl_rt.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 0.0.0
compatibility version 0.0.0
Load command 13
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 1226.10.1
compatibility version 1.0.0
Load command 14
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libimf.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 0.0.0
compatibility version 0.0.0
Load command 15
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libsvml.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 0.0.0
compatibility version 0.0.0
Load command 16
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libirng.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 0.0.0
compatibility version 0.0.0
Load command 17
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libiomp5.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 5.0.0
compatibility version 5.0.0
Load command 18
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name /usr/lib/libc++.1.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 120.1.0
compatibility version 1.0.0
Load command 19
          cmd LC_LOAD_DYLIB
      cmdsize 48
         name @rpath/libintlc.dylib (offset 24)
   time stamp 2 Wed Dec 31 18:00:02 1969
      current version 1.13.0
compatibility version 1.0.0

@mingwandroid

It seems I drove IDP (Intel Distribution for Python) badly, and the numpy I was looking at was from the old defaults channel:

numpy                     1.13.1                   py36_0    defaults

Adding -c intel fixed that and I get the same otool -l output as you do. You link directly to @rpath/libiomp5.dylib here, so I guess that means libmkl_rt.dylib doesn't dynamically select that?

@oleksandr-pavlyk

oleksandr-pavlyk commented Oct 11, 2017

@mingwandroid The reason iomp5 is linked directly likely has nothing to do with the use of MKL. It may be due to the use of OpenMP-specific pragmas, but I would need to investigate. Will report back.

mkl_rt will dynamically choose between threading layers depending on the value of the MKL_THREADING_LAYER environment variable, which may take values such as sequential, intel and tbb, among others.
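As a rough illustration of that runtime choice, here is a sketch mapping MKL_THREADING_LAYER to the threading-layer library that mkl_rt would be expected to dlopen on macOS. The library names follow MKL's usual naming. Treating INTEL as the default when the variable is unset, and matching values case-insensitively, are assumptions based on the MKL documentation. The helper `threading_library` is hypothetical and not part of MKL. Only the intel layer drags in libiomp5.dylib, which is the library that fails to load in this issue.

```python
import os
from typing import Mapping, Optional

# Hypothetical table: threading layer -> dylib mkl_rt would try to load on macOS
_LAYER_LIBS = {
    "INTEL": "libmkl_intel_thread.dylib",   # depends on Intel OpenMP (libiomp5.dylib)
    "SEQUENTIAL": "libmkl_sequential.dylib",
    "TBB": "libmkl_tbb_thread.dylib",
}

def threading_library(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the threading-layer dylib selected by MKL_THREADING_LAYER."""
    env = os.environ if env is None else env
    # Assumed: unset means INTEL, and values are case-insensitive
    layer = env.get("MKL_THREADING_LAYER", "INTEL").upper()
    try:
        return _LAYER_LIBS[layer]
    except KeyError:
        raise ValueError(f"unknown MKL_THREADING_LAYER: {layer!r}") from None

print(threading_library({}))                                     # libmkl_intel_thread.dylib
print(threading_library({"MKL_THREADING_LAYER": "sequential"}))  # libmkl_sequential.dylib
```

In a real process the variable would presumably need to be in the environment before numpy (and therefore mkl_rt) is first loaded.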

@mingwandroid

The test package has been released to c3i_test2:

~/.julia/v0.6/Conda/deps/usr/bin/conda install -c c3i_test2 mkl

.. then test with:

/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -e 'Pkg.add("Conda"); Pkg.add("PyCall"); using PyCall; pyimport("numpy.linalg")["inv"](rand(100,100))'

.. and hopefully observe a lack of any Intel MKL FATAL ERROR.

@mingwandroid

Reopening until this is fixed in the main channel (GitHub closed this automatically).

@mingwandroid mingwandroid reopened this Oct 13, 2017
@stevengj

c3i_test2 works for me, thanks!

@mingwandroid

mingwandroid commented Oct 26, 2017

Packages have been released to main now (part of defaults) and included in the Anaconda 5.0.1 installers, closing. Thanks for the report.

@xrisk

xrisk commented Apr 13, 2021

Issue recurs for me.

Specifically, I’m using Julia 1.6.0 and the version of Conda I have is 4.1.0.

julia  -e 'using Pkg; Pkg.add("Conda"); Pkg.add("PyCall"); using PyCall; pyimport("numpy.linalg")["inv"](rand(100,100))'
    Updating registry at `~/.julia/registries/General`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
INTEL MKL ERROR: dlopen(/Users/xrisk/.julia/conda/3/lib/libmkl_intel_thread.dylib, 9): Library not loaded: @rpath/libiomp5.dylib
  Referenced from: /Users/xrisk/.julia/conda/3/lib/libmkl_intel_thread.dylib
  Reason: image not found.
Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.

Is this a regression?
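One quick way to tell "libiomp5.dylib is missing" apart from "it is present but not on the search path" is to check the conda lib directory named in the error message. This is a minimal sketch. The helper `missing_mkl_libs` is hypothetical, and the directory passed in should be whatever prefix appears in your own error (here `~/.julia/conda/3/lib`).

```python
from pathlib import Path
from typing import List

# The two libraries named in the error: the threading layer and its OpenMP runtime
REQUIRED = ("libmkl_intel_thread.dylib", "libiomp5.dylib")

def missing_mkl_libs(lib_dir: str) -> List[str]:
    """Return the required dylibs that are absent from lib_dir."""
    root = Path(lib_dir).expanduser()
    return [name for name in REQUIRED if not (root / name).exists()]

if __name__ == "__main__":
    print(missing_mkl_libs("~/.julia/conda/3/lib"))
```

An empty list means both files are on disk, which points back at @rpath resolution rather than at a missing package.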

@bkalpert

I'm now seeing this in Julia 1.6.1 as well, after making a change to get the Python package gdstk working. I then deleted the directory trees ~/.julia/conda, ~/.julia/packages/Conda, ~/.julia/packages/PyCall and ~/.julia/packages/PyPlot. In a new Julia session I ran ENV["PYTHON"]="", Pkg.add PyCall and Pkg.add PyPlot. After that, using PyPlot seemed to work fine (I could plot successfully), but in another Julia session, using PyPlot aborts with Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib.
