Bus error with multithreading linear algebra #298

Closed
aditya-sengupta opened this issue Apr 25, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@aditya-sengupta

aditya-sengupta commented Apr 25, 2023

Affects: JuliaCall

Describe the bug
I'm running into a bus error while trying to port and speed up a linear algebra routine from Python to Julia.

In an environment with numpy and juliacall, the following code produces the error:

# jl_seg.py
import numpy as np
from juliacall import Main as jl

np.random.seed(100)

jl.seval("using Base.Threads")

jl.seval("""
function tri_solve_vec_col_b(N, a, b, c, r, g, u)
    beta = b[1]
    u[1] = r[1] / beta

    @inbounds @fastmath for j in 2:N
        g[j] = c[j-1] / beta
        beta = b[j] - a[j] * g[j]
        u[j] = (r[j] - a[j] * u[j-1]) / beta
    end
    @inbounds @fastmath for k in N-1:-1:1
        u[k] -= g[k+1] * u[k+1]
    end
end
""")

jl.seval("coln(x,i) = view(x,:,i)")

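tri_solve_vec_b = jl.seval("""
function tri_solve_vec_b(a, b, c, r, g, u)
    N = size(a, 1)  # length of each column (size of each tridiagonal system)
    # Solve one independent tridiagonal system per column
    Threads.@threads for i in 1:size(a, 2)
        tri_solve_vec_col_b(N, coln(a,i), coln(b,i), coln(c,i), coln(r,i), coln(g,i), coln(u,i))
    end
end
""")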

mats = [np.random.random((10,10)) for _ in range(6)]
tri_solve_vec_b(*mats)

When run with python jl_seg.py, this produces the error message
[1] 73474 bus error python jl_seg.py
The number preceding "bus error" is the process ID, so it changes on each run.

Your system
Please provide detailed information about your system:

  • MacOS Ventura 13.2.1 on a 2021 MacBook Pro with Apple M1 Pro chip
  • Julia 1.8.5, Python 3.11.2, JuliaCall 0.9.12
Package          Version
---------------- -------
juliacall        0.9.12
juliapkg         0.1.10
numpy            1.24.3
pip              23.1.1
semantic-version 2.10.0
setuptools       65.6.3
>>> juliapkg.status()
JuliaPkg Status
/Users/adityasengupta/projects/misc/venv/julia_env/pyjuliapkg/juliapkg.json (empty project)
Julia 1.8.5 @ /Users/adityasengupta/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/bin/julia

Additional context
I've figured out that this is a combination of using Threads.@threads, the M1 chip, and juliacall; the same code runs as expected if it's single-threaded, if I run it on my 2018 MacBook Air, or if I run it purely in the Julia REPL (with rand(10,10) in Julia replacing the np.random.random call).
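
For anyone checking the numerics separately from the crash, the recurrence in tri_solve_vec_col_b looks like the standard Thomas algorithm for a tridiagonal system. Below is a minimal pure-Python sketch of the same recurrence for a single column. The function name tri_solve_col is made up for illustration. It assumes a is the sub-diagonal, b the diagonal and c the super-diagonal, and it does no pivoting, so it will fail if any intermediate beta is zero.

```python
def tri_solve_col(a, b, c, r):
    """Solve a[j]*u[j-1] + b[j]*u[j] + c[j]*u[j+1] = r[j] for u.

    Hypothetical reference helper mirroring the Julia recurrence:
    a is the sub-diagonal (a[0] unused), b the diagonal,
    c the super-diagonal (c[-1] unused), r the right-hand side.
    """
    n = len(b)
    g = [0.0] * n  # scratch vector, same role as `g` in the Julia code
    u = [0.0] * n  # solution vector
    beta = b[0]
    u[0] = r[0] / beta
    # Forward elimination
    for j in range(1, n):
        g[j] = c[j - 1] / beta
        beta = b[j] - a[j] * g[j]
        u[j] = (r[j] - a[j] * u[j - 1]) / beta
    # Back substitution
    for k in range(n - 2, -1, -1):
        u[k] -= g[k + 1] * u[k + 1]
    return u
```

Comparing its output against the Julia result for one column is a quick way to confirm that a threaded run, once it stops crashing, still produces the same numbers.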

@aditya-sengupta aditya-sengupta added the bug Something isn't working label Apr 25, 2023
@cjdoris
Collaborator

cjdoris commented May 14, 2023

Calling multithreaded Julia code from Python is not well supported, but see these tips: https://cjdoris.github.io/PythonCall.jl/dev/faq/#Is-PythonCall/JuliaCall-thread-safe?

Some people have had success with putting PythonCall.GC.disable()/enable() around the threaded code, and some have not.

@brian-dellabetta
Contributor

brian-dellabetta commented May 26, 2023

Hi @cjdoris , wondering if you might have any further insights on why it works for some but not for others. Multi-threading in Julia code called by Python is a key feature for our use case.

We are hitting this same bus error, both on Intel-based and M1 MacBooks. FWIW running it in an Ubuntu Docker container fails as well, though with a segfault rather than a bus error. I have tried with Python 3.8, 3.10, 3.11 and Julia 1.9 with PythonCall/juliacall 0.9.12 and 0.9.13. I slightly modified your example here for the new PythonCall.GC.enable/disable API:

from juliacall import Main as jl

jl.seval(
    """
    function worker()
        for i in 1:10_000_000
            a = Float64[]
            push!(a, 0.42)
            i % 1000 == 0 && println(i)
        end
    end
    """
)
jl.seval(
    """
    begin
        PythonCall.GC.disable()
        t = Threads.@spawn worker()
        println("waiting")
        wait(t)
        PythonCall.GC.enable()
    end
    """
)

It succeeds with 1 thread but I hit this error for any runs with multiple threads (using Threads.@spawn or Threads.@threads):

[1]    70348 bus error  JULIA_NUM_THREADS=2 /Users/<redacted>/python

Thanks for creating and maintaining such a handy package!

@cjdoris
Collaborator

cjdoris commented Jun 2, 2023

I'm afraid I don't have time to investigate multithreading issues. Until it's more reliable in JuliaCall, you may be better off running Julia in a separate process instead.
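
One way to read the "separate process" suggestion is to launch the julia executable from Python and exchange data over stdout. The sketch below is only an illustration of that idea. The helper names build_julia_command and run_julia are made up, it assumes a julia binary is on PATH, and it uses the standard --threads and -e command-line flags.

```python
import subprocess


def build_julia_command(code, n_threads=2, julia_exe="julia"):
    """Return the argv list that runs `code` in a fresh Julia process.

    `julia_exe` is assumed to be resolvable on PATH.
    """
    return [julia_exe, "--threads", str(n_threads), "-e", code]


def run_julia(code, n_threads=2, julia_exe="julia"):
    """Run Julia code in a child process and return its stdout as text."""
    result = subprocess.run(
        build_julia_command(code, n_threads, julia_exe),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Example (requires Julia to be installed):
#   print(run_julia("println(Threads.nthreads())", n_threads=4))
```

Because the Julia runtime then lives in its own process, its garbage collector and threads never share an address space with the Python interpreter. The cost is that arrays have to be serialized, for example through files or stdout, instead of being shared in memory.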

@brian-dellabetta
Contributor

brian-dellabetta commented Jun 9, 2023

@cjdoris thanks for the reply. Totally understandable, I will post our findings here in case anyone else hitting this issue comes across this thread. We are happy to test out any suggestions, but the issue is around memory allocation in threaded code.

  1. We hit this issue consistently across different versions of Python (3.8.16, 3.10.11) and Julia (1.8.5, 1.9.1), on Mac Intel, Mac M1, and Ubuntu machines.
  2. The same code works in PyJulia (see snippet below for MWE).
  3. Oddly, the bus error occurs at the same iteration every time, even if we allocate different arrays (e.g. Float32[]).
  4. I don't expect PythonCall.GC calls to have an effect here, as it is purely Julia structs/arrays being heap-allocated.

So currently our workarounds are

  1. Migrate to PyJulia, which claims support for multi-threaded code within Julia but suffers from an awkward issue around dynamically linked libpython
  2. Pre-allocate all memory in the single thread before passing to multi-threaded code.

@aditya-sengupta your issue may be entirely different, but it may be worth trying PyJulia out to see if you still hit the bus error. It's pretty easy to try, see my MWE:

import os

WORKER_FN_STR = """function worker()
    a = []
    for i in 1:1000000
        # push!(a, Float32[0.42])
        push!(a, Int64[4])
        i % 1000 == 0 && println(i)
    end
end"""

WORKER_RUN_STR = """begin
    t=Base.Threads.@spawn worker()
    println("waiting")
    wait(t)
    println("succeeded")
end"""


def run_juliacall(n_threads: int = 2):
    # pip install juliacall
    os.environ["JULIA_NUM_THREADS"] = str(n_threads)
    from juliacall import Main as jl

    jl.seval(WORKER_FN_STR)
    jl.seval("PythonCall.GC.disable()")
    jl.seval(WORKER_RUN_STR)
    jl.seval("PythonCall.GC.enable()")


def run_pyjulia(n_threads: int = 2):
    # pip install julia  (PyJulia is published on PyPI under the name "julia")
    os.environ["JULIA_NUM_THREADS"] = str(n_threads)
    import julia

    _jl = julia.Julia(compiled_modules=False)
    from julia import Main

    Main.eval(WORKER_FN_STR)
    Main.eval(WORKER_RUN_STR)


if __name__ == "__main__":
    # juliacall will cause bus error
    run_juliacall()
    # pyjulia runs without issue
    # run_pyjulia()

@aditya-sengupta
Author

JuliaCall causes a bus error for me too, but pyjulia causes this error:

  File "/Users/adityasengupta/projects/test_pyjulia/venv/lib/python3.11/site-packages/julia/core.py", line 519, in __init__
    self._call("const PyCall = Base.require({0})".format(PYCALL_PKGID))
  File "/Users/adityasengupta/projects/test_pyjulia/venv/lib/python3.11/site-packages/julia/core.py", line 555, in _call
    self.check_exception(src)
  File "/Users/adityasengupta/projects/test_pyjulia/venv/lib/python3.11/site-packages/julia/core.py", line 609, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'ArgumentError' occurred while calling julia code:
const PyCall = Base.require(Base.PkgId(Base.UUID("438e738f-606a-5dbb-bf0a-cddfbfd45ab0"), "PyCall"))

I'm sure this is easily resolved but I'm not familiar enough with pyjulia to say how.

@brian-dellabetta
Contributor

@aditya-sengupta I think you just need to do ] add PyCall in your Julia env first? Also, I have not tried in Python 3.11.

@aditya-sengupta
Author

aditya-sengupta commented Jun 9, 2023 via email

@brian-dellabetta
Contributor

"Still encountering the same error - which Python version are you using?"

3.8 mostly, but 3.10 had similar results. I know a lot of low-level stuff changed in 3.11.

@brian-dellabetta
Contributor

brian-dellabetta commented Jun 13, 2023

⭐ Happy Update ⭐ : After several weeks of troubleshooting, I think we have a robust and minimally invasive solution that doesn't require switching away from juliacall/PythonCall towards something like PyJulia.

We ran the code snippet I pasted above through lldb, and it showed calls to garbage collect on multiple threads, seemingly at the same time. There must be some weird collision. The solution, though, is just to wrap any multi-threaded code invocations in a garbage collection disable/enable wrapper like so:

Threads.@threads for i in 1:n_samples
    pairs[i] = do_something(data[i, :])
end

becomes

PythonCall.GC.disable()
Base.GC.enable(false)
Threads.@threads for i in 1:n_samples
    pairs[i] = do_something(data[i, :])
    if Threads.threadid() == 1
        Base.GC.gc(false)
    end
end
Base.GC.enable(true)
PythonCall.GC.enable()

  • The disable/enable of Base.GC is required. Just disabling PythonCall.GC doesn't do the trick.
  • The call to Base.GC.gc(false) is optional, and only needed if a lot of memory is being heap-allocated in the threaded code. It seems fairly robust.
  • The call to PythonCall.GC.disable/enable is also optional, and appears to be needed only if PythonCall.PyArray objects are being used in the multi-threaded code.
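
From the Python side, the same wrapping can be applied to any Julia snippet before it is handed to jl.seval. The helper below is a hypothetical sketch, not part of juliacall. It only builds the Julia source string, and it uses try/finally so that both collectors are re-enabled even if the threaded code throws. The PythonCall.GC.disable/enable calls are the ones used in this thread with PythonCall 0.9.12/0.9.13, and they may not exist in the same form in other versions.

```python
import textwrap


def wrap_with_gc_guard(julia_body, python_gc=True):
    """Return Julia source that runs `julia_body` with GC switched off.

    Hypothetical helper: mirrors the disable/enable order used above and
    adds try/finally so the collectors are always switched back on.
    Note that variables first assigned inside the try block are local to it.
    """
    body = textwrap.indent(textwrap.dedent(julia_body).strip(), "    ")
    lines = []
    if python_gc:
        lines.append("PythonCall.GC.disable()")
    lines += [
        "Base.GC.enable(false)",
        "try",
        body,
        "finally",
        "    Base.GC.enable(true)",
    ]
    if python_gc:
        lines.append("    PythonCall.GC.enable()")
    lines.append("end")
    return "\n".join(lines)


# Example (requires juliacall):
#   from juliacall import Main as jl
#   jl.seval(wrap_with_gc_guard("wait(Threads.@spawn worker())"))
```

Keeping the guard in one place makes it easier to remove later if the underlying issue is fixed upstream.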

Disclaimer: This has only been tested on Julia 1.9.1+ and Python 3.8. Some threading logic was updated in the Julia runtime and released as part of Julia 1.9.1 (see discussion here), which is likely what allows this to work. Attempts on earlier Julia versions were failing, but we didn't have the exact same code.

@aditya-sengupta try upgrading Julia to 1.9.1 and wrap your threaded code in the GC disable/enable commands. See if that does the trick for you 🤞

@vchuravy

Duplicate of #219

x-ref: JuliaLang/julia#50278

@brian-dellabetta
Contributor

@vchuravy thanks for the post and for the investigation. Will keep an eye on #219. We are unblocked by disabling/enabling GC, but it would be nice not to have to do that everywhere we have threaded code.

@nic-barbara

@brian-dellabetta I've been trying to multi-thread some Python routines from Julia using the PythonCall side rather than JuliaCall. Should your solution work for this too? At the moment, the following code periodically causes a segfault (I can run test() once or twice in the REPL before it segfaults).

using PythonCall
np = pyimport("numpy")

function test(n=10)
    Base.GC.enable(false)
    PythonCall.GC.disable()
    Threads.@threads for i in 1:n
        np.zeros((i,n))
        if Threads.threadid() == 1
            Base.GC.gc(false)
        end
    end
    PythonCall.GC.enable()
    Base.GC.enable(true)
end

I'm using Julia 1.9.1 and PythonCall v0.9.13 on Ubuntu 22.04. Thanks in advance for any help!

@brian-dellabetta
Contributor

brian-dellabetta commented Jun 30, 2023

@nic-barbara I didn't try any direct calls of Python code inside the multi-threaded Julia code. My guess is this in general won't work, but you could also try the solution @vchuravy suggests; it seems much more robust and less hacky than my workaround. You can try it out on this branch by setting the env var PYTHON_JULIACALL_HANDLE_SIGNALS=yes before importing juliacall. You'll still want the PythonCall.GC.disable/enable calls wrapping your threaded code.
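
Since the variable has to be set before juliacall is imported, a small guard can catch the ordering mistake early. The sketch below uses a hypothetical helper name, configure_juliacall_env. It assumes, as in the snippets earlier in this thread, that JULIA_NUM_THREADS is picked up when juliacall starts Julia, and whether PYTHON_JULIACALL_HANDLE_SIGNALS is honored depends on the juliacall build in use.

```python
import os
import sys


def configure_juliacall_env(n_threads=2, handle_signals=True, env=None):
    """Set the environment variables discussed above before importing juliacall.

    Hypothetical helper. `env` defaults to os.environ; pass a dict to
    inspect the settings without touching the real environment.
    """
    env = os.environ if env is None else env
    if "juliacall" in sys.modules:
        # The settings are read at import time, so changing them now is too late.
        raise RuntimeError("juliacall is already imported; configure it earlier")
    env["JULIA_NUM_THREADS"] = str(n_threads)
    if handle_signals:
        env["PYTHON_JULIACALL_HANDLE_SIGNALS"] = "yes"
    return env


# Example:
#   configure_juliacall_env(n_threads=4)
#   from juliacall import Main as jl
```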

@vchuravy

Using multi-threading to call back into Python is unlikely to "just" work, due to Python's GIL. I am not familiar with the details of PythonCall.jl and there might be other things that could go wrong here. In any case, that should be a distinct issue.

@cjdoris
Collaborator

cjdoris commented Jul 4, 2023

@vchuravy is correct, you can only call Python code from Julia thread 1 (because of the GIL). This is mentioned in the FAQ.

@nic-barbara

That's a pain, but understandable. Thanks for the help!

@github-actions
Contributor

This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues about to be auto-closed label Aug 20, 2023
@github-actions
Contributor

This issue has been closed because it has been stale for 7 days. You can re-open it if it is still relevant.

@github-actions github-actions bot closed this as not planned Aug 28, 2023
@cjdoris cjdoris removed the stale Issues about to be auto-closed label Sep 22, 2023
@cjdoris cjdoris reopened this Sep 22, 2023
@aditya-sengupta
Author

Tried this again and it worked with the disable/enable-s! Thanks all!
