
@everywhere is slow on HPC with multi-node environment #39291

Closed
algorithmx opened this issue Jan 17, 2021 · 17 comments · Fixed by #44671
Labels
domain:parallelism Parallel or distributed computation

Comments

@algorithmx

algorithmx commented Jan 17, 2021

remotecall_eval(Main, procs, ex)

Please check here for descriptions of the problem by three Julia users:

https://discourse.julialang.org/t/everywhere-takes-a-very-long-time-when-using-a-cluster/35724

I have tested @everywhere and pmap() on an HPC. Test code and results are available here:
https://github.com/algorithmx/nodeba

Basically I just put timestamps between the lines. You can see in the t*.log files that the largest gap is the one between timestamps 3 and 4. More interestingly, I found that increasing nworkers() causes the gap to increase linearly. I believe this gap represents the execution time of the @everywhere macro as seen from the master process.

The version info is:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c7 (2020-11-09 13:37 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD EPYC 7452 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, znver2)
Environment:
JULIA_PKG_SERVER = https://mirrors.tuna.tsinghua.edu.cn/julia
@algorithmx
Author

algorithmx commented Jan 18, 2021

related issue #28966
@KristofferC

@arnauqb

arnauqb commented Feb 22, 2021

Hey @algorithmx, I'm facing exactly the same issue you describe. Have you found any workaround yet?

Could it be related to this? JuliaLang/Pkg.jl#1219

@algorithmx
Author

Hey @algorithmx, I'm facing exactly the same issue you describe. Have you found any workaround yet?

Could it be related to this? JuliaLang/Pkg.jl#1219

not yet :-(

@arnauqb

arnauqb commented Mar 3, 2021

I have been able to reduce the delay quite significantly by using the latest Julia 1.6 release (it seems that the faster compilation speed helps) and also by changing Base.DEPOT_PATH:

using Distributed, ClusterManagers
pids = addprocs_slurm(...)
@everywhere pushfirst!(Base.DEPOT_PATH, "/tmp/julia.cache")

@moble
Contributor

moble commented Aug 9, 2021

I've also run into this problem (posted on discourse here), and traced it back to just using @everywhere with basically any simple statement — even

@everywhere 1+2

It so happens that most of us run @everywhere using <SomePackage> first, so it looks like it has to do with precompilation, but I don't think it does. If I literally run @everywhere 1+2 first, then do all my imports, the imports are nice and fast — but only after 1+2 finishes, which takes forever.

This is a real killer for my use case, which involves scaling up to thousands of processors and will waste (and has already wasted for me) thousands of CPU-hours just running that first @everywhere statement.
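A minimal way to reproduce the observation above (a sketch, not from the original post; Dates is used only as a stand-in for a real package import):

```
using Distributed
addprocs(8)

@time @everywhere 1 + 2          # slow: the first remote eval pays the per-worker cost
@time @everywhere using Dates    # fast once the first @everywhere has completed
```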

@moble
Contributor

moble commented Aug 10, 2021

Also note that @affans reported that this was a regression, with Julia 1.0.5 running very quickly and 1.3 running very slowly.

@KristofferC
Sponsor Member

If possible, it would be good to run a bisect between Julia 1.0 and 1.3 to find out whether a specific commit caused the regression.

@LarkAnspach

Yes

@vancleve

vancleve commented Sep 7, 2021

I've also noticed this on Julia 1.6.2, and it's not just multi-node environments. When I am on a 128-core AMD machine and run @everywhere using pkgs, I notice in top that soon only a handful of Julia processes are using any CPU at all (~10% or so) and only one of them is actually running. Which process is running changes until the @everywhere using finally completes. This happens on multi-node systems too, except it's one node at a time with a handful of processes at low CPU and one process running.

I have a video of what this looks like on a single node here:
https://youtu.be/mTar7HvIMQo

@moble
Contributor

moble commented Sep 7, 2021

Workaround described here. Basically, you have to precompile the code that lets processes talk to each other.
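For reference, a hedged sketch of what such a workaround can look like; it assumes PackageCompiler.jl and uses made-up file names, so treat the linked post as the authoritative recipe:

```
# Step 1 (shell): record the precompile statements emitted by a trivial
# Distributed session on the primary process:
#   julia --trace-compile=dist_trace.jl -e 'using Distributed; addprocs(1); @everywhere 1+1'

# Step 2: bake those statements into a custom system image.
using PackageCompiler
create_sysimage(["Distributed"];
                sysimage_path = "sys_dist.so",
                precompile_statements_file = "dist_trace.jl")

# Step 3 (shell): start Julia with that image, e.g.
#   julia --sysimage sys_dist.so my_script.jl
# Depending on the cluster manager, the workers may also need the image,
# e.g. addprocs(...; exeflags = "--sysimage=sys_dist.so").
```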

@vancleve

vancleve commented Sep 8, 2021

Thanks @moble! I guess what I wonder is why the precompilation problem seems to be worse with many more processors. In other words, why aren't they all just precompiling simultaneously? (The video makes it look like they're doing it almost one by one.)

@moble
Contributor

moble commented Sep 9, 2021

I don't actually know how things work under the hood, but I have tested it and found that the timing increases linearly with the number of processes. So my mental model is that @everywhere involves the primary process sending the instruction to workers and waiting for some type of confirmation that the instruction was received — or at least that sending has started. (I don't know whether it's an actual receipt confirmation, the opening of the socket, the creation of that worker's log, or the beginning of deserialization...)

But the primary must do this in serial to some extent, meaning it doesn't start sending the instruction to the next worker until it has whatever confirmation it needs. Normally this wouldn't be a problem, because confirmation is presumably almost instantaneous the great majority of the time. But compiling the code required to confirm takes ~1 second. And that's the part that has to happen on each worker in serial. That 1 second is not used in compiling the statement itself (which I know because I've tried statements that take much longer to compile); it must be just some piece of code required to let the primary know it got the message.

By precompiling something as simple as @everywhere 1+1, each worker can skip that step of compiling the confirming function, so the primary can move on to the next worker more quickly. And that's exactly what KristofferC has added/re-enabled in #42156.
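A rough way to check the linear scaling described above (a sketch; the worker counts are arbitrary):

```
using Distributed

for n in (2, 4, 8, 16)
    ps = addprocs(n)
    t = @elapsed @everywhere 1 + 1   # fresh workers, so each run pays the first-eval cost
    println("workers = $n  first @everywhere: $(round(t; digits = 2)) s")
    rmprocs(ps)
end
```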

@vancleve

maybe the precompilation in #42156 didn't fix this issue?

#42156 (comment)

maybe there is some other code that needs to be precompiled on the worker end and isn't covered by just calling @everywhere 1+1 or the other lines in generate_precompile.jl?

@moble
Contributor

moble commented Sep 19, 2021

Maybe. I don't know how julia's own build process works; maybe the image you used isn't being built with multiple processes.

Also, I'll point out that in the workaround I linked above, I actually used the --trace-compile flag on both the primary julia process and the worker process, then combined the outputs in case the worker's output wasn't a subset of the primary's. (I didn't actually check whether it was or not.) I don't know whether or not julia does this when building itself.
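A sketch of that two-sided tracing, with assumed file names (note that several workers writing to one trace file may interleave, so a single local worker or per-worker files are safer):

```
# Primary started as:  julia --trace-compile=primary_trace.jl warmup.jl
using Distributed

# Ask the worker to emit its own trace; exeflags is an addprocs keyword,
# the file name is just an assumption for this sketch.
addprocs(1; exeflags = `--trace-compile=worker_trace.jl`)
@everywhere 1 + 1
rmprocs(workers())

# Afterwards (shell): combine both traces for sysimage building:
#   cat primary_trace.jl worker_trace.jl | sort -u > combined_trace.jl
```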

@carlocastoldi

Is it possible that this bug also affects MPI.jl?
I have built a framework that revolves around MPI calls to synchronize I/O operations over all HPC nodes. With a few nodes (e.g. 4) it works like a charm, but as soon as I go up to 50-100 nodes it just becomes unbearable.

The @everywhere calls don't seem to be the cause this time, since I'm using @moble's trick, but I'm now starting to think it's the MPI calls' fault.
I tried adding the MPI calls I'm using to the precompilation, but it doesn't seem to work.
I did it by adding :MPI to the create_sysimage() call in precompile.jl. Then in precompile_everywhere.jl I wrote the calls I use:

# all precompile(...) calls
function main()
    MPI.Initialized()
    MPI.Init(threadlevel=:multiple)
    base_comm = MPI.COMM_WORLD
    print(" comm size: $(MPI.Comm_size(base_comm)) ---")
    base_grp = MPI.Comm_group(base_comm)
    id_group = MPI.Group_incl(base_grp, Int32[0])
    comm = MPI.Comm_create_group(base_comm, id_group, 42)
    MPI.Barrier(base_comm)
    fh = MPI.File.open(comm, "path/to/file/foo.bar"; append=true, write=true, create=true)
    MPI.File.seek_shared(fh, 0)
    MPI.File.write_ordered(fh, Int32[42,420])
    MPI.File.write_at_all(fh, 1, Int32[69])
    MPI.File.write_at(fh, 0, Int32[1])
    close(fh)
end

main()

And then I just execute julia precompile.jl precompile.
Do you have any idea what could be going on? It seems like even doing MPI.Init() on 100 nodes takes ~1 hour...

@giordano
Contributor

giordano commented Dec 9, 2021

Is it possible that this bug also affects MPI.jl?

Unlikely, since MPI.jl has nothing to do with the Distributed standard library. You may want to report the issue to the MPI.jl repository, but you have to provide more details about your system.

@carlocastoldi

Unlikely, since MPI.jl has nothing to do with the Distributed standard library. You may want to report the issue to the MPI.jl repository, but you have to provide more details about your system.

Sure, thank you. I'm now investigating it so that I have more information about it.

@ViralBShah ViralBShah added the domain:parallelism Parallel or distributed computation label Mar 13, 2022
KristofferC pushed a commit that referenced this issue Mar 23, 2022
* avoid using `@sync_add` on remotecalls

It seems like @sync_add adds the Futures to a queue (Channel) for @sync, which
in turn calls wait() for all the futures synchronously. Not only is that
slightly detrimental to network operations (latencies add up), but in the case of
Distributed the call to wait() may actually cause some compilation on the remote
processes, which is also wait()ed for. As a result, some operations took a great
amount of "serial" processing time if executed on many workers at once.

For me, this closes #44645.

The major change can be illustrated as follows: First add some workers:

```
using Distributed
addprocs(10)
```

and then trigger something that, for example, causes package imports on the
workers:

```
using SomeTinyPackage
```

In my case (importing UnicodePlots on 10 workers), this improves the loading
time over 10 workers from ~11s to ~5.5s.

This is a far bigger issue when worker count gets high. The time of the
processing on each worker is usually around 0.3s, so triggering this problem
even on a relatively small cluster (64 workers) causes a really annoying delay,
and running `@everywhere` for the first time on reasonable clusters (I tested
with 1024 workers, see #44645) usually takes more than 5 minutes. Which sucks.

Anyway, on 64 workers this reduces the "first import" time from ~30s to ~6s,
and on 1024 workers this seems to reduce the time from over 5 minutes (I didn't
bother to measure that precisely now, sorry) to ~11s.

Related issues:
- Probably fixes #39291.
- #42156 is kind of complementary -- it removes the most painful source of
  slowness (the 0.3s precompilation on the workers), but the fact that the
  wait()ing is serial remains a problem if the network latencies are high.

May help with #38931

Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
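The gist of the change can be illustrated with user-level code (this is only an illustration of the idea, not the actual Distributed internals):

```
# Before the fix, each remote eval was effectively waited on one after another,
# so per-worker compilation and network latency added up. Launching all remote
# calls first and only then waiting lets those costs overlap:
using Distributed
addprocs(4)

expr = :(1 + 1)                                   # stand-in for an @everywhere body
futures = [remotecall(Core.eval, p, Main, expr) for p in workers()]
foreach(wait, futures)                            # workers are already busy; waits overlap
```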
KristofferC pushed a commit that referenced this issue Mar 25, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
KristofferC pushed a commit that referenced this issue Apr 20, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
KristofferC pushed a commit that referenced this issue May 23, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
KristofferC pushed a commit that referenced this issue May 23, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
KristofferC pushed a commit that referenced this issue Jul 4, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
KristofferC pushed a commit that referenced this issue Dec 21, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
staticfloat pushed a commit that referenced this issue Dec 23, 2022
* avoid using `@sync_add` on remotecalls (cherry picked from commit 62e0729)
vchuravy pushed a commit to JuliaLang/Distributed.jl that referenced this issue Oct 6, 2023
* avoid using `@sync_add` on remotecalls (cherry picked from commit 3b57a49)
Keno pushed a commit that referenced this issue Jun 5, 2024
* avoid using `@sync_add` on remotecalls