
Develop/Document multi-level parallelism policy #644

Closed
rrnewton opened this issue Jul 21, 2015 · 35 comments

@rrnewton
Contributor

I can see that stack test defaults to parallel builds. But that refers to parallelism between different test-suites, right? In the case of both builds and tests (with test-framework or tasty) there's the issue of "inner loop" parallelism within each build or test target. I suppose we can't parallelize the tests until we have some way for the build-tool to know it's a tasty/test-framework suite, not just exitcode-stdio-1.0, but that still leaves build-parallelism. Does Stack currently pass -j to GHC?

I haven't seen comprehensive benchmark numbers, but a small value like -j2 or -j4 for GHC offers some benefit. The current implementation is not scalable, however (for example, I haven't seen anything good come of -j32).

Nested parallelism of course raises the issue of N * N jobs being spawned for N cores. As long as it's quadratic oversubscription and not exponential, I think this is not that big a problem for CPU usage, but it very often can create problems with memory usage (or hitting ulimits / the max number of processes).

There are several related cabal issues, but it's a little hard to tell the current status with them spanning a lot of time and various merged and unmerged pull requests:

@rrnewton rrnewton changed the title Document multi-level parallelism policy Develop/Document multi-level parallelism policy Jul 21, 2015
@snoyberg
Contributor

I'm seeing about three different things being raised here:

  • Asking GHC to build a single package in parallel
  • Telling a test suite to run its test cases in parallel
  • Running multiple test suites from a single package in parallel

stack does none of these right now. The only thing stack does in parallel is process individual packages in parallel. Does that answer your question?

As to what stack should be doing: I don't see a downside to passing -j to GHC when available. I'd rather avoid running test suites in parallel, but there's really no strong reason for that. I don't see how stack can have any impact on the insides of the test suite itself, since that's entirely up to the test framework.

@borsboom
Contributor

One tricky thing is deciding how many threads GHC should be running if stack is running multiple builds (i.e. if you have 8 CPUs and stack is running 8 builds, each GHC shouldn't be running 8 of its own threads). Simplest might be to only pass -j if stack is only building a single package.

@borsboom borsboom added this to the Later improvements milestone Jul 21, 2015
@rrnewton
Contributor Author

Given that neither stack nor cabal nor GHC has a notion of global resource management on the machine, I guess in the medium term what I'd like is enough knobs to experiment with this.

For example, it would be great to be able to "turn it up" to max parallelism -- parallel packages, parallel targets within a package, parallel modules within ghc --make plus parallel tests. And then do some benchmarking to see what speedups and memory usage look like.

We can already vary things pretty easily by generating .cabal/.stack files that pass the right arguments through to GHC and to the test-suites. I guess the critical thing on which to get help from stack itself is the third bullet @snoyberg mentioned -- multiple test-suites (plus profiling/non-profiling/documentation) in parallel within one package, which corresponds to haskell/cabal#2623.

By the way, as far as I know, no one's properly analyzed GHC builds from a parallel algorithms perspective. I.e. we need to profile the dependence graph and do a work/span analysis to figure out what is limiting our scaling. (We're working on a research project where we may be hacking on fine-grain parallelism inside certain GHC phases, but it only makes sense if there's a coherent picture for parallelism at the coarser grain too.)

In lieu of a GHC server mode (which has been discussed on various issues), we can't directly implement an inter-process work-stealing type policy that mimics the now-standard intra-process load balancing approach. But we can do what that Gentoo builder seems to be doing and simply delay some tasks to manage resource use. The more look-ahead or previous knowledge we have about the task DAG the smarter we can be in prioritizing the critical path.

@snoyberg
Contributor

I'm tempted to close this issue, and just add a link to my comment above to the FAQ. Any objection?

@snoyberg snoyberg self-assigned this Jul 31, 2015
@rrnewton
Contributor Author

That's fine. Fixing this so that stack does something smart would be a big project, not solved in a day, and a FAQ entry addresses the "document" part.

@snoyberg
Contributor

Added here: https://github.com/commercialhaskell/stack/wiki/FAQ#how-does-stack-handle-parallel-builds-what-exactly-does-it-run-in-parallel

@alexanderkjeldaas
Contributor

It would be great if this issue, since it's linked from the FAQ, told me how to build at least my package in parallel using some ghc option, possibly specified in the cabal file or as an option to stack.

Alternatively, explicitly say that this cannot be done.

@mgsloan
Contributor

mgsloan commented Jun 7, 2016

@alexanderkjeldaas No such flag is necessary; stack will build your package in parallel with other things if it can.

Perhaps you mean having ghc build modules in parallel? Unfortunately in my experience this doesn't speed things up as much as I'd hoped. You can do --ghc-options -j5 (or whatever number).
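
For concreteness, an invocation along those lines (the -j value is illustrative, not a recommendation):

stack build --ghc-options -j5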

@alexanderkjeldaas
Contributor

Yes, that option is interesting - ghc seems to be able to use 6x the CPU and finish in exactly the same time. Impressive!

@sjakobi
Member

sjakobi commented Jun 8, 2016

Yes, that option is interesting - ghc seems to be able to use 6x the CPU and finish in exactly the same time.

Related GHC ticket: https://ghc.haskell.org/trac/ghc/ticket/9221

I did some simple timings on my machine (i3-2350M, 2 physical cores + hyperthreading) and always got the shortest build times with -j2 or -j3. The relative speedup varied a lot depending on the package, e.g. ~30% with vector-algorithms vs. ~10% with haskell-src-exts.

I was wondering how hard it would be to detect when stack isn't using all of its build threads and, in that case, to pass -j2 or -j3 to the build jobs until all threads are used.
Build times would probably still be quite far from optimal, but I don't believe this could result in build times worse than the status quo.

@Blaisorblade
Collaborator

Related question (might need its own issue): how do I tell stack to set -j4 by default for itself, aside from ghc-options? I found nothing in http://docs.haskellstack.org/en/latest/yaml_configuration/#non-project-specific-config, http://docs.haskellstack.org/en/latest/faq/ or by googling.

Studying this issue suggests that builds are also parallel. Stack's source suggests it defaults to -j $(getNumProcessors) (already in 1.1.2), which sounds good (depending on the answers to the other questions), and that this can be tuned through e.g. jobs: 4 in config.yaml.
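
For concreteness, a minimal sketch of that setting (the value is illustrative):

# ~/.stack/config.yaml (non-project-specific config)
jobs: 4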

@mgsloan
Contributor

mgsloan commented Jun 13, 2016

stack build -j2. It's among the options listed in stack --help.

@Blaisorblade
Collaborator

@mgsloan I'm asking for docs on setting that by default, for all invocations.

@alexanderkjeldaas
Contributor

Also, a different setting for the current project only would make sense: dependencies might be built in parallel, but the current project won't be.


@runeksvendsen

Can anyone tell me how many parallel ghc processes stack will spawn? This is distinct from the ghc -j option, which specifies the level of parallelism within each ghc process, whereas I'm talking about how many of these ghc processes stack (or is it cabal?) keeps running at the same time for building dependencies in parallel.

I'm benchmarking some code on a 32-core machine, and stack seems to only spawn 8 concurrent ghc instances (building 8 dependencies in parallel), resulting in at most 25% utilization (8 of the 32 cores).

Based on my testing, this figure should be at least the number of available CPU cores, perhaps as much as five times that number, as ghc often seems to have a hard time fully using even a single core (doing IO, I presume). So if we set it to 5*NUM_CORES, then each individual ghc process can use (on average) as little as 20% of one core and we'd still be using all cores fully.

An actual use case for this would be automatically spawning the build process onto high-CPU VMs, so we can build stuff in 2 minutes rather than half an hour.

@mgsloan
Contributor

mgsloan commented Aug 9, 2016

@runeksvendsen By default, it will build NUM_CORES packages concurrently.

An actual use case for this would be automatically spawning the build process onto high-CPU VMs, so we can build stuff in 2 minutes rather than half an hour.

Often there aren't actually that many packages that can be built concurrently, so your CPUs remain unsaturated. You can pass -j2 and similar to ghc via --ghc-options -j2. Unfortunately, in my experience this hasn't helped build time as much as I'd hoped.

@alexanderkjeldaas
Contributor

alexanderkjeldaas commented Oct 29, 2016

@snoyberg the link added above (https://github.com/commercialhaskell/stack/wiki/FAQ#how-does-stack-handle-parallel-builds-what-exactly-does-it-run-in-parallel) is dead, and the new FAQ links back to this issue, making it circular: this issue was closed because this is supposedly documented.

Reopen the issue?

@alexanderkjeldaas
Contributor

Also, a separate issue that I'll mention here: a low-memory -j setting would be good to have for CI machines with limited RAM. Pre-fetching, and maybe configuring, in parallel is not a problem, but building is.

@Blaisorblade Blaisorblade reopened this Oct 29, 2016
@Blaisorblade
Collaborator

Reopened as requested. After the issue was closed, it seems more than one question was asked and not addressed by the docs (sorry if I'm wrong).
Re -j low memory: how low? Your request makes sense, but I ask because right now, at least on machines with 1G RAM, GHC tends to segfault rather than report that allocation failed.

@alexanderkjeldaas
Contributor

I actually don't know what stack does by default, but I just tried to set up CI on buddy.works, and stack build gives me the following out-of-the-box:

thyme-0.3.5.5: copy/register

--  While building package JuicyPixels-3.2.8 using:
      /home/app/.stack/setup-exe-cache/x86_64-linux/setup-Simple-Cabal-1.24.0.0-ghc-8.0.1 --builddir=.stack-work/dist/x86_64-linux/Cabal-1.24.0.0 build --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure (-9) (THIS MAY INDICATE OUT OF MEMORY)
    Logs have been written to: /mysecretproject/.stack-work/logs/JuicyPixels-3.2.8.log
    Configuring JuicyPixels-3.2.8...
    Building JuicyPixels-3.2.8...
    Preprocessing library JuicyPixels-3.2.8...

So what I'd need is some quick fix to make CI work out-of-the-box.

@alexanderkjeldaas
Contributor

For practical purposes, something like MAKEFLAGS for stack would be nice to have when stack is called from within some other build system. In that case it would be easy to slap a STACKBUILDFLAGS=-j1 in front of <somebuildthing> to see if it solves the problem, instead of having to retrofit injecting stack build options through that other tool.

Not a big issue, but if someone is going to look at this, might as well add it.

@runeksvendsen

Re -j low memory: how low?

For what it's worth, I experienced this on a 600M RAM f1-micro Google Cloud instance:

runesvend@cloudstore-test:~/code/test-gcloud-datastore$ stack install
Downloaded nightly-2016-09-15 build plan.    
Updating package index Hackage (mirrored at https://github.com/commercialhaskell/all-cabal-hashes.gi

Fetched package index.    
Populated index cache.    
stack: out of memory (requested 1048576 bytes)
runesvend@cloudstore-test:~/code/gcloud-datastore$ free -h
             total       used       free     shared    buffers     cached
Mem:          594M       195M       398M       4.2M       2.8M       113M
-/+ buffers/cache:        80M       514M

@Blaisorblade
Collaborator

  • For what it's worth, I don't recommend trying with less than 2G RAM, and enough swap enabled. Most failures otherwise appear due to a single GHC instance, and stack can't do much about them; @runeksvendsen's log shows a failure due to stack itself, but GHC requires far more memory, so I see little point in trying to fix it.
  • By default, stack sets --jobs to the number of processors (a reasonable default). By default stack does not tell GHC to build multiple modules of a package at once, unless stack is explicitly configured otherwise in ~/.stack/config.yaml via ghc-options.

@alexanderkjeldaas You might want to run stack build -j1 (which forces building at most one package at a time) and see if that helps (it might; your trace looks like stack is setting -j2).
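
To make those knobs concrete, a sketch of both the one-off flag and the persistent config (values are illustrative; the "$everything" key is what newer stack versions use for global GHC options, older ones used "*"):

# one-off: build at most one package at a time
stack build -j1

# ~/.stack/config.yaml: the persistent equivalents
jobs: 1
ghc-options:
  "$everything": -j2   # optional: module-level GHC threads (costs more memory)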

@runeksvendsen

I can confirm that I had to completely give up trying to build my project on a 600M RAM machine. It worked OK to begin with, and it built GHC fine, but the closer it got to actually finishing, the quicker all RAM was consumed.

I found a 1.7G RAM machine to be sufficient, however, although the build process sometimes requires a restart due to the occasional out-of-memory error (which, as mentioned, can be avoided -- while capping concurrency/performance -- by using e.g. -j1).

@metaleap

By default, it will build NUM_CORES packages concurrently.

Quick note for the maintainers of the Windows build, in case they're not already using it: NUMBER_OF_PROCESSORS will typically be set in a similar manner (certainly on "pro" editions / server editions / developer machines).

@Anrock

Anrock commented May 21, 2018

@Blaisorblade

By default, stack uses --jobs to the number of processors (a reasonable default)

Is that accurate for Windows? I believe I'm seeing a difference in build time between running stack build and stack build -j16 on my Win10 machine.

@Blaisorblade
Collaborator

Blaisorblade commented May 25, 2018

@Anrock That should be correct for Windows too; to debug, please describe your machine (maybe in a new issue?) — if you have hyperthreading, it's not obvious whether "number of processors" will actually count physical cores or logical threads, though it appears to count threads by default.

Just to double-check, please try calling by hand the underlying GHC API we use, GHC.Conc.getNumProcessors — example session below (my machine has 4 cores and 8 threads, but I have no Windows machine):

$ ghci
GHCi, version 8.4.2: http://www.haskell.org/ghc/  :? for help
Prelude> import GHC.Conc
Prelude GHC.Conc> getNumProcessors
8

Sources I consulted:

@Anrock

Anrock commented May 25, 2018

@Blaisorblade false alarm, it works as expected. I did some benchmarking, and builds with --jobs 8 and without it perform the same.

As a note: I'm running Win10 Pro 1803 on an AMD FX8350 with 4 physical cores and hyperthreading, so 8 logical cores total. getNumProcessors returns 8.

@snoyberg
Contributor

There are no clear steps to be taken here, closing. If people would like to see doc improvements in the FAQ, please consider sending a PR.

@ProofOfKeags

ProofOfKeags commented Nov 25, 2020

Is it possible to get stack to build a single package with module-level parallelism? I find that building the Cabal library is often a bottleneck in the dependency graph of many projects, and I'd like to be able to force it to run that one in parallel, since it has 234 modules in it as of Nov '20. I saw some discussion upthread about not wanting every package to have its own parallelism at the same level, since it might cause a lot of CPU thrash.

Ideally the entire build system would share a single work queue rather than forcing packages into a particular "lane". As someone with a 24-core dev machine, this is something that would be immensely useful to me, but I'm not sure how to go about thinking about this.

I suspect that the design of Cabal itself might be the limiting factor here but I do not know enough to be able to say one way or another. Is there any sort of workaround such that stack build [package] would only do package parallelism for dependencies but module parallelism for the top level target? If that was the case we could exert a bit more control over the whole process by having a series of commands to build "points of interest" in the graph to speed it up.

@chadbrewbaker
Contributor

Thumb twiddling on a multicore box. Would this option be hard to add, mostly for building cabal packages in parallel?

stack build --genmake; make -j

@nikita-volkov

nikita-volkov commented Feb 11, 2022

One tricky thing is deciding how many threads GHC should be running if stack is running multiple builds (i.e. if you have 8 CPUs and stack is running 8 builds, each GHC shouldn't be running 8 of its own threads). Simplest might be to only pass -j if stack is only building a single package.

This is solvable with a simple algorithm.

Whenever a package build finishes, Stack knows how many parallel builds it's still running. It can also know how many CPUs it has available, if it gets to control how many each build should be using. So if it's at the stage where it can only build one package and it has 4 CPUs available, it can start that particular build with -j4. If it has two packages to build, it can start building both with -j2. If it has five, it leaves one package in the queue and starts 4 builds without -j.
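
A minimal Haskell sketch of that algorithm (function and argument names are hypothetical, not Stack's internals):

-- Decide what -jN each GHC invocation gets, given the number of free
-- CPUs and the number of packages currently ready to build.
ghcJobsPerBuild :: Int -> Int -> Maybe Int
ghcJobsPerBuild freeCpus readyPackages
  | readyPackages < 1 || readyPackages >= freeCpus = Nothing  -- plain ghc, queue any excess
  | otherwise = Just (freeCpus `div` readyPackages)

-- ghcJobsPerBuild 4 1 == Just 4   (one package left: start it with -j4)
-- ghcJobsPerBuild 4 2 == Just 2   (two packages: -j2 each)
-- ghcJobsPerBuild 4 5 == Nothing  (start 4 plain builds, queue 1)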

@rrnewton
Contributor Author

Yes, that sounds like a reasonable heuristic. Note, however, that an adversarial schedule can defeat that strategy: a bunch of jobs can finish and free up CPUs right after you make the decision to use -j4. Kunal Agarwal @ Washington University studies this two-level job scheduling problem, and there are some meaningful results in the area.

But in GHC's case the problem is also made a bit easier by the fact that GHC's internal parallelism is not very scalable. So telling GHC to use 32 cores wouldn't make sense...

@hasufell
Contributor

i.e. if you have 8 CPUs and stack is running 8 builds, each GHC shouldn't be running 8 of its own threads

This is already a wrong assumption carried over from the C world, where jobs = CPUs is a somewhat OK heuristic. In Haskell, a single package can blow through 16GB worth of RAM, and 2 such packages built in parallel can bring your entire system down. This happened frequently at one of my previous companies with pandoc+amazonka. We had to run stack with -j1 in order not to trigger OOM or cause swapping that made the machine unresponsive for 15+ minutes.

@mistmist

There's a relatively simple solution to the two-level problem: implement the GNU make jobserver protocol at all levels.

See, for example, cargo + rustc:

rust-lang/cargo#1744
rust-lang/rust#42682
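
For reference, a minimal Haskell sketch of the client side of that protocol, assuming POSIX pipes and that the parent handed us the jobserver pipe's read/write ends (GNU make advertises them via MAKEFLAGS); illustrative only, not what make or cargo literally do:

import Control.Exception (bracket)
import System.IO (Handle, hGetChar, hPutChar)

-- Each process implicitly owns one job slot. To run an additional
-- parallel job, read one token byte from the shared pipe (blocking
-- until a slot frees up), and write the byte back when the job is done.
withJobToken :: Handle -> Handle -> IO a -> IO a
withJobToken readEnd writeEnd job =
  bracket (hGetChar readEnd) (hPutChar writeEnd) (const job)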
