
treewide: drop -l$NIX_BUILD_CORES #192447

Merged 1 commit into NixOS:staging on Sep 22, 2022
Conversation

@grahamc (Member) commented Sep 22, 2022

Passing `-l$NIX_BUILD_CORES` improperly limits the overall system load.

For a build machine which is configured to run `$B` builds where each build gets `total cores / B` cores (`$C`), passing `-l $C` to make will improperly limit the load to `$C` instead of `$B * $C`.

This effect becomes quite pronounced on machines with 80 cores, with 40 simultaneous builds and a cores limit of 2. On a machine with this configuration, Nix will run 40 builds and make will limit the overall system load to approximately 2. A build machine with this many cores can happily run with a load approaching 80.
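
To make the arithmetic concrete, here is a small shell sketch (illustrative numbers from the example above; the variables are mine, and the make invocation is approximate):

# Each build used to invoke roughly: make -j$NIX_BUILD_CORES -l$NIX_BUILD_CORES
# With B = 40 simultaneous builds and C = 2 cores per build:
B=40; C=2
echo "intended aggregate parallelism: $((B * C))"   # 80
echo "load ceiling enforced by each make: $C"       # ~2, and it is system-wide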

A non-solution is to oversubscribe the machine by picking a larger `$C`. However, there is no way to divide the number of cores in a way which fairly subdivides the available cores when `$B` is greater than 1.

There has been exploration of passing a jobserver into the sandbox, or sharing a jobserver between all the builds. This is one option, but it is relatively complicated and only supports make. Lots of other software uses its own implementation of `-j` and doesn't support either `-l` or the make jobserver.
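
For context, GNU make's jobserver is essentially a shared pool of job tokens. A minimal sketch of the token protocol in shell (assuming GNU make >= 4.4's named-pipe flavour; real make advertises the pipe via --jobserver-auth in MAKEFLAGS):

mkfifo /tmp/jobserver
exec 3<>/tmp/jobserver        # keep both ends open so tokens persist
printf '+++++++' >&3          # 7 tokens; the caller itself is the 8th slot
read -r -n1 token <&3         # acquire a slot (blocks when the pool is empty)
# ... run one parallel job here ...
printf '%s' "$token" >&3      # release the slot for other workers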

For the case of an interactive user machine, the user should limit overall system load using `$B`, `$C`, and optionally systemd's cpu/network/io limiting features.
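
As a sketch of the systemd route for a machine running nix-daemon (directive values are illustrative, not recommendations; on NixOS the same can be expressed through the module system):

# Constrain all daemon-run builds with a drop-in unit:
mkdir -p /etc/systemd/system/nix-daemon.service.d
cat > /etc/systemd/system/nix-daemon.service.d/limits.conf <<'EOF'
[Service]
# Deprioritize builds relative to interactive work:
CPUWeight=20
# Apply memory pressure to builds before they starve the rest of the system:
MemoryHigh=16G
IOWeight=20
EOF
systemctl daemon-reload && systemctl restart nix-daemon.service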

Making this change should significantly improve the utilization of our build farm, and improve the throughput of Hydra.

Description of changes
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions bot added the labels 6.topic: python, 6.topic: stdenv, and 6.topic: TeX on Sep 22, 2022
@lheckemann (Member)

(previously: #174473)

@vcunat (Member) commented Sep 22, 2022

I'm not a fan of this, but OK I hope. It does solve a real problem with how Hydra.nixos.org builders are set up.

It probably makes things worse for people who want(ed) to combine small and big builds quickly on the same machine with many cores (--max-jobs $(nproc) --cores $(nproc)).

After merging this, we could try to rethink some other approach, possibly based on PR #184886. Also, we might see more real-life feedback in the meantime.

@grahamc (Member, Author) commented Sep 22, 2022

More robust designs for load limiting are probably a good thing to explore. Let's go ahead and merge this once ofborg is green, and if it causes problems we can revert -- no sweat.

@jonringer (Contributor) left a comment

I think the proper path forward would be to introduce a NIX_MAX_CORES or some similar option which would default to $JOBS * $CORES. There are many cases where you want to run many jobs, as most builds have long single-threaded sections, but it can be really detrimental to have many jobs suddenly spike in thread + RAM usage.
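
A hypothetical sketch of those semantics (NIX_MAX_CORES is not an existing Nix option; all names and numbers here are illustrative):

# Global thread budget defaults to jobs * cores; each running build draws a share.
JOBS=8; CORES=4
NIX_MAX_CORES=${NIX_MAX_CORES:-$((JOBS * CORES))}                      # default: 32
ACTIVE_BUILDS=3
echo "this build may use $((NIX_MAX_CORES / ACTIVE_BUILDS)) threads"   # 10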

@grahamc merged commit 1379da1 into NixOS:staging on Sep 22, 2022
@grahamc deleted the drop-l branch on September 22, 2022 23:06
@grahamc (Member, Author) commented Sep 22, 2022

If this causes problems, let's consider a revert and see what happens :).

@vcunat (Member) commented Sep 24, 2022

This was pushed with what is IMO a wrong description about the system load getting limited to 2, but it's not worth changing history now.

Anyway, I materialized my ideas about an imperfect solution into PR #192799 (doing better than limiting by load seems hard).

@jonringer (Contributor)

I created issue NixOS/nix#7091 for getting a longer-term solution in Nix.

@vcunat (Member) commented Oct 2, 2022

My suggestion isn't getting support so far, if I understand it correctly. But note that on NixOS, the default nix configuration for the local machine effectively sets both values to $(nproc), and thus the overall limit is quadratic, so the current combination on nixpkgs master seems rather bad for large machines.

Details: the default nix.conf will contain

cores = 0
max-jobs = auto
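
Concretely (a worked example, not a measurement): cores = 0 lets every build use all cores, and max-jobs = auto allows up to $(nproc) concurrent builds, so without -l the worst-case process count grows quadratically with core count:

cores=64                    # e.g. a 64-core machine, where nproc = 64
echo $(( cores * cores ))   # up to 4096 concurrent compiler processes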

@ElvishJerricco (Contributor)

Much of what I do with NixOS involves frequently performing fairly massive rebuilds on my workstation, and this change has made that substantially more unpleasant. I don't know what the solution is but I wanted to chime in to say that this is a significant problem for my personal workflow.

@lheckemann (Member)

FWIW @edolstra has some work in progress on cgroup support in Nix. This may help with this kind of resource control problem. In the meantime, @ElvishJerricco maybe setting cores and max-jobs to the square root of the number of cores you have, or a little more, is a reasonable compromise?
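
A sketch of that compromise (the ceil-style rounding is my own choice):

# Set cores and max-jobs to ~sqrt(nproc); e.g. nproc = 32 gives 6 and 6:
n=$(nproc)
s=$(awk -v n="$n" 'BEGIN { printf "%d", sqrt(n) + 0.999 }')
nix-build '<nixpkgs>' -A hello --max-jobs "$s" --cores "$s"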

@ElvishJerricco (Contributor) commented Nov 18, 2022

@lheckemann I do not think that would be a reasonable utilization of my hardware (EDIT: for reference, 16c/32t, so high enough that I can get a lot of parallelism out of it, though not nearly as high as many build servers). For instance, one thing I find myself doing often is rebuilding NixOS with a new variation of systemd. Such builds come with long periods where a single derivation could be using all my cores, and long periods where there are many derivations building at the same time. There is no single cores and max-jobs combination that will allow me to fully utilize my hardware for these builds without make -l $N.

Similarly, cgroups would not help, as I understand it. They can't stop make from spawning new processes, and each of those uses memory; enough that yesterday my 64GB desktop OOM'd and killed my builds multiple times (while my desktop was going completely unresponsive). Cgroups could be used to limit the memory usage of a build, but that will result in failures instead of builds simply choosing to use less memory by spawning fewer processes.
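
For reference, the cgroup-level distinction behind this point (illustrative values; neither knob feeds back into make's -j decisions, which is exactly the problem described):

# Hard cap: breaching MemoryMax gets the process group OOM-killed:
systemd-run --scope -p MemoryMax=32G -- make -j"$(nproc)"
# Soft cap: breaching MemoryHigh only triggers reclaim/throttling,
# but make still spawns just as many jobs:
systemd-run --scope -p MemoryHigh=32G -- make -j"$(nproc)"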

@SuperSandro2000 (Member)

I think the current build system with cores and max-jobs is just not smart enough to let you always utilize most of your hardware. Right now you need to optimize for either cores or max-jobs.

@ck3d (Contributor) commented Nov 18, 2022

We could control make and ninja with a central jobserver, as the following working proof of concept shows:
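
A minimal sketch of the idea (my own illustration, not ck3d's proof of concept; it assumes GNU make >= 4.4, whose named-pipe jobserver can be shared by path between otherwise unrelated builds):

# One machine-wide token pool for all builds:
mkfifo /tmp/global-jobserver
exec 3<>/tmp/global-jobserver
printf '+%.0s' $(seq 2 "$(nproc)") >&3    # nproc - 1 tokens
# Start each build as a jobserver client instead of giving it a private -j:
MAKEFLAGS="-j --jobserver-auth=fifo:/tmp/global-jobserver" make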

@ElvishJerricco (Contributor)

@SuperSandro2000 All I can say is that the workloads I'm talking about with nixpkgs did utilize my hardware very well before this change, and now it is not possible to reach a balance. I do not know what the solution is; I just know it's now a lot worse for me.

@trofi (Contributor) commented Nov 18, 2022

I found the opposite: the previous -l behaviour frequently throttled process spawning too aggressively, because loadavg is a laggy metric for the system's actual load. One second I had 90 processes running, the next I had 8. I have a 16-CPU machine and am using --max-jobs 4 --cores 16 to get reasonable utilisation. Ideally I want 16 processes running in parallel, but I can live with 64 in the worst case.

I find system end-to-end builds faster without -l.

@ElvishJerricco (Contributor) commented Nov 18, 2022

@trofi I've measured it and it is slightly faster without -l... if it doesn't crash. I get a lot of crashes now though, even with 64GB of RAM. And that's not to speak of how unresponsive the much higher load makes my desktop environment.

@trofi (Contributor) commented Nov 18, 2022

Yeah, that makes sense. My RAM/core ratio is 8GB/core (+6GB/core of zram just in case). I would guess 4GB/core should be fine for most packages provided /tmp is not taking RAM.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nix-build-ate-my-ram/35752/8

@emilazy (Member) commented Jul 21, 2024

Linking #328677 for those subscribed to this.
