Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(205/go-app): use automaxprocs to autoscale to CPU limit of container #278

Merged
merged 1 commit into from
Sep 18, 2024

Commits on Sep 18, 2024

  1. feat(205/go-app): use automaxprocs

    I have detected an issue when the GOMAXPROCS value doesn't match the number of available CPU cores.
    
    GOMAXPROCS is the number of threads allowed to run Go code at the same time, so by default the Go runtime sets it to the available number of CPU cores.
    
    If GOMAXPROCS is less than the number of CPU cores, you lose parallelism, as Go will not use the available cores, but setting it too high is also an issue, since the Go runtime can swap goroutines much faster than the OS can swap threads, including across cores.
    
    I suspect this could happen during tests in the cloud because we're setting a CPU limit of 2 cores, but machines like m6a.2xlarge have 8 vCPU, and I'm not sure if Kubernetes limit affects the number of cores the Go runtime sees when it starts up. Running with GODEBUG=gctrace=1 set will print out GC stats, the final field being the current number of PROCs.
    
    Solving it is simple, Uber made a library that looks at the container's CPU limit and set's GOMAXPROCS accordingly. If you manually set the environment variable the library will still respect it, leaving the option to manually change it if that is desired.
    
    Notably, it seems like having GOMAXPROCS be 2 when there are only 2 CPU causes a bigger performance win than letting the GC use 3x as much memory in my heavy-load wrk tests.
    
    Using wrk -d30s -t12 -c200 and limiting the server to 2 CPUs with
    `docker run --cpus 2`:
    
    ```
    Thread Stats   Avg      Stdev     Max   +/- Stdev
    
      CORES=2 GOMAXPROCS=8 GOGC=100
        Latency    20.72ms   25.56ms 105.36ms   79.27%
        Req/Sec     4.13k   609.55    19.03k    82.90%
    
      CORES=2 GOMAXPROCS=8 GOGC=300
        Latency    19.58ms   24.58ms  97.19ms   79.73%
        Req/Sec     4.51k   617.96    17.13k    79.74%
    
      CORES=2 GOMAXPROCS=2 GOGC=100
        Latency     6.53ms    2.37ms  24.92ms   70.40%
        Req/Sec     5.08k   690.78    43.98k    99.53%
    
      CORES=2 GOMAXPROCS=2 GOGC=300
        Latency     6.14ms    2.05ms  19.86ms   69.17%
        Req/Sec     5.40k   817.94    50.31k    99.89%
    
    ```
    
    The better scheduling / contention from matching the core count cuts the
    latency by 66% compared to improving GC performance by giving it more
    RAM.
    cookieo9 committed Sep 18, 2024
    Configuration menu
    Copy the full SHA
    0969f78 View commit details
    Browse the repository at this point in the history