available_parallelism: Gracefully handle zero value cfs_period_us #104493

Merged
merged 1 commit into from
Dec 29, 2022

Contributor

adamncasey commented Nov 16, 2022

There seem to be some scenarios where the cgroup CPU quota field `cpu.cfs_period_us` can contain `0`. This field is used to determine the amount of parallelism suggested by the function `std::thread::available_parallelism`.

A zero value in this field causes a panic when `available_parallelism()` is invoked. The issue was discovered through binaries built by `cargo test`, which call this function. I don't feel like `0` is a good value for `cpu.cfs_period_us`, but I also don't think applications should panic if this value is seen.

This panic started happening with Rust 1.64.0.

This case is gracefully handled by other projects which read this information:

 - num_cpus: https://github.com/seanmonstar/num_cpus/blob/e437b9d9083d717692e35d917de8674a7987dd06/src/linux.rs#L207-L210
 - ninja: https://github.com/ninja-build/ninja/pull/2174/files
 - dotnet: https://github.com/dotnet/runtime/blob/c4341d45acca3ea662cd8d71e7d71094450dd045/src/coreclr/pal/src/misc/cgroup.cpp#L481-L483

Before this change, running `cargo test` in environments set up as described above would trigger this panic:

```
$ RUST_BACKTRACE=1 cargo test
    Finished test [unoptimized + debuginfo] target(s) in 3.55s
     Running unittests src/main.rs (target/debug/deps/x-9a42e145aca2934d)
thread 'main' panicked at 'attempt to divide by zero', library/std/src/sys/unix/thread.rs:546:70
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::panicking::panic
   3: std::sys::unix::thread::cgroups::quota
   4: std::sys::unix::thread::available_parallelism
   5: std::thread::available_parallelism
   6: test::helpers::concurrency::get_concurrency
   7: test::console::run_tests_console
   8: test::test_main
   9: test::test_main_static
  10: x::main
             at ./src/main.rs:1:1
  11: core::ops::function::FnOnce::call_once
             at /tmp/rust-1.64-1.64.0-1/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: test failed, to rerun pass '--bin x'
```

I've tested this change in an environment which has the bad (questionable?) setup, and rebuilding the test executable against a fixed std library fixes the panic.
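
To make the failure mode concrete, here is a minimal sketch of a quota computation that guards against a zero period. It is not the code from this PR: the function name `quota_limit` and its signature are hypothetical, and it assumes the cgroup v1 convention that the CPU limit is the quota divided by the period, with a non-positive quota meaning "no limit".

```rust
/// Hypothetical helper (not the std implementation): derive a CPU count
/// limit from cgroup v1 CFS values. Returns `None` when no usable limit
/// can be derived.
fn quota_limit(cfs_quota_us: i64, cfs_period_us: i64) -> Option<usize> {
    // A non-positive quota is treated as "no limit" (assumption: -1 is the
    // conventional unlimited value). A non-positive period would make the
    // division below panic, so it is also treated as "no limit".
    if cfs_quota_us <= 0 || cfs_period_us <= 0 {
        return None;
    }
    // Integer division rounds down; never report fewer than one CPU.
    Some(((cfs_quota_us / cfs_period_us) as usize).max(1))
}
```

With these assumptions, `quota_limit(200_000, 100_000)` yields `Some(2)`, while `quota_limit(100_000, 0)` yields `None` instead of panicking, so a caller can fall back to whatever CPU count it obtained by other means.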
Collaborator

rustbot commented Nov 16, 2022

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @m-ou-se (or someone else) soon.

Please see the contribution instructions for more information.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 16, 2022
Collaborator

rustbot commented Nov 16, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

Member

the8472 commented Nov 16, 2022

https://docs.kernel.org/scheduler/sched-bwc.html#management

> The minimum quota allowed for the quota or period is 1ms.

So the value 0 shouldn't occur. Are you on some kind of cgroup emulation? Perhaps you should report that it can be 0 but shouldn't to whatever is causing that in the first place.


samanpa commented Dec 16, 2022

> https://docs.kernel.org/scheduler/sched-bwc.html#management
>
> > The minimum quota allowed for the quota or period is 1ms.
>
> So the value 0 shouldn't occur. Are you on some kind of cgroup emulation? Perhaps you should report that it can be 0 but shouldn't to whatever is causing that in the first place.

It does occur in practice, so it is worth guarding against this possibility. It makes `cargo test`, which calls this function, unusable on these machines.

Member

the8472 commented Dec 16, 2022

That doesn't answer the question of what's causing it, whether it's a kernel bug, some third-party software, or something else. It's good to have a documented root cause.
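
For anyone trying to document the root cause on an affected machine, a small diagnostic along the following lines might help. This is only a sketch: the helper names are hypothetical, the file location passed to `check_period` is an assumption (on many cgroup v1 systems it is `/sys/fs/cgroup/cpu/cpu.cfs_period_us`, but the mount point varies), and the 1000 microsecond threshold is simply the 1ms minimum quoted from the kernel documentation above.

```rust
use std::fs;

/// Parse the contents of a `cpu.cfs_period_us` file (a decimal integer,
/// usually followed by a newline).
fn parse_period(contents: &str) -> Option<u64> {
    contents.trim().parse().ok()
}

/// Classify a period value against the documented 1ms (1000us) minimum.
fn describe_period(period_us: u64) -> &'static str {
    match period_us {
        0 => "zero (would divide by zero if used as a divisor)",
        1..=999 => "below the documented 1ms minimum",
        _ => "at or above the documented 1ms minimum",
    }
}

/// Read the file at `path` (assumed location, see above) and report what
/// was found without panicking on any input.
fn check_period(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(s) => match parse_period(&s) {
            Some(p) => format!("{path}: {p} ({})", describe_period(p)),
            None => format!("{path}: unparseable contents {:?}", s.trim()),
        },
        Err(e) => format!("{path}: could not read ({e})"),
    }
}
```

Printing the result of `check_period` for the relevant cgroup path, together with the kernel version and container runtime in use, would give a report that whoever maintains the offending component can act on.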

Member

m-ou-se commented Dec 28, 2022

@bors r+

Contributor

bors commented Dec 28, 2022

📌 Commit 04f1ead has been approved by m-ou-se

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 28, 2022
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#104402 (Move `ReentrantMutex` to `std::sync`)
 - rust-lang#104493 (available_parallelism: Gracefully handle zero value cfs_period_us)
 - rust-lang#105359 (Make sentinel value configurable in `library/std/src/sys_common/thread_local_key.rs`)
 - rust-lang#105497 (Clarify `catch_unwind` docs about panic hooks)
 - rust-lang#105570 (Properly calculate best failure in macro matching)
 - rust-lang#105702 (Format only modified files)
 - rust-lang#105998 (adjust message on non-unwinding panic)
 - rust-lang#106161 (Iterator::find: link to Iterator::position in docs for discoverability)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 2dd2fb7 into rust-lang:master Dec 29, 2022
@rustbot rustbot added this to the 1.68.0 milestone Dec 29, 2022
@adamncasey adamncasey deleted the cgroupzeroperiod branch January 3, 2023 12:06