
Improve autodetection of number of (available) CPUs #1103

Open
tovrstra opened this issue Jun 21, 2024 · 3 comments

@tovrstra

The current implementation of -n auto does not work as hoped in high-performance computing (HPC) environments, where each job is assigned a limited number of cores it may use. For example, when a pytest job requests 9 cores in its job submission and runs on a compute node with 36 cores, pytest-xdist starts 36 worker processes. Ideally, it should use 9.

This is related to the logic in def pytest_xdist_auto_num_workers(...) in src/xdist/plugin.py. This function first tries the psutil package and only falls back to os.sched_getaffinity, which gives the correct number, when psutil cannot be imported. If psutil happens to be installed (which is hard to avoid), the autodetection therefore does not produce the best answer.
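
For context, the order of checks described above looks roughly like this (a simplified sketch of the logic, not the literal source):

import os

def auto_num_workers_sketch():
    # psutil is preferred whenever it can be imported ...
    try:
        import psutil
    except ImportError:
        pass
    else:
        count = psutil.cpu_count(logical=True)
        if count:
            return count
    # ... and only if the import fails is the affinity mask consulted
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # last resort: the total CPU count
    return os.cpu_count() or 1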

For your information, in the scenario sketched above, these are the results of various functions to get the number of CPU cores:

>>> import os, psutil, multiprocessing
>>> len(os.sched_getaffinity(0))
9
>>> len(psutil.Process().cpu_affinity())
9
>>> psutil.cpu_count(logical=True)
36
>>> psutil.cpu_count(logical=False)
36
>>> os.cpu_count()
36
>>> multiprocessing.cpu_count()
36

The function os.sched_getaffinity was introduced in Python 3.3, which predates the oldest Python version supported by pytest-xdist. As far as I understand, however, this function is not available on all platforms. (I cannot verify this myself, as I have no access to other OSes.) According to the psutil documentation, len(psutil.Process().cpu_affinity()) should work at least on Linux and Windows. There may still be a need to fall back to other functions; trying them in the order listed above seems reasonable.
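
For example, portable code would need a guard along these lines (a sketch; as far as I can tell, os.sched_getaffinity is missing on at least macOS):

import os

if hasattr(os, "sched_getaffinity"):
    # Linux: respects affinity masks set by taskset, cgroups, job schedulers
    available = len(os.sched_getaffinity(0))
else:
    # platforms without the affinity API: fall back to the total count
    available = os.cpu_count()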

This suggestion may interfere with the option config.option.numprocesses. In compute environments, that option is not very relevant because the number of cores is managed by the queueing system. (Also, hyperthreading is often disabled in such scenarios because it degrades raw compute performance; it mainly helps for I/O-bound workloads.)

@RonnyPfannschmidt
Member

This needs some investigation.

For I/O-bound suites it's less of a problem; for CPU-bound ones there's a clear win.

Maybe there's a need for a second auto flag that tries that number.

@guywilsonjr

I was initially skeptical, but after reading the psutil docs you're absolutely right! Nice!

For reference:

psutil.cpu_count(logical=True)
...

Note that psutil.cpu_count() may not necessarily be equivalent to the actual number of CPUs the current process can use. That can vary in case process CPU affinity has been changed, Linux cgroups are being used or (in case of Windows) on systems using processor groups or having more than 64 CPUs. The number of usable CPUs can be obtained with:

>>> len(psutil.Process().cpu_affinity())

Taking all this into account, it would make sense for the "auto" priority to be (see the sketch after the list):

  1. len(psutil.Process().cpu_affinity())
  2. psutil.cpu_count(logical=False)
  3. psutil.cpu_count(logical=True)
  4. len(os.sched_getaffinity(0))
  5. os.cpu_count()
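
A minimal sketch of that fallback chain could look like this (available_cpu_count is an illustrative name, not an actual patch):

import os

def available_cpu_count():
    """Best-effort count of the CPUs the current process may use."""
    try:
        import psutil
    except ImportError:
        psutil = None
    if psutil is not None:
        try:
            # 1. respects affinity masks / cgroups where supported
            return len(psutil.Process().cpu_affinity())
        except AttributeError:
            # cpu_affinity() is not implemented on every platform (e.g. macOS)
            pass
        # 2. physical cores, then 3. logical cores
        count = psutil.cpu_count(logical=False) or psutil.cpu_count(logical=True)
        if count:
            return count
    # 4. the affinity mask via the standard library (Linux)
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # 5. last resort
    return os.cpu_count() or 1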

@tovrstra
Author

Sounds good to me. This would certainly make testing in HPC environments much more hands-off.

In the case of config.option.numprocesses == "logical", I assume point 3 is tried first. If that fails, one could fall back to the auto way of doing things, no?
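
Something like this, perhaps (a hypothetical sketch, reusing the illustrative available_cpu_count() from above):

def logical_num_workers_sketch():
    try:
        import psutil
        # "logical" asks psutil for the logical core count first ...
        count = psutil.cpu_count(logical=True)
        if count:
            return count
    except ImportError:
        pass
    # ... and otherwise falls back to the regular "auto" detection
    return available_cpu_count()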
