
pytest-xdist workers crashing #404

Closed
github-actions bot opened this issue Sep 29, 2022 · 6 comments · Fixed by #432

@github-actions

Workflow Run URL

@jrbourbeau
Member

tests/benchmarks/test_csv.py::test_csv_basic and tests/benchmarks/test_dataframe.py::test_dataframe_align caused pytest-xdist workers to crash, and it's not clear why. We've started seeing this with other tests on PRs as well; these two tests aren't the only ones that trigger worker crashes.

[gw2] node down: Not properly terminated
[gw2] [  4%] FAILED tests/benchmarks/test_csv.py::test_csv_basic 

replacing crashed worker gw2
[gw3] node down: Not properly terminated
[gw3] [  4%] FAILED tests/benchmarks/test_dataframe.py::test_dataframe_align 

replacing crashed worker gw3

This was referenced Sep 30, 2022
@jrbourbeau
Member

We saw something similar in #412 with tests/benchmarks/test_array.py::test_anom_mean

[gw0] node down: Not properly terminated
[gw0] [  4%] FAILED tests/benchmarks/test_array.py::test_anom_mean 

replacing crashed worker gw0

@jrbourbeau
Member

We saw something similar in #413 with tests/benchmarks/test_zarr.py::test_select_scalar

[gw1] node down: Not properly terminated
[gw1] [  4%] FAILED tests/benchmarks/test_zarr.py::test_select_scalar 

replacing crashed worker gw1

@jrbourbeau changed the title from "⚠️ CI failed ⚠️" to "pytest-xdist workers crashing" on Oct 3, 2022
This was referenced Oct 3, 2022
This was referenced Oct 4, 2022
@ncclementi
Contributor

ncclementi commented Oct 6, 2022

From a quick look, this seems to be a known issue that is still open: pytest-dev/pytest-xdist#466. Someone else also reported it in pytest-dev/pytest-xdist#714, and it didn't get much attention there either.

I'm not quite sure how to proceed here. There are other related issues, both open and closed, that never got an answer.

@ian-r-rose
Contributor

It may just be that we have too many concurrent xdist workers. The assumption was that the client does very little work, so 8 workers would be fine, but that may not be correct. In particular, I think package_sync might be fairly expensive for the client. Some evidence for this: in #429, all of the worker crashes happen on the first test of a given module, which is when the cluster is being spun up.
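One way to check whether crashes really cluster on the first test of a module is to pull the crashed tests out of the CI log. Below is a minimal sketch, assuming the log lines look exactly like the reports quoted in this issue (`[gwN] node down: ...` followed by `[gwN] [ NN%] FAILED <nodeid>`); `crashed_tests` is a hypothetical helper, not part of pytest or pytest-xdist.

```python
import re
from typing import List, Tuple

# Assumed log format, copied from the reports in this issue:
#   "[gw2] node down: Not properly terminated"
#   "[gw2] [  4%] FAILED tests/benchmarks/test_csv.py::test_csv_basic"
NODE_DOWN_RE = re.compile(r"^\[(gw\d+)\]\s+node down:")
FAILED_RE = re.compile(r"^\[(gw\d+)\]\s+\[\s*\d+%\]\s+FAILED\s+(\S+)")


def crashed_tests(log: str) -> List[Tuple[str, str]]:
    """Return (worker id, test node id) pairs for tests reported as FAILED
    right after their worker went down."""
    down = set()  # workers that reported "node down" but no FAILED line yet
    crashes = []
    for raw in log.splitlines():
        line = raw.strip()
        m = NODE_DOWN_RE.match(line)
        if m:
            down.add(m.group(1))
            continue
        m = FAILED_RE.match(line)
        if m and m.group(1) in down:
            worker, nodeid = m.groups()
            crashes.append((worker, nodeid))
            # Any later FAILED line for this id is treated as an ordinary failure.
            down.discard(worker)
    return crashes


if __name__ == "__main__":
    log = """\
[gw2] node down: Not properly terminated
[gw2] [  4%] FAILED tests/benchmarks/test_csv.py::test_csv_basic

replacing crashed worker gw2
[gw3] node down: Not properly terminated
[gw3] [  4%] FAILED tests/benchmarks/test_dataframe.py::test_dataframe_align
"""
    for worker, nodeid in crashed_tests(log):
        module = nodeid.split("::")[0]  # group by module to compare against collection order
        print(worker, module, nodeid)
```

Comparing each reported module against the order in which its tests were collected would then show whether the crash landed on that module's first test.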

Two possible ways to alleviate this:

  1. Reduce the number of xdist workers (try six or four?)
  2. Revert #370 ("Use single job for all test categories") and distribute the CI across more runners again.
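For option 1, the change amounts to passing a smaller value to pytest-xdist's `-n` option. A minimal sketch, assuming the benchmarks live under `tests/benchmarks` and are launched directly with `pytest` (the actual CI workflow command isn't shown in this thread); `xdist_command` is a hypothetical helper.

```python
import shlex
from typing import List


def xdist_command(num_workers: int, test_path: str = "tests/benchmarks") -> List[str]:
    """Build a pytest command that runs the suite on `num_workers` xdist workers."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # "-n" is pytest-xdist's option for the number of worker processes.
    return ["pytest", "-n", str(num_workers), test_path]


if __name__ == "__main__":
    for n in (8, 6, 4):  # current setting vs. the two proposed values
        print(shlex.join(xdist_command(n)))
```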

Thoughts?

@ian-r-rose ian-r-rose self-assigned this Oct 6, 2022
@ncclementi
Contributor

Reduce the number of xdist workers (try six or four?)

We could test this on a branch, run it daily for a few days, and see whether it fixes the crashes.
