-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
machine: Put SSHConnection.execute() with stdout under timeout #517
machine: Put SSHConnection.execute() with stdout under timeout #517
Conversation
This caused tests hanging for half an hour on e. g. cleanup commands like calling "journalctl" in cockpit's `BaseCase.copy_journal()`. Fixes cockpit-project#516
I'm deliberately triggering loads of tests here to make sure that we don't have legitimate .execute() calls that take longer than 2 minutes. If we do, we need to fix them with an explicit timeout= first. |
cockpit's rhel-8.2 branch also needs the less fix backported, I sent cockpit-project/cockpit#13529 for that. |
Here testCheckpoint fails (NM bug?), and it now only takes 140s. |
The subscription-manager failure is unrelated, due to a less 3.11 regression. Let's give upstream a day or two to fix it, otherwise I'll send a workaround PR. But I went through integration-tests/check-subscriptions and it doesn't use |
I am actually bit concern about this. A bit afraid that this could bring a lot of flakes when it needs a bit more time. But it seems that when it gets broken, more time won't help, in which case it makes sense. |
@marusak: The timeout can be changed, the default is 2 mins; but normally checkpoint/restore should take about 10s. The failures that we saw so far were due to checkpointing not working at all, not it being slow. |
less 3.11.1 got released which fixes that. The test is green now. However, I'm going to take a look at testCheckpoints on RHEL 8.2. This fails a little too often, and this at least deserves a bug report. |
To clarify: This change does not affect the actual test code. It does affect the post-test cleanup that calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: This change does not affect the actual test code. It does affect the post-test cleanup that calls journalctl to get the logs. That's the bit that uses Machine.execute() with the stdout= option. Hence I think this is pretty safe, as the journalctl call is the only thing in Cockpit's test suite that uses stdout=.
But if it fails, the whole test fails (example). But seems that indeed it works just fine and is much faster, so have my ack :)
@marusak: Right, it would have failed before as well. It's another failure in the exception handling path of the original failure. :-) BTW, cockpit-project/cockpit#13534 is an attempt to actually fix the test on rhel-8-2. |
This caused tests hanging for half an hour on e. g. cleanup commands
like calling "journalctl" in cockpit's
BaseCase.copy_journal()
.Fixes #516