machine: Put SSHConnection.execute() with stdout under timeout #517

martinpitt · 2020-02-10T20:30:47Z

Without a timeout, this caused tests to hang for half an hour on e.g. cleanup commands
like calling "journalctl" in cockpit's BaseCase.copy_journal().

Fixes #516
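The real change is in the bots' `machine` code, which is not reproduced here. As a rough sketch of the idea, assume a Python helper that runs a command through `subprocess`, sends its stdout straight into an already-open file, and bounds the run with a timeout. The helper name `execute_to_file` and the 120-second default are hypothetical, chosen to mirror the 2-minute default discussed below. They are not the actual `SSHConnection.execute()` API. In the real setting, `argv` would be an ssh invocation running e.g. `journalctl` on the test machine.

```python
import subprocess
import sys
import tempfile
from typing import IO, Sequence

# Hypothetical default, mirroring the 2-minute default mentioned in this PR.
DEFAULT_TIMEOUT = 120


def execute_to_file(argv: Sequence[str], stdout: IO[bytes],
                    timeout: float = DEFAULT_TIMEOUT) -> None:
    """Run argv and stream its stdout into an already-open file.

    subprocess.run() kills the child and raises subprocess.TimeoutExpired
    once `timeout` seconds have passed, so a stuck command fails fast
    instead of blocking the whole test run. A non-zero exit status raises
    subprocess.CalledProcessError because of check=True.
    """
    subprocess.run(list(argv), stdout=stdout, timeout=timeout, check=True)


if __name__ == "__main__":
    # Local stand-in for a remote "journalctl" call.
    with tempfile.TemporaryFile() as log:
        execute_to_file([sys.executable, "-c", "print('journal line')"], log)
        log.seek(0)
        print(log.read())
```

A caller with a legitimately long-running command would pass an explicit, larger `timeout=` instead of relying on the default.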


martinpitt · 2020-02-10T20:31:45Z

I'm deliberately triggering loads of tests here to make sure that we don't have legitimate .execute() calls that take longer than 2 minutes. If we do, we need to fix them with an explicit timeout= first.

martinpitt · 2020-02-10T20:41:03Z

cockpit's rhel-8.2 branch also needs the less fix backported, I sent cockpit-project/cockpit#13529 for that.

martinpitt · 2020-02-10T21:38:27Z

Here testCheckpoint fails (NM bug?), and it now only takes 140s.

martinpitt · 2020-02-11T06:07:32Z

The subscription-manager failure is unrelated, due to a less 3.11 regression. Let's give upstream a day or two to fix it, otherwise I'll send a workaround PR. But I went through integration-tests/check-subscriptions and it doesn't use .execute(..., stdout=...), so this is unaffected by this change.

marusak · 2020-02-11T06:22:55Z

> Here testCheckpoint fails (NM bug?), and it now only takes 140s.

I am actually a bit concerned about this. I'm a bit afraid that this could bring a lot of flakes when something needs a bit more time. But it seems that when it breaks, more time won't help, in which case it makes sense.

martinpitt · 2020-02-11T06:35:01Z

@marusak: The timeout can be changed; the default is 2 minutes. But normally checkpoint/restore should take about 10s. The failures that we saw so far were due to checkpointing not working at all, not due to it being slow.

martinpitt · 2020-02-11T07:18:56Z

> The subscription-manager failure

less 3.11.1 got released which fixes that. The test is green now.

However, I'm going to take a look at testCheckpoint on RHEL 8.2. It fails a little too often, and that at least deserves a bug report.

martinpitt · 2020-02-11T09:05:30Z

> A bit afraid that this could bring a lot of flakes when it needs a bit more time.

To clarify: This change does not affect the actual test code. It does affect the post-test cleanup that calls journalctl to get the logs. That's the bit that uses Machine.execute() with the stdout= option. Hence I think this is pretty safe, as the journalctl call is the only thing in Cockpit's test suite that uses stdout=.

marusak

> To clarify: This change does not affect the actual test code. It does affect the post-test cleanup that calls journalctl to get the logs. That's the bit that uses Machine.execute() with the stdout= option. Hence I think this is pretty safe, as the journalctl call is the only thing in Cockpit's test suite that uses stdout=.

But if it fails, the whole test fails (example). Still, it seems that it indeed works just fine and is much faster, so you have my ack :)

martinpitt · 2020-02-11T11:58:14Z

@marusak: Right, it would have failed before as well. It's another failure in the exception handling path of the original failure. :-) BTW, cockpit-project/cockpit#13534 is an attempt to actually fix the test on rhel-8-2.

machine: Put SSHConnection.execute() with stdout under timeout

f5b4289

This caused tests hanging for half an hour on e. g. cleanup commands like calling "journalctl" in cockpit's `BaseCase.copy_journal()`. Fixes cockpit-project#516

martinpitt mentioned this pull request Feb 10, 2020

testCheckpoint takes unreasonably long to fail #516

Closed

martinpitt requested a review from marusak February 11, 2020 06:08

marusak approved these changes Feb 11, 2020


martinpitt merged commit 8ad0049 into cockpit-project:master Feb 11, 2020

martinpitt deleted the ssh-exec-timeout branch February 11, 2020 11:58
