Network checkpoints don't work reliably on RHEL 8.2 #13532
Comments
When it works, it looks like this:
I just (re)discovered the carefully tweaked timeouts in pkg/networkmanager/interfaces.js:
This seems to work reasonably reliably with
Similar to commit f1d06ab, but bump it harder. Right after starting the test and cockpit, the VMs are fairly loaded (particularly rhel-8-2), so that eth0 does not get shut down quickly enough and the checkpoint_destroy() succeeds. As a result, Cockpit just disconnects, the curtain never comes up, and there is never a restore.
Fixes #13532
Closes #13534
Similar to commit f1d06ab, but bump it harder. Right after starting the test and cockpit, the VMs are fairly loaded (particularly rhel-8-2), so that eth0 does not get shut down quickly enough and the checkpoint_destroy() succeeds. As a result, Cockpit just disconnects, the curtain never comes up, and there is never a restore.
Fixes #13532
Cherry-picked from master commit 7b115d1
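The race described in the commit message can be illustrated with a small, simplified model. This is only a sketch under stated assumptions, not the code in pkg/networkmanager/interfaces.js: the function name `simulateCheckpoint`, its parameters, and the millisecond values are all hypothetical, and the real logic talks to NetworkManager asynchronously. The model only captures the ordering problem: if the UI tries to destroy the checkpoint before the interface has actually gone down, the destroy succeeds and nothing is left to roll back.

```javascript
// Simplified, synchronous model of the checkpoint race.
// NOT Cockpit's actual code; all names and numbers are hypothetical.

/**
 * @param {number} settleMs        how long the UI waits before it tries to
 *                                 destroy the checkpoint
 * @param {number} linkDownAfterMs how long the (possibly loaded) VM takes to
 *                                 actually bring the interface down
 * @returns {"restored" | "disconnected"}
 */
function simulateCheckpoint(settleMs, linkDownAfterMs) {
    // At time settleMs the UI attempts to destroy the checkpoint.
    const linkStillUp = linkDownAfterMs > settleMs;

    if (linkStillUp) {
        // The destroy call still reaches NetworkManager and succeeds, so the
        // checkpoint is gone. When the interface goes down afterwards, there
        // is no curtain and no rollback.
        return "disconnected";
    }

    // The interface is already down, so the destroy call fails. The UI shows
    // the curtain and the checkpoint rollback restores the connection.
    return "restored";
}

console.log(simulateCheckpoint(1000, 300));   // quick VM: restored
console.log(simulateCheckpoint(1000, 2500));  // loaded VM, short settle time: disconnected
console.log(simulateCheckpoint(5000, 2500));  // loaded VM, bumped settle time: restored
```

In this model, bumping the settle time is what moves the loaded-VM case from "disconnected" back to "restored", which is the effect the commit aims for.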
testCheckpoint on RHEL 8.2 is very unreliable. It often needs 3 retries, and several more test runs to succeed.
This reproduces locally very reliably. When this happens, the web UI just shows "Deactivating.." for the interface and never shows the "Testing connection..." curtain. The interface in the VM immediately gets shut down. After a few minutes the UI gives up and shows the "Disconnected" curtain that the test logs also show.
Journal (from virsh console) during this:
The checkpointing seems to work reliably when I sit() right before the test disables the interface and then either disable it manually (several times) or simply let a few seconds pass. This suggests a race condition in either NM or the UI. I somewhat suspect NM, as this doesn't happen on any other OS.