-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test/verify/check-networking-settings TestNetworking.testNoConnectionSettings broken of F32 #13821
Comments
ah right. can we do something like the following code?
|
@marusak: yes, that should work. For bonus points it could use a |
Locally, it either works, or it already fails much earlier, like here or here
Only just now I had another case of this; stack trace is
Here is a recent CI example for that, the original logs above were all cleaned up. |
I am trying with this patch: --- test/verify/check-networking-settings
+++ test/verify/check-networking-settings
@@ -73,9 +73,15 @@ class TestNetworking(NetworkCase):
m.execute('nmcli con del "%s"' % con_id)
b.wait_in_text("tr:contains('IPv4')", "Automatic")
- # Activate with ghost settings
+ # Activate with ghost settings; on busy machines this may run into the checkpointing
self.toggle_onoff(".panel-heading:contains('%s')" % iface)
- b.wait_in_text("tr:contains('Status')", "10.111.")
+ try:
+ b.wait_in_text("tr:contains('Status')", "10.111.")
+ except Error:
+ self.assertTrue(b.is_visible("#confirm-breaking-change-text"))
+ print("Got checkpoint dialog, acknowledging")
+ b.click("#confirm-breaking-change-popup .pf-m-danger")
+ b.wait_in_text("tr:contains('Status')", "10.111.")
# Check again that we now have connection settings
con_id = wait(lambda: self.iface_con_id(iface)) But it doesn't work -- after accepting the change in the dialog, the interface is still in the previous "manual" mode, and has IP 1.2.3.4:
In the journal I see that NM attempted DHCP, but the checkpoint rolled it back in one (!!) second:
I don't understand what's going on there -- checkpoint 1 gets created three times, rolled back 7s after initial creation and 1s after last creation, and DHCP gets cut off after 1s. Thus the whole transaction does not just include the last step 'activate device', but apparently the previous @mvollmer, can you make some sense of this? At this point it seems to me that it's not really the test that is broken, but the checkpoint logic? |
Just from reading the log: Chjeckpoint 1 is only created once, but there are two more attempts at creating checkpoints which both fail because Checkpoint 1 is still around. (This shouldn't happen ideally, but there is no code in Cockpit to somehow serialize checkpoint creation for a single interface.) Checkpoint 1 is created at time 1588257372.7236, rolled back at time 1588257379.7252 and attempted to be destroyed at time 1588257379.7753. This is the normal pattern for when a checkpoint has prevented a disconnection, and the "this will disconnect you" dialog should be open at this point. However, I don't think we expect a disconnect here (right?), and Cockpit should destroy the checkpoint before it is rolled back, right after "connection-add" has succeded at 1588257375.9983 So at first look:
I'll do some code reading and I'll try to reproduce this. |
Ok, this seems to be the very race that is explained in the comments:
settle_time is pretty long now since 7b115d1, 3 seconds, while rollback_time is kinda low, 7 seconds. The "connection-add" operation seems to be pretty slow: 1588257375.9983 - 1588257372.7236 = 3.274 and together with a settle_time of 3 we get pretty close to the rollback_time of 7. I propose to bring the settle_time down to something better suited for normal scenarios, and to switch off checkpointing entirely during CI tests (except for the tests that actually need it). |
A long settle_time time causes the checkpointing race to get pretty tight, and checkpoints would be unexpectedly rolled back in our integration tests. On the other hand, we had to increased the settle_time in 7b115d1 and f1d06ab to make the checkpointing tests themselves more robust. Thus, we reset the settle_time that is appropriate for manual use, switch off checkpointing entirely for most of the integration tests, and use a long settle_time when testing checkpoints themselves. Fixes #13821
A long settle_time time causes the checkpointing race to get pretty tight, and checkpoints would be unexpectedly rolled back in our integration tests. On the other hand, we had to increased the settle_time in 7b115d1 and f1d06ab to make the checkpointing tests themselves more robust. Thus, we reset the settle_time to something that is appropriate for manual use, switch off checkpointing entirely for most of the integration tests, and use a long settle_time when testing checkpoints themselves. Fixes cockpit-project#13821
A long settle_time time causes the checkpointing race to get pretty tight, and checkpoints would be unexpectedly rolled back in our integration tests. On the other hand, we had to increased the settle_time in 7b115d1 and f1d06ab to make the checkpointing tests themselves more robust. Thus, we reset the settle_time to something that is appropriate for manual use, switch off checkpointing entirely for most of the integration tests, and use a long settle_time when testing checkpoints themselves. Fixes #13821 Closes #14026
On F32 this test fails very often. (not broken though, sometimes also succeeds)
Screenshot always shows this 'Connection will be lost' message, which is obviously related.
@martinpitt maybe you have some idea what is exactly happening?
Logs:
1
2
3
The text was updated successfully, but these errors were encountered: