Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Robustify/fix TestFirewall.testNetworkingPage, add firewall debug messages #19344

Merged
merged 4 commits into from
Sep 20, 2023

Conversation

martinpitt
Copy link
Member

Trying to understand this failure, I can't get it locally.

@martinpitt martinpitt added no-test For doc/workflow changes, or experiments which don't need a full CI run, flake unstable test labels Sep 18, 2023
@martinpitt martinpitt force-pushed the debug-firewall branch 2 times, most recently from f729621 to 1516ed7 Compare September 18, 2023 13:06
@martinpitt
Copy link
Member Author

martinpitt commented Sep 18, 2023

Interesting, first run after the wait(m.execute()) succeeded. Retrying. Could have just been "it didn't wait for long enough"

Round 2 also all passed, except for this different failure

Round 3 had several failures, this proves that firewalld itself is correct, and it's somewhere between its D-Bus interface and Cockpit.

@martinpitt
Copy link
Member Author

martinpitt commented Sep 19, 2023

With the debouncing (top commit 6d19681), the first round was all green, retrying. Second round all green, except for one unrelated flake. Same in the third round, ubuntu-stable and fedora-38 were all green, and ubuntu-2204 also only has an unrelated failure.

@martinpitt martinpitt removed the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Sep 19, 2023
After turning the firewall back on, explicitly wait for the CLI to show
the expected number of zones. This will make it clear whether it's
firewalld's or cockpit's fault when the displayed number of active zones
is wrong.
initFirewalldDbus() gets called on every unit change, clean up the old
one properly.
firewalld_service's `changed` handler is called in bursts, for all the
internal systemd property changes like unit start times. These are not
interesting for firewall-client, and just lead to lots of duplicated
handler calls.

Remember the previous unit state, and short-circuit the handler
immediately if there is no interesting change.
@martinpitt martinpitt changed the title test: Amplify/debug TestFirewall.testNetworkingPage Robustify/fix TestFirewall.testNetworkingPage, add firewall debug messages Sep 19, 2023
@martinpitt martinpitt marked this pull request as ready for review September 19, 2023 13:39
@@ -204,20 +204,24 @@ function initFirewalldDbus() {

firewalld_service.addEventListener('changed', () => {
const installed = !!firewalld_service.exists;
const is_running = firewalld_service.state === 'running';

// we get lots of these events, for internal property changes; filter interesting changes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, should this filtering happen in pkg/lib/service.js maybe? But other code might rely on getting events also when service.unit changes...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this maybe a change in behavior introduced by the python bridge? Maybe the C bridge would batch change notifications better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't filter them in service.js, as the firewalld_service object exposes all systemd properties, and it can't know which ones the client is interested in. It could be that the C bridge handled this differently and merged multiple D-Bus PropertyChanged signals into one handler call. I somehow doubt it, as that is a heuristic which is hard to do right for every use case, but I'll check this on a c8s build.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a console.log to the handler and print the .state and .exists fields. On a centos-8-stream build it does not look different, there are also bursts of signal handlers:

-> wait: ph_is_present("#networking-firewall-summary input[type=checkbox]:checked")
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true

-> ph_mouse("#networking-firewall-summary input[type=checkbox]:not([disabled]):not([aria-disabled=true])","click",0,0,0,false,false,false,false)
> log: XXX firewalld_service.changed stopped true
> log: XXX firewalld_service.changed stopped true
> log: XXX firewalld_service.changed stopped true
> log: XXX firewalld_service.changed stopped true
> log: XXX firewalld_service.changed stopped true
> log: XXX firewalld_service.changed starting true
> log: XXX firewalld_service.changed starting true
> log: XXX firewalld_service.changed starting true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true
> log: XXX firewalld_service.changed running true

I'm not even sure if this flake is a regression with the Python bridge. It's not that common, we may have had it for a long time (just now I am putting a lot of attention to unstable tests).

Copy link
Member

@mvollmer mvollmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@martinpitt martinpitt merged commit 289a314 into cockpit-project:main Sep 20, 2023
93 checks passed
@martinpitt martinpitt deleted the debug-firewall branch September 20, 2023 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flake unstable test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants