NO-JIRA: deferred updates: fix in-place update handling on reboot #1162

ffromani · 2024-09-09T06:58:58Z

There are conditions on which we should not set the reload flag. This avoid regression in the test
"Profile deferred when applied should trigger changes when applied fist, then deferred when edited, if tuned restart should be kept deferred"

Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani · 2024-09-09T06:59:21Z

/cc @jmencak @MarSik

MarSik · 2024-09-09T07:16:16Z

pkg/tuned/controller.go

@@ -1029,19 +1031,18 @@ func (c *Controller) changeSyncerTuneD(change Change) (synced bool, err error) {
 			// Cache the value written to tunedRecommendFile.
 			c.daemon.recommendedProfile = change.recommendedProfile
 			klog.V(1).Infof("recommended TuneD profile updated from %q to %q [inplaceUpdate=%v nodeRestart=%v]", prevRecommended, change.recommendedProfile, inplaceUpdate, change.nodeRestart)
-			changeRecommend = true

 			if change.deferredMode == util.DeferUpdate && !inplaceUpdate && c.daemon.recoveredRecommendedProfile == change.recommendedProfile {
 				klog.V(1).Infof("recommended TuneD profile changed; skip TuneD reload [deferred=%v recoveredRecommended=%v]", change.deferredMode, c.daemon.recoveredRecommendedProfile)
 				// Reset because we need only once the first time we process the TuneD k8s object. Let's avoid stale data.
 				c.daemon.recoveredRecommendedProfile = ""
 			} else {
 				klog.V(1).Infof("recommended TuneD profile changed; trigger TuneD reload [deferred=%v]", change.deferredMode)


The log still says tuned should be reloaded. I will confuse people working on this in the future.. I think.

Well, the flow will ultimately trigger a tuned reload: see lines around 1065.
But still: how can we improve the logging?

Ah, right! So this is a fix that updates the fingerprint in addition to setting the reload flag. The reload part is not changing. Ok then.

we do want to not set the reload flag if very specific conditions are met, like the one captured in the test mentioned in the commit message, which is constantly failing without this fix

openshift-ci · 2024-09-09T07:47:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [MarSik,ffromani]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ffromani · 2024-09-09T07:51:25Z

/hold

ffromani · 2024-09-09T10:02:50Z

/retest

add stricter check for the node condition when validating OCPBUGS-38795 Signed-off-by: Francesco Romani <fromani@redhat.com>

There are conditions on which we should not set the reload flag. This avoid regression in the test "Profile deferred when applied should trigger changes when applied fist, then deferred when edited, if tuned restart should be kept deferred" Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani · 2024-09-10T12:58:17Z

/hold cancel

managed to deflake further (on my env) but not completely

jmencak

Thank you for the changes, Francesco. They look good to me. I've added some comments which are not exactly fair, because they should've been added them during previous reviews. Nevertheless, I believe that adding better logging/comments what happens (especially) in the

change.deferredMode == util.DeferUpdate && !inplaceUpdate && c.daemon.recoveredRecommendedProfile == change.recommendedProfile

section would help a lot future maintainers.

pkg/tuned/controller.go

ffromani · 2024-09-11T07:43:09Z

/hold

want to rerun the e2e testsuite

ffromani · 2024-09-11T11:45:50Z

/hold cancel
/retest

jmencak · 2024-09-12T06:38:12Z

Thank you for the changes/clarification, Francesco! I've found some typos and things that (IMO) need further clarification. I'll post a suggested diff later on and we can work from there.
/hold

jmencak · 2024-09-12T08:47:09Z

I tried to clarify the recently added comments and remove the typos. Hopefully I didn't introduce any false information/typos. Feel free to adjust as you see fit and happy to discuss here of over Slack:

diff --git a/pkg/tuned/controller.go b/pkg/tuned/controller.go
index e6095297..627f4271 100644
--- a/pkg/tuned/controller.go
+++ b/pkg/tuned/controller.go
@@ -1032,11 +1032,12 @@ func (c *Controller) changeSyncerTuneD(change Change) (synced bool, err error) {
 			c.daemon.recommendedProfile = change.recommendedProfile
 			klog.V(1).Infof("recommended TuneD profile updated from %q to %q [inplaceUpdate=%v nodeRestart=%v]", prevRecommended, change.recommendedProfile, inplaceUpdate, change.nodeRestart)
 
-			// if we get this far, it's either because we detected a node restart or the profile changed (not in-place, thus not-edit).
-			// So first: if this is a edit (replacing, not in-place), we do further logic. If it's a node restart, we just go ahead and process it.
-			// Handling then edits. Id we need to defer only edits, and we are re-processing a profile which we already applied, then we must
-			// not reload tuned, otherwise we're missing the point entirely about deferred updates.
-			// See the test "Profile deferred when applied should trigger changes when applied fist, then deferred when edited, if tuned restart should be kept deferred"
+			// If we get this far, it's either because we detected a node restart or the recommended profile changed.
+			// If it's a node restart, we just process the change. If it's the latter -- TuneD profile switch, i.e. not an in-place update -- we need further
+			// logic to distinguish between the "update" and "always" DeferMode.
+			// For the "update" DeferMode, we only defer profile edits. If we are re-processing a profile which we already applied, then we must
+			// not reload tuned, otherwise we're missing the point of deferred updates.
+			// See the test "Profile deferred when applied should trigger changes when applied first, then deferred when edited, if tuned restart should be kept deferred"
 			// See the commit 3655f22656d4a3aa9f471099305dcd78a9c80320
 			if !inplaceUpdate && change.deferredMode == util.DeferUpdate && c.daemon.recoveredRecommendedProfile == change.recommendedProfile {
 				klog.V(1).Infof("reprocessing profile already in effect; this seems a daemon reload. Skip TuneD reload [deferred=%v recoveredRecommended=%v]", change.deferredMode, c.daemon.recoveredRecommendedProfile)

ffromani · 2024-09-12T09:06:15Z

LGTM, applying

document better the logic about processing edits Signed-off-by: Francesco Romani <fromani@redhat.com>

jmencak · 2024-09-12T09:12:41Z

/hold cancel

jmencak · 2024-09-12T12:11:04Z

/lgtm
the e2e-gcp-pao and e2e-hypershift-pao failures are unrelated
/retest

jmencak · 2024-09-12T13:54:03Z

Hmm, I think e2e-gcp-pao test needs looking into. I'm getting the same failures (irqbalance.go) in #1163 and also when running the test locally.

jmencak · 2024-09-16T11:57:22Z

Overriding 2 tests because of a known change in TuneD. The tests already passed with the previous version of TuneD and need to be adjusted.

/override ci/prow/e2e-gcp-pao
/override ci/prow/e2e-hypershift-pao

openshift-ci · 2024-09-16T11:58:02Z

@jmencak: Overrode contexts on behalf of jmencak: ci/prow/e2e-gcp-pao, ci/prow/e2e-hypershift-pao

In response to this:

Overriding 2 tests because of a known change in TuneD. The tests already passed with the previous version of TuneD and need to be adjusted.

/override ci/prow/e2e-gcp-pao
/override ci/prow/e2e-hypershift-pao

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

jmencak · 2024-09-16T14:09:40Z

/jira refresh

openshift-ci-robot · 2024-09-16T14:09:43Z

@jmencak: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

yanirq · 2024-09-16T14:17:28Z

/retitle NO-JIRA: deferred updates: fix in-place update handling on reboot

to be added manually to #1149

openshift-ci-robot · 2024-09-16T14:17:58Z

@ffromani: This pull request explicitly references no jira issue.

In response to this:

There are conditions on which we should not set the reload flag. This avoid regression in the test
"Profile deferred when applied should trigger changes when applied fist, then deferred when edited, if tuned restart should be kept deferred"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2024-09-16T16:25:16Z

/retest-required

Remaining retests: 0 against base HEAD f3aafdb and 2 for PR HEAD c3bb0c4 in total

jmencak · 2024-09-16T16:59:45Z

/override ci/prow/e2e-gcp-pao
/override ci/prow/e2e-hypershift-pao

openshift-ci · 2024-09-16T17:00:03Z

@jmencak: Overrode contexts on behalf of jmencak: ci/prow/e2e-gcp-pao, ci/prow/e2e-hypershift-pao

In response to this:

/override ci/prow/e2e-gcp-pao
/override ci/prow/e2e-hypershift-pao

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2024-09-16T17:23:48Z

@ffromani: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-bot · 2024-09-17T00:46:47Z

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator
This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202409162307.p0.g1612025.assembly.stream.el9.
All builds following this will include this PR.

…enshift#1162) * log: make sure to have high verbosity in tests Signed-off-by: Francesco Romani <fromani@redhat.com> * e2e: deferred: stricter testing add stricter check for the node condition when validating OCPBUGS-38795 Signed-off-by: Francesco Romani <fromani@redhat.com> * deferred: tuned: fix reload trigger when inplace update There are conditions on which we should not set the reload flag. This avoid regression in the test "Profile deferred when applied should trigger changes when applied fist, then deferred when edited, if tuned restart should be kept deferred" Signed-off-by: Francesco Romani <fromani@redhat.com> * deferred: tuned: clarify comments and logs document better the logic about processing edits Signed-off-by: Francesco Romani <fromani@redhat.com> --------- Signed-off-by: Francesco Romani <fromani@redhat.com>

) * OCPBUGS-28647: tuned: distinguish deferred updates (#1129) * OCPBUGS-28647: tuned: distinguish deferred updates To fully support the usecase described in OCPBUGS-28647 and fix the issue, we need to further distinguish between first-time profile change and in-place profile change. This is required to better support a GitOps flow. The key distinction is if the recommended profile changes or not, and there's a desire to defer application of changes only when a profile is updated (e.g. sysctl modified), not the first time it is applied. Thus: - first-time profile change is a change which triggers a change of the recommended profile. - in-place profile update is a change which does NOT cause a switch to a TuneD profile with a different name. This involves changes to only the contents of the currently used profile. We change the way the annotation is used. We now require a value, which can be either - always: every Tuned object annotated this way will have its application deferred. - update: every Tuned object annotated this way will be processed as usual (and as it wasn't annotated) if it's a first-time profile change, but its in-place updates will be deferred. - a new internal value "never" is also added to be used internally to mean the deferred feature is disabled entirely. User can use this value but it will explicitly disable the feature (which is disabled already by default), thus is redundant and not recommended. Signed-off-by: Francesco Romani <fromani@redhat.com> * e2e: drop tags, use labels now that we have the more powerful ginkgo labels, we can stop using tags for newer tests. Signed-off-by: Francesco Romani <fromani@redhat.com> --------- Signed-off-by: Francesco Romani <fromani@redhat.com> * OCPBUGS-38795: Fix defer status during recommended profile change (#1142) * OCPBUGS-38795: Fix defer status during recommended profile change Process recommended profile change even when the profile itself has not changed itself. The internal profile fingerprint tracking was not updated during recommended profile update with no internal changes. * Add e2e test for defer=update * NO-JIRA: deferred updates: fix in-place update handling on reboot (#1162) * log: make sure to have high verbosity in tests Signed-off-by: Francesco Romani <fromani@redhat.com> * e2e: deferred: stricter testing add stricter check for the node condition when validating OCPBUGS-38795 Signed-off-by: Francesco Romani <fromani@redhat.com> * deferred: tuned: fix reload trigger when inplace update There are conditions on which we should not set the reload flag. This avoid regression in the test "Profile deferred when applied should trigger changes when applied fist, then deferred when edited, if tuned restart should be kept deferred" Signed-off-by: Francesco Romani <fromani@redhat.com> * deferred: tuned: clarify comments and logs document better the logic about processing edits Signed-off-by: Francesco Romani <fromani@redhat.com> --------- Signed-off-by: Francesco Romani <fromani@redhat.com> * Omit unbackported logging support --------- Signed-off-by: Francesco Romani <fromani@redhat.com> Co-authored-by: Francesco Romani <fromani@redhat.com> Co-authored-by: Martin Sivák <msivak@redhat.com>

log: make sure to have high verbosity in tests

fcf53ae

Signed-off-by: Francesco Romani <fromani@redhat.com>

openshift-ci bot requested review from jmencak, MarSik, dagrayvid and yanirq September 9, 2024 06:59

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2024

MarSik reviewed Sep 9, 2024

View reviewed changes

MarSik approved these changes Sep 9, 2024

View reviewed changes

openshift-ci bot assigned MarSik Sep 9, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2024

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2024

ffromani force-pushed the deferred-updates-cleanup-20240909 branch from f9d84bc to f429cda Compare September 9, 2024 11:35

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2024

ffromani added 2 commits September 9, 2024 15:08

e2e: deferred: stricter testing

aa9b60a

add stricter check for the node condition when validating OCPBUGS-38795 Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the deferred-updates-cleanup-20240909 branch from f429cda to c506826 Compare September 9, 2024 13:08

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 10, 2024

jmencak reviewed Sep 10, 2024

View reviewed changes

pkg/tuned/controller.go Outdated Show resolved Hide resolved

pkg/tuned/controller.go Outdated Show resolved Hide resolved

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2024

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2024

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2024

deferred: tuned: clarify comments and logs

c3bb0c4

document better the logic about processing edits Signed-off-by: Francesco Romani <fromani@redhat.com>

ffromani force-pushed the deferred-updates-cleanup-20240909 branch from b9947d5 to c3bb0c4 Compare September 12, 2024 09:06

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2024

openshift-ci bot assigned jmencak Sep 12, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 12, 2024

openshift-ci bot changed the title ~~deferred updates: fix in-place update handling on reboot~~ NO-JIRA: deferred updates: fix in-place update handling on reboot Sep 16, 2024

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 16, 2024

openshift-merge-bot bot merged commit 1612025 into openshift:master Sep 16, 2024
16 checks passed

yanirq mentioned this pull request Sep 17, 2024

[release-4.17] OCPBUGS-38721: tuned: distinguish deferred updates #1149

Merged

ffromani deleted the deferred-updates-cleanup-20240909 branch September 18, 2024 15:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NO-JIRA: deferred updates: fix in-place update handling on reboot #1162

NO-JIRA: deferred updates: fix in-place update handling on reboot #1162

ffromani commented Sep 9, 2024

ffromani commented Sep 9, 2024

MarSik Sep 9, 2024

ffromani Sep 9, 2024

MarSik Sep 9, 2024

ffromani Sep 9, 2024

openshift-ci bot commented Sep 9, 2024

ffromani commented Sep 9, 2024

ffromani commented Sep 9, 2024

ffromani commented Sep 10, 2024

jmencak left a comment

ffromani commented Sep 11, 2024

ffromani commented Sep 11, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 12, 2024

ffromani commented Sep 12, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 16, 2024

openshift-ci bot commented Sep 16, 2024

jmencak commented Sep 16, 2024

openshift-ci-robot commented Sep 16, 2024

yanirq commented Sep 16, 2024

openshift-ci-robot commented Sep 16, 2024

openshift-ci-robot commented Sep 16, 2024

jmencak commented Sep 16, 2024

openshift-ci bot commented Sep 16, 2024

openshift-ci bot commented Sep 16, 2024

openshift-bot commented Sep 17, 2024

NO-JIRA: deferred updates: fix in-place update handling on reboot #1162

NO-JIRA: deferred updates: fix in-place update handling on reboot #1162

Conversation

ffromani commented Sep 9, 2024

ffromani commented Sep 9, 2024

MarSik Sep 9, 2024

Choose a reason for hiding this comment

ffromani Sep 9, 2024

Choose a reason for hiding this comment

MarSik Sep 9, 2024

Choose a reason for hiding this comment

ffromani Sep 9, 2024

Choose a reason for hiding this comment

openshift-ci bot commented Sep 9, 2024

ffromani commented Sep 9, 2024

ffromani commented Sep 9, 2024

ffromani commented Sep 10, 2024

jmencak left a comment

Choose a reason for hiding this comment

ffromani commented Sep 11, 2024

ffromani commented Sep 11, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 12, 2024

ffromani commented Sep 12, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 12, 2024

jmencak commented Sep 16, 2024

openshift-ci bot commented Sep 16, 2024

jmencak commented Sep 16, 2024

openshift-ci-robot commented Sep 16, 2024

yanirq commented Sep 16, 2024

openshift-ci-robot commented Sep 16, 2024

openshift-ci-robot commented Sep 16, 2024

jmencak commented Sep 16, 2024

openshift-ci bot commented Sep 16, 2024

openshift-ci bot commented Sep 16, 2024

openshift-bot commented Sep 17, 2024