Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-38795: Fix defer status during recommended profile change #1142

Merged
merged 2 commits into from
Aug 26, 2024

Conversation

MarSik
Copy link
Contributor

@MarSik MarSik commented Aug 22, 2024

Process recommended profile change even when the profile itself has not changed itself. The internal profile fingerprint tracking was not updated during recommended profile update with no internal changes.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 22, 2024
@MarSik
Copy link
Contributor Author

MarSik commented Aug 22, 2024

After the fix (with few extra logging changes):

2024-08-22 08:09:02,700 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-node' applied
...
I0822 08:31:04.090721   43551 controller.go:1031] recommended TuneD profile updated from "openshift-node" to "openshift-profile" [inplaceUpdate=false nodeRestart=false]
I0822 08:31:04.090728   43551 controller.go:1039] recommended TuneD profile changed; trigger TuneD reload [deferred=update]
...
I0822 08:31:04.091068   43551 controller.go:478] profilesExtract(): fingerprint of extracted profiles: "46cc8c1e9bcfc1ff5635ea3065a70f3fda4deec61a59a42c7c5586291bae3c3f"
I0822 08:31:04.091108   43551 controller.go:1061] profile sync change:false fp:46cc8c1e9bcfc1ff5635ea3065a70f3fda4deec61a59a42c7c5586291bae3c3f err:<nil>
I0822 08:31:04.091141   43551 controller.go:1068] current unpacked profile fingerprint "57bf1d12635adc2a15c046890cf1d675630b2148fae2cf0231eea83c98fe66b6" -> "46cc8c1e9bcfc1ff5635ea3065a70f3fda4deec61a59a42c7c5586291bae3c3f"
I0822 08:31:04.091152   43551 controller.go:1142] recommended profile change with deferredMode=update: setting reload flag
I0822 08:31:04.091157   43551 controller.go:1175] changeSyncerRestartOrReloadTuneD()
I0822 08:31:04.091160   43551 controller.go:843] tunedReload()
...
2024-08-22 08:31:04,220 INFO     tuned.daemon.daemon: static tuning from profile 'openshift-profile' applied
...
I0822 08:31:04.221247   43551 controller.go:965] changeSyncerPostReloadOrRestart(): current effective profile fingerprint "57bf1d12635adc2a15c046890cf1d675630b2148fae2cf0231eea83c98fe66b6" -> "46cc8c1e9bcfc1ff5635ea3065a70f3fda4deec61a59a42c7c5586291bae3c3f"
...
I0822 08:31:04.235566   43551 controller.go:1369] daemonStatus(): isApplied -> true: fpUnpacked=46cc8c1e9bcfc1ff5635ea3065a70f3fda4deec61a59a42c7c5586291bae3c3f fpEffective=46cc8c1e9bcfc1ff5635ea3065a70f3fda4deec61a59a42c7c5586291bae3c3f
I0822 08:31:04.235585   43551 controller.go:1372] daemonStatus(): change: deferred=true applied=true nodeRestart=false
I0822 08:31:04.235592   43551 controller.go:1386] computed status conditions: []v1.ProfileStatusCondition{v1.ProfileStatusCondition{Type:"Applied", Status:"True", LastTransitionTime:time.Date(2024, time.August, 22, 6, 37, 35, 0, time.Local), Reason:"AsExpected", Message:"TuneD profile applied."}

@MarSik MarSik changed the title Fix defer status OCPBUGS-38795: Fix defer status during recommended profile change Aug 22, 2024
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Aug 22, 2024
@openshift-ci-robot
Copy link
Contributor

@MarSik: This pull request references Jira Issue OCPBUGS-38795, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Process recommended profile change even when the profile itself has not changed itself. The internal profile fingerprint tracking was not updated during recommended profile update with no internal changes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@MarSik
Copy link
Contributor Author

MarSik commented Aug 22, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Aug 22, 2024
@openshift-ci-robot
Copy link
Contributor

@MarSik: This pull request references Jira Issue OCPBUGS-38795, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (liqcui@redhat.com), skipping review request.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 22, 2024
@MarSik MarSik force-pushed the fix-defer-status branch 3 times, most recently from c6fd3da to 712316c Compare August 22, 2024 11:08
Process recommended profile change even when the profile itself
has not changed itself. The internal profile fingerprint tracking
was not updated during recommended profile update with no internal
changes.
@MarSik
Copy link
Contributor Author

MarSik commented Aug 26, 2024

/retest-required

Copy link
Contributor

openshift-ci bot commented Aug 26, 2024

@MarSik: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@MarSik
Copy link
Contributor Author

MarSik commented Aug 26, 2024

/label acknowledge-critical-fixes-only

We need this in time for 4.17 and all the tests seem to be passing.

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Aug 26, 2024
@jmencak
Copy link
Contributor

jmencak commented Aug 26, 2024

/cc @jmencak

@openshift-ci openshift-ci bot requested a review from jmencak August 26, 2024 10:02
@@ -1029,6 +1029,7 @@ func (c *Controller) changeSyncerTuneD(change Change) (synced bool, err error) {
// Cache the value written to tunedRecommendFile.
c.daemon.recommendedProfile = change.recommendedProfile
klog.V(1).Infof("recommended TuneD profile updated from %q to %q [inplaceUpdate=%v nodeRestart=%v]", prevRecommended, change.recommendedProfile, inplaceUpdate, change.nodeRestart)
changeRecommend = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the variable not being set anywhere in this function. However, can we possibly get rid of the variable completely? Would it work if we replaced it further down just by change.nodeRestart ? I haven't tested this (yet), but looking at the code it should work. It would also resolve the slight mis-placement of the variable setting below the logging message and its not very intuitive name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that, but there are two separate triggers for this flow:

  • !inplaceUpdate -- the recommended profile was changed for a live system
  • change.nodeRestart -- first execution after reboot

The !inplaceUpdate is actually the case that was missed and caused the bug.

Both variables are later used for flow branching and the changeRecommend bool is the combination of both. I agree it is not easily visible or understandable, but I was a bit afraid to touch the logic tbh.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right. I was just testing that and my suggestion didn't work. I agree !inplaceUpdate causes that issue.

I was a bit afraid to touch the logic tbh.

Understood. Well, I wonder if the entire flow needs a bit of cleanup / adding comments once @ffromani is back. In the meantime, I guess we can merge if this fixes the issue.

@jmencak
Copy link
Contributor

jmencak commented Aug 26, 2024

/approve
/lgtm
I can confirm this fixes https://issues.redhat.com/browse/OCPBUGS-38795 . (Not https://issues.redhat.com/browse/OCPBUGS-38796).
However, I'd appreciate some follow-up cleanup/comments in the affected deferred updates flow.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2024
Copy link
Contributor

openshift-ci bot commented Aug 26, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak, MarSik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit d3cc082 into openshift:master Aug 26, 2024
16 checks passed
@openshift-ci-robot
Copy link
Contributor

@MarSik: Jira Issue OCPBUGS-38795: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-38795 has been moved to the MODIFIED state.

In response to this:

Process recommended profile change even when the profile itself has not changed itself. The internal profile fingerprint tracking was not updated during recommended profile update with no internal changes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Copy link
Contributor

[ART PR BUILD NOTIFIER]

Distgit: cluster-node-tuning-operator
This PR has been included in build cluster-node-tuning-operator-container-v4.18.0-202408261944.p0.gd3cc082.assembly.stream.el9.
All builds following this will include this PR.

yanirq pushed a commit to yanirq/cluster-node-tuning-operator that referenced this pull request Sep 1, 2024
…enshift#1142)

* OCPBUGS-38795: Fix defer status during recommended profile change

Process recommended profile change even when the profile itself
has not changed itself. The internal profile fingerprint tracking
was not updated during recommended profile update with no internal
changes.

* Add e2e test for defer=update
openshift-merge-bot bot pushed a commit that referenced this pull request Oct 2, 2024
)

* OCPBUGS-28647: tuned: distinguish deferred updates (#1129)

* OCPBUGS-28647: tuned: distinguish deferred updates

To fully support the usecase described in OCPBUGS-28647 and fix
the issue, we need to further distinguish between first-time
profile change and in-place profile change. This is required to
better support a GitOps flow.

The key distinction is if the recommended profile changes or not,
and there's a desire to defer application of changes only when a
profile is updated (e.g. sysctl modified), not the first time it
is applied.

Thus:
- first-time profile change is a change which triggers a change
  of the recommended profile.
- in-place profile update is a change which does NOT cause a
  switch to a TuneD profile with a different name. This involves
  changes to only the contents of the currently used profile.

We change the way the annotation is used. We now require a value,
which can be either
- always: every Tuned object annotated this way will have its
  application deferred.
- update: every Tuned object annotated this way will be processed
  as usual (and as it wasn't annotated) if it's a first-time profile
  change, but its in-place updates will be deferred.
- a new internal value "never" is also added to be used internally
  to mean the deferred feature is disabled entirely.
  User can use this value but it will explicitly disable the feature
  (which is disabled already by default), thus is redundant and not
  recommended.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* e2e: drop tags, use labels

now that we have the more powerful ginkgo labels,
we can stop using tags for newer tests.

Signed-off-by: Francesco Romani <fromani@redhat.com>

---------

Signed-off-by: Francesco Romani <fromani@redhat.com>

* OCPBUGS-38795: Fix defer status during recommended profile change (#1142)

* OCPBUGS-38795: Fix defer status during recommended profile change

Process recommended profile change even when the profile itself
has not changed itself. The internal profile fingerprint tracking
was not updated during recommended profile update with no internal
changes.

* Add e2e test for defer=update

* NO-JIRA: deferred updates: fix in-place update handling on reboot (#1162)

* log: make sure to have high verbosity in tests

Signed-off-by: Francesco Romani <fromani@redhat.com>

* e2e: deferred: stricter testing

add stricter check for the node condition when
validating OCPBUGS-38795

Signed-off-by: Francesco Romani <fromani@redhat.com>

* deferred: tuned: fix reload trigger when inplace update

There are conditions on which we should not set the
reload flag. This avoid regression in the test
"Profile deferred when applied should trigger changes when applied fist,
then deferred when edited, if tuned restart should be kept deferred"

Signed-off-by: Francesco Romani <fromani@redhat.com>

* deferred: tuned: clarify comments and logs

document better the logic about processing edits

Signed-off-by: Francesco Romani <fromani@redhat.com>

---------

Signed-off-by: Francesco Romani <fromani@redhat.com>

* Omit unbackported logging support

---------

Signed-off-by: Francesco Romani <fromani@redhat.com>
Co-authored-by: Francesco Romani <fromani@redhat.com>
Co-authored-by: Martin Sivák <msivak@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants