Implement backoff for k8s watcher retries #506

Merged: 3 commits, merged Aug 8, 2023

cbgbt (Contributor) commented Aug 8, 2023

Issue number:
#478

Description of changes:
The `kube::runtime` module provides utilities for consuming events from Kubernetes watch APIs and keeping an internal cache of state up to date. This is much more efficient than polling the k8s API for changes.

If the brupop CRD is not installed, these watchers retry starting their watch in an extremely tight loop.
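To make the watch-and-cache idea concrete, here is a toy, std-only sketch of applying watch events to a local cache. The `WatchEvent` enum and `Cache` type are hypothetical. They are loosely modeled on the Applied/Deleted/Restarted event shape used by kube-runtime's watcher in releases of this era, but they are not kube's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a watch event (not kube-runtime's real type).
#[derive(Debug, Clone)]
enum WatchEvent {
    /// An object was created or modified.
    Applied { name: String, version: u64 },
    /// An object was removed.
    Deleted { name: String },
    /// The watch was (re)started; the payload is the complete current state.
    Restarted(Vec<(String, u64)>),
}

/// Local cache keyed by object name, updated incrementally from events
/// instead of re-listing everything from the API server on every pass.
#[derive(Default)]
struct Cache {
    objects: HashMap<String, u64>,
}

impl Cache {
    fn apply(&mut self, event: WatchEvent) {
        match event {
            WatchEvent::Applied { name, version } => {
                self.objects.insert(name, version);
            }
            WatchEvent::Deleted { name } => {
                self.objects.remove(&name);
            }
            WatchEvent::Restarted(all) => {
                // A restart replaces the cached state wholesale.
                self.objects = all.into_iter().collect();
            }
        }
    }
}

fn main() {
    let mut cache = Cache::default();
    cache.apply(WatchEvent::Restarted(vec![
        ("node-a".into(), 1),
        ("node-b".into(), 1),
    ]));
    cache.apply(WatchEvent::Applied { name: "node-a".into(), version: 2 });
    cache.apply(WatchEvent::Deleted { name: "node-b".into() });
    println!("{:?}", cache.objects);
}
```

If the watch can never be established (for example, because the CRD is missing), no events arrive and the caller simply retries starting the watch. Without a delay between attempts, that retry becomes the tight loop described above.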

This PR:

  • Updates kube to pick up the default backoff functionality added in a recent release.
  • Applies that backoff to the watchers' retries.
  • Separately, lets the controller sleep for a longer fixed interval when there is nothing to do (which is the case when the CRD is not installed).
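The backoff in the first two bullets is, at its core, capped exponential backoff: each failed attempt waits longer than the last, up to a ceiling, and the delay resets after a success. The following is a minimal std-only sketch of that technique. It is not kube-runtime's implementation, and the 500 ms initial delay and 30 s cap are hypothetical values. As far as we know, recent kube-runtime releases expose their version through `WatchStreamExt::default_backoff()` on a watcher stream. Check the docs for the kube version in use, since its parameters and any jitter may differ from this sketch.

```rust
use std::time::Duration;

/// Hypothetical capped exponential backoff: each failed attempt doubles the
/// delay, never exceeding `max`. Generic sketch, not kube's `DefaultBackoff`.
struct ExponentialBackoff {
    initial: Duration,
    max: Duration,
    current: Duration,
}

impl ExponentialBackoff {
    fn new(initial: Duration, max: Duration) -> Self {
        Self { initial, max, current: initial }
    }

    /// Returns how long to wait before the next retry, then grows the delay.
    fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(self.max);
        delay
    }

    /// Call after a successful (re)connection so the next failure starts small.
    fn reset(&mut self) {
        self.current = self.initial;
    }
}

fn main() {
    let mut backoff =
        ExponentialBackoff::new(Duration::from_millis(500), Duration::from_secs(30));
    // Delays for eight consecutive failures: doubling until capped at 30 s.
    let delays: Vec<Duration> = (0..8).map(|_| backoff.next_delay()).collect();
    println!("{:?}", delays);
    // After a success, the schedule starts over from the initial delay.
    backoff.reset();
    println!("{:?}", backoff.next_delay());
}
```

Production backoffs usually also add random jitter so that many clients do not retry in lockstep. It is omitted here to keep the schedule deterministic.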

Testing done:
Deployed the changes to a cluster without the CRD installed.

Before this change, the agent logs were spammed several times per second with messages like this:

2023-08-08T20:23:28.561457Z  WARN kube_client::client: Unsuccessful data error parse: 404 page not found
    at /src/.cargo/registry/src/index.crates.io-6f17d22bba15001f/kube-client-0.85.0/src/client/mod.rs:445

After this change, these messages are clearly backed off exponentially: a short initial burst, followed by waits of 15 seconds or more.

The controller also emits this log every 10 seconds:

2023-08-08T20:24:59.897841Z  INFO controller::controller: Nothing to do: The bottlerocket-update-operator is not aware of any BottlerocketShadow objects. Is the bottlerocket-shadow CRD installed? Are nodes labelled so that the agent is deployed to them? See the project's README for more information.
    at controller/src/controller.rs:285
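The third bullet's idle behavior can be sketched as a simple choice of sleep interval. This is a hypothetical illustration, not the controller's code: `next_sleep`, `IDLE_INTERVAL`, and `BUSY_INTERVAL` are invented names, the 10-second idle value is chosen only to match the cadence of the log above, and the 1-second busy value is an assumption.

```rust
use std::time::Duration;

/// Hypothetical idle interval, matching the ~10 s cadence of the log above.
const IDLE_INTERVAL: Duration = Duration::from_secs(10);
/// Hypothetical interval used when there is work to do (assumed value).
const BUSY_INTERVAL: Duration = Duration::from_secs(1);

/// Picks how long the controller loop sleeps before its next pass.
/// `known_shadows` stands in for the number of cached BottlerocketShadow objects.
fn next_sleep(known_shadows: usize) -> Duration {
    if known_shadows == 0 {
        // Nothing to do (e.g. the CRD is not installed): use the longer fixed interval.
        IDLE_INTERVAL
    } else {
        BUSY_INTERVAL
    }
}

fn main() {
    for n in [0, 3] {
        println!("{n} shadows -> sleep {:?}", next_sleep(n));
    }
}
```

A fixed idle interval is enough here because the controller reads from the watcher's cache rather than hitting the API on each pass. The expensive retries are already throttled by the watcher backoff.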

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Review comment on the idle-state log in controller/src/controller.rs:

    if self.all_brss().is_empty() {
        event!(
            Level::INFO,
            "Nothing to do: The bottlerocket-update-operator is not aware of any BottlerocketShadow objects. \

A contributor replied: Nice!

cbgbt merged commit 591aab0 into bottlerocket-os:develop on Aug 8, 2023.
2 checks passed
cbgbt deleted the watcher-backoff branch on August 10, 2023, 08:05.