
[Bug]: kube-fledged image cache sync interferes with Karpenter scale-down #127

Open

fullykubed opened this issue Sep 4, 2024 · 2 comments

Labels: bug (Something isn't working)

fullykubed commented Sep 4, 2024

Prior Search

  • I have already searched this project's issues to determine if a bug report has already been made.

What happened?

Kube-fledged periodically runs a pod on every node that pulls a configured set of images, ensuring that node's image cache stays up to date. In the current stack configuration, this sync runs every 3 minutes.
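For reference, the sync interval is set by the kube-fledged controller's `--image-cache-refresh-frequency` flag. A minimal sketch of where that lands in the controller Deployment (illustrative; the actual manifest in this stack may be laid out differently):

```yaml
# Sketch of the kube-fledged controller Deployment args (assumed layout,
# not copied from this stack's manifests).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-fledged-controller
  namespace: kube-fledged
spec:
  selector:
    matchLabels:
      app: kube-fledged-controller
  template:
    metadata:
      labels:
        app: kube-fledged-controller
    spec:
      containers:
        - name: controller
          image: senthilrch/kubefledged-controller:v0.10.0
          args:
            # Refresh every node's image cache every 3 minutes, matching
            # the interval described above.
            - --image-cache-refresh-frequency=3m
```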

However, while these pods are running, Karpenter cannot disrupt the nodes: the kube-fledged pods are bound to their nodes and cannot be rescheduled onto different nodes, which is a requirement for Karpenter scale-down. Since kube-fledged runs so often, this often leaves Karpenter perpetually unable to disrupt nodes.
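To illustrate why Karpenter's reschedule simulation fails: the image-puller pods kube-fledged creates are pinned to one specific node. A hypothetical sketch of such a pod, using a `nodeSelector` on `kubernetes.io/hostname` (the real pods may express the pin via node affinity instead), which matches the "incompatible requirements, key kubernetes.io/hostname" errors in the log output:

```yaml
# Illustrative sketch, not the exact spec kube-fledged generates.
# The hostname pin means no other node can ever satisfy this pod, so
# Karpenter's "could this pod reschedule elsewhere?" check always fails.
apiVersion: v1
kind: Pod
metadata:
  generateName: imagepuller-
  namespace: kube-fledged
spec:
  nodeSelector:
    # Bound to one specific node (hostname taken from the log output below).
    kubernetes.io/hostname: ip-10-0-150-93.us-east-2.compute.internal
  restartPolicy: Never
  containers:
    - name: imagepuller
      # Hypothetical image being cached; the container exists only to
      # trigger a pull and does no real work.
      image: linkerd/proxy:stable-2.14.10
      command: ["sh", "-c", "exit 0"]
```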

The challenge is that the kube-fledged sync does not run automatically on new node creation, so unless the sync runs often, it's possible a node might not have images in its cache when they are needed.
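One partial mitigation upstream kube-fledged documents is an on-demand refresh, triggered by annotating the ImageCache resource; in principle a hook that applies this annotation on node creation could replace the very frequent periodic sync. The annotation key and API version below are assumptions taken from the upstream docs and should be verified against the version deployed in this stack:

```yaml
# Hypothetical sketch: trigger an on-demand cache refresh by annotating
# the ImageCache. Annotation key and apiVersion are assumptions from the
# upstream kube-fledged README; verify before relying on them.
apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  name: imagecache
  namespace: kube-fledged
  annotations:
    # Presence of this annotation asks the controller to refresh now.
    kubefledged.io/refresh-imagecache: ""
spec:
  cacheSpec:
    - images:
        # Hypothetical example image list.
        - linkerd/proxy:stable-2.14.10
```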

It seems like we may need to fork kube-fledged to add this capability, since the project appears to be relatively unmaintained.

Steps to Reproduce

Default behavior of the stack. Simply observe.

Relevant log output

not all pods would schedule, linkerd/linkerd-proxy-czh6d-7h5r4 =>
incompatible with nodepool "spot-arm", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5098];
incompatible with nodepool "spot", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5099];
incompatible with nodepool "burstable-arm", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5100];
incompatible with nodepool "burstable", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5101];
incompatible with nodepool "on-demand-arm", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5102];
incompatible with nodepool "on-demand", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5103]
@fullykubed fullykubed added the bug Something isn't working label Sep 4, 2024
@fullykubed fullykubed self-assigned this Sep 4, 2024
wesbragagt (Contributor) commented:

@fullykubed would this increase cost for users running kube_fledged?

fullykubed (Collaborator, Author) commented:

@wesbragagt I am still collecting cost data to quantify it, but I suspect it does have a cost impact.

For this and a few other reasons, we are likely going to fork the kube-fledged project and maintain a custom version ourselves that plays nicer with modern cluster components (kube-fledged appears to be unmaintained). Our goal is to have that integrated by the next stable release.
