Ingress controller should react to node scale down event from autoscaler #595

freehan · 2019-01-04T22:21:45Z

The cluster autoscaler adds the ToBeDeletedByClusterAutoscaler taint to the candidate node, then evicts the pods from that node. Once everything settles, it deletes the node.

The ingress controller should observe the taint and react by removing the instance from the instance group so that connection draining is triggered. This helps prevent new connections from arriving at nodes that are being removed, which causes 502s.
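A rough sketch of the reaction described above, written in Python for brevity rather than in the controller's own language. The Node and Taint classes and both function names are hypothetical stand-ins, not ingress-gce or Kubernetes client types; only the taint key comes from this issue.

```python
from dataclasses import dataclass, field
from typing import List

# Taint key named in this issue; everything else below is illustrative.
TO_BE_DELETED_TAINT = "ToBeDeletedByClusterAutoscaler"


@dataclass
class Taint:
    key: str
    effect: str = "NoSchedule"


@dataclass
class Node:
    name: str
    taints: List[Taint] = field(default_factory=list)


def is_scaling_down(node: Node) -> bool:
    """True if the autoscaler has marked this node for deletion."""
    return any(t.key == TO_BE_DELETED_TAINT for t in node.taints)


def instances_to_remove(nodes: List[Node], group_members: List[str]) -> List[str]:
    """Names of current instance group members that carry the taint.

    Removing these from the group is what would trigger connection draining.
    """
    draining = {n.name for n in nodes if is_scaling_down(n)}
    return sorted(draining & set(group_members))


if __name__ == "__main__":
    nodes = [
        Node("node-a", [Taint(TO_BE_DELETED_TAINT)]),
        Node("node-b"),
        Node("node-c", [Taint(TO_BE_DELETED_TAINT)]),  # tainted, but not in the group
    ]
    print(instances_to_remove(nodes, ["node-a", "node-b"]))  # prints ['node-a']
```

The sketch only computes which instances to drop; the actual instance group update and the draining itself would be handled by the load balancer API and are not modeled here.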

retpolanne · 2019-05-30T00:53:42Z

Hi, I would like to grab this issue, if no one is working on it already.

retpolanne · 2019-06-09T12:08:01Z

@freehan I might be looking at it wrong, but I think the problem is not in ingress-gce itself.

Over here, in GetNodeConditionPredicate (which is used to filter nodes during sync), all Unschedulable nodes are filtered out.

However, over here the node is only tainted as NoSchedule.

IMO the autoscaler should also mark the node as Unschedulable. I'm writing a few tests on the autoscaler for that.

freehan · 2019-07-10T20:19:13Z

Thanks for the PR!

freehan · 2019-07-10T20:22:05Z

Yes. As you pointed out, the problem is the handshake between the autoscaler and ingress-gce (or any other load balancer controller). This requires a better design for synchronizing the cluster node life cycle with load balancer controllers.

But for now, as a stopgap, we just need to watch the autoscaler's taint and react.
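To make the gap and the stopgap concrete, here is a simplified sketch in Python. It is not the ingress-gce code: GetNodeConditionPredicate is the real function discussed above, but the two predicates below, the Node and Taint classes, and the field names are hypothetical and model only the checks mentioned in this thread.

```python
from dataclasses import dataclass, field
from typing import List

# Taint key and effect as described in this thread.
TO_BE_DELETED_TAINT = "ToBeDeletedByClusterAutoscaler"
NO_SCHEDULE = "NoSchedule"


@dataclass
class Taint:
    key: str
    effect: str


@dataclass
class Node:
    name: str
    unschedulable: bool = False
    taints: List[Taint] = field(default_factory=list)


def unschedulable_only_predicate(node: Node) -> bool:
    """Keeps a node unless it is marked Unschedulable (the behavior described above)."""
    return not node.unschedulable


def taint_aware_predicate(node: Node) -> bool:
    """Stopgap: also drop nodes carrying the autoscaler's NoSchedule taint."""
    if node.unschedulable:
        return False
    for taint in node.taints:
        if taint.key == TO_BE_DELETED_TAINT and taint.effect == NO_SCHEDULE:
            return False
    return True


if __name__ == "__main__":
    # A node the autoscaler has tainted but not cordoned.
    doomed = Node("node-a", taints=[Taint(TO_BE_DELETED_TAINT, NO_SCHEDULE)])
    print(unschedulable_only_predicate(doomed))  # prints True: node stays in the sync
    print(taint_aware_predicate(doomed))         # prints False: node is filtered out
```

With the first predicate a tainted node keeps receiving traffic until it is deleted; with the second it drops out of the node list on the next sync.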

philpearl · 2019-07-15T14:46:03Z

Would really like this fix in GKE - can anyone comment on how long it will take before it's available?

freehan · 2019-07-18T21:08:20Z

We will cherry-pick this into the 1.6 branch. It will then follow the GKE release pipeline and will possibly be available in newer versions (1.13.8+).

freehan added the kind/bug label Jan 4, 2019

rramkumar1 assigned freehan Jan 14, 2019

rramkumar1 added the good first issue label Mar 28, 2019

retpolanne mentioned this issue Jul 7, 2019

Added NoSchedule effect to GetNodeConditionPredicate #792

Merged

k8s-ci-robot closed this as completed in #792 Jul 12, 2019

MaciekPytel mentioned this issue Nov 3, 2020

Adding functionality to cordon the node before destroying it. kubernetes/autoscaler#3649

Merged

innobead mentioned this issue Apr 19, 2022

Support Kubernetes cluster-autoscaler longhorn/longhorn-manager#1299

Merged
