add autoscaler for kube-state-metrics #200
Conversation
I think since the nanny scales resources linearly with the number of nodes in the cluster, I can use small starting resources for the kube-state-metrics container. Here I use the request resources as the base resource requirement in the nanny.
Having never used the nanny pod, this might be incorrect, but I'd expect it to change the deployment manifest, meaning it needs the RBAC roles for that, no?
I looked at the addon-resizer code and from what I can tell it needs:
What that would basically mean is that we need to add the last two points to our RBAC roles. |
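If that's right, the additional rules might look roughly like this (a sketch only; the exact verbs are what we're trying to confirm in this thread):

```yaml
# Hypothetical additions to the kube-state-metrics RBAC rules,
# assuming the addon-resizer needs to read its own pod and
# read/update the kube-state-metrics deployment.
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  verbs: ["get", "update"]
```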
Good point, but I didn't add the last two and it works... see also the example in https://github.com/kubernetes/autoscaler/tree/master/addon-resizer
@piosz @loburm are you aware of the RBAC roles the addon-resizer needs? From looking at the code it doesn't seem to do anything special, so I'd expect it to require them.
@brancz I started a new namespace and deployed the pod-nanny; it complains it can't get the pod and update the deployment due to limited permissions. So maybe it's because kube-system is a special namespace?
Yes that was my suspicion. Would you care to adapt? 🙂 |
You mean add it in this file https://github.com/kubernetes/kube-state-metrics/blob/master/kubernetes/kube-state-metrics-cluster-role.yaml?
Yes, but also get for deployments, as currently it's just list/watch.
Yeah, exactly what I wanted to say, due to this func
small nit, then lgtm
  - deployments
  verbs: ["list", "watch", "get", "update"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
this can be removed as it's redundant with the above rule
Sorry, will do.
Fixed, PTAL
I read the doc https://kubernetes.io/docs/admin/authorization/rbac/ and found that: "Default RBAC policies grant scoped permissions to control-plane components, nodes, and controllers, but grant no permissions to service accounts outside the “kube-system” namespace (beyond discovery permissions given to all authenticated users)."
We actually don't run kube-state-metrics in the kube-system namespace, so we do have this very problem. Yes
I think it should be in kube-system, no need to change IMHO
Agreed, but it's still up to the user, so the RBAC roles should be properly documented.
@kubernetes/sig-api-machinery-pr-reviews for the rbac stuff
Also, we already use the pod-nanny with Heapster. I'm not sure where we can find the rbac rules for it. FYI @kubernetes/sig-autoscaling-misc
  - deployments
  verbs: ["list", "watch", "get", "update"]
Letting this update any replicaset and daemonset in the system doesn't seem appropriate. I'd expect those permissions to be limited to the namespace and the specific replicaset/daemonset it is responsible for. Also, the update permission belongs in a separate role specific to the nanny, since someone who is not using the nanny should not need to give write permission to their metrics pod.
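Concretely, the separation might look like this (a sketch, under the assumption that cluster-wide access is only ever needed for reading):

```yaml
# Sketch: the cluster-wide role keeps only read verbs, which is all
# that metric collection needs; any update permission would live in
# a separate namespaced Role granted only when the nanny is deployed.
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
```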
The replicasets and daemonsets are just given list and watch permission, no update.
Ah, sorry, misread the diff. The same comment applies to deployments re the update permission: it should be just on this particular deployment in this namespace.
In that case we need to separate the list/watch from the get and update, as list/watch needs to be on all namespaces for the purpose of collecting the objects to produce their metrics. Thus get and update only need to be on that specific namespace.
> In that case we need to separate the list/watch from the get and update, as list/watch needs to be on all namespaces for the purpose of collecting the objects to produce their metrics. Thus get and update only need to be on that specific namespace.

Exactly, in a Role in the kube-system namespace, bound with a RoleBinding. The rule should also specify resourceNames: ["kube-state-metrics"] to only allow modifying the relevant deployment.
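Put together, that might look like the following (a sketch; the Role name and the ServiceAccount it binds to are assumptions based on this thread, not the final manifest):

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  namespace: kube-system
  name: pod-nanny
rules:
- apiGroups: ["extensions"]
  resources: ["deployments"]
  # restrict writes to the one deployment the nanny manages
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: pod-nanny
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-nanny
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
```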
good point
But I think maybe it's a little too tight; from the code we can see that the pod-nanny only gets and updates the relevant deployment we specified. I think we can trust it.
That's less about not trusting the code/binary of the addon-resizer than about an attacker being able to modify arbitrary deployments (aka the principle of least privilege). Which, as a side note, makes a great example of why we should not run kube-state-metrics in the kube-system namespace.
So I will submit another commit to fix the rbac issue
Suggestion on the patch. LGTM otherwise
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
Might want to allow patch as well… not sure how the nanny updates
The nanny updates using https://github.com/kubernetes/autoscaler/blob/master/addon-resizer/nanny/kubernetes_client.go#L69
just using get and update
@liggitt another question here: if I already have a role that gives "list pods" permission, is it right that I also have "get a specific pod" permission?
@brancz I found that we actually don't need to add "get pods" permission even if we deploy kube-state-metrics in another brand-new namespace. Like below, I deploy it in the monitor namespace
Tested on my gke (k8s version 1.7.2)
I would expect this to cause errors at runtime; did you have a look at the logs of the addon-resizer? The actions should be distinct. Also, we should trigger the case where autoscaling kicks in, to see that it actually works properly.
@ericchiang @liggitt is this intended, that get/list RBAC actions are both allowed when only one of them is specified?
That is not the case. The clusterrole already grants list pods to the service account, so anything running with that service account would have list permission. If the resizer is doing a get of a pod, either RBAC (or another configured authorizer) is allowing an explicit get. Did you set up your gke cluster with legacy abac permissions disabled?
@liggitt no extra config, I just used the default settings
Do you know how to get the apiserver start configuration on gke? I can't find it anywhere, maybe google hides it...
I don't think it's accessible. Did you start your cluster passing
@liggitt I checked the configuration, it is "Legacy authorization
Happy we could figure it out, lgtm now 👍. I'll hold off for a final lgtm from @liggitt though.
Seems fine as a starting point. As a test, I'd recommend deploying the addon-resizer in its own pod with its own service account, and seeing if the role is sufficient.
@liggitt If we deploy the addon-resizer in its own pod with its own sa, we also need a cluster role with list nodes permission, because it must get the cluster node count to determine whether to scale or not. Tested on my cluster with this. Right now it is using the kube-state-metrics clusterrole to work.
That's worth noting, but given the Role for the addon-resizer is pretty specific to kube-state-metrics, granting the kube-state-metrics clusterrole to the resizer wouldn't be terrible. The important thing is to keep the deployment mutation permissions separate, so that someone running without the addon-resizer wouldn't be required to give their metrics pod permission to modify the deployment. This PR does that, so I think it's fine as-is.
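For anyone trying that split, the extra read access would be something like this (a sketch; the nodes rule is the only addition being discussed here):

```yaml
# Sketch: extra ClusterRole rule a standalone resizer ServiceAccount
# would need, since the nanny lists nodes to size container resources.
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list"]
```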
small nits on the naming
@@ -30,4 +30,30 @@ spec:
        limits:
          memory: 50Mi
          cpu: 200m
+     - name: pod-nanny
as the container image is called addon-resizer, let's call the container the same
kind: Role
metadata:
  namespace: kube-system
  name: pod-nanny
same as above, I'd like to change this to something more expressive to what it is, maybe something like: s/pod-nanny/kube-state-metrics-resizer/
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-nanny
due to my suggestion to change the name of this, we need to adapt here as well
Fixed @brancz, ptal
Tested ok on my cluster
Great, thanks! I think we can move forward and get 1.0 out!
add autoscaler for kube-state-metrics
For #124
(provide deployment manifest that scales with cluster size using pod nanny)
This setup is tested on my gke cluster; now the kube-state-metrics pod has two containers, kube-state-metrics and pod-nanny.
The cpu and memory parameters can be adjusted too.
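For reference, the two-container pod spec this PR describes looks roughly like the following (a sketch: image tags and resource values are illustrative, and the flag names follow the addon-resizer README rather than this PR's exact diff):

```yaml
      containers:
      - name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.5.0
        ports:
        - name: metrics
          containerPort: 8080
      - name: pod-nanny
        image: gcr.io/google_containers/addon-resizer:1.7
        env:
        # the nanny reads its own pod via the downward API
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
        - /pod_nanny
        # base resources; the nanny scales them linearly with node count
        - --container=kube-state-metrics
        - --cpu=100m
        - --extra-cpu=1m
        - --memory=100Mi
        - --extra-memory=2Mi
        - --deployment=kube-state-metrics
```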