Implement kubernetes probes #542
/help
@serathius: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I am interested in picking this up. @serathius, is my understanding correct? We need to implement
Questions
/cc @logicalhan
Oh thanks :)
"/healthz" is actually deprecated in favor of "/livez" now.
Yes, there are standard kubernetes configuration options. You can set
This is configurable as well, but generally you want readiness endpoints to fail fast, since it's usually used for load-balancing.
The apiserver doesn't scrape the metrics-server liveness/readiness probes; this is actually a function of the kubelet.
The design mentions that the Wondering if the above point is relevant at all.
Liveness of a container should not depend on external services (assuming that the container can recover from network problems). The apiserver being down should not cause an MS restart (it behaves like that currently, and this is not correct). The definition proposed here requires at least one successfully scraped node, meaning that it should not fail on apiserver down if we discovered at least one node. Possible cases when MS doesn't have nodes to scrape, causing the liveness check to fail:
We could change this behavior, but I'm a little worried about it making MS ready too early in startup. I would prefer to check whether those problems happen in real clusters before trying to solve them.
Thanks for info, we should change design to use
I think that Hanu meant the period of scraping metrics set in Metrics Server (not the probe period). The answer is yes, the scrape period can be changed via a command line flag. The definition of liveness proposed here is to check whether this operation is executed on time (+/- 50% to avoid restarting due to temporary overload). The value of liveness is independent of the period of the probes that check it.
For readiness there is slightly different wording (last scrape, not a scrape within 30s). We should check for success only in the last scrape attempt and not care about the time.
Hey @hanumanthan,
Hi @serathius, I would love to, but given my personal circumstances, I won't be able to work on this for a while.
np, thanks for confirmation. |
Both probes were implemented. The implementation of readyz skipped checking the apiserver; if that's a problem we can revisit it in the future.
/kind feature
The goal of this issue is to create proper definitions of health and readiness for metrics server and implement Kubernetes probes based on them.
Background
Code in metrics server runs the following two processes:
Some problems related:
Things to consider:
Proposal
Definitions:
Reasoning:
Things to consider:
TODO
We should update/add the implementation of livez and readyz based on the new definitions. Checking if the apiserver is available should be as light as possible (e.g. check if a TCP connection is open). In addition we should add/update probes; the default configuration (no delay, every 10s with a 3-failure threshold) looks OK for now.
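As a sketch, the default configuration described above (no initial delay, 10s period, 3-failure threshold) would look roughly like this in the metrics-server pod spec. The container name, port, and scheme here are assumptions for illustration, not taken from the actual manifest:

```yaml
# Illustrative probe configuration; container name, port, and scheme are assumed.
containers:
  - name: metrics-server
    livenessProbe:
      httpGet:
        path: /livez
        port: https
        scheme: HTTPS
      periodSeconds: 10      # probe every 10s
      failureThreshold: 3    # restart after 3 consecutive failures
      # no initialDelaySeconds: start probing immediately
    readinessProbe:
      httpGet:
        path: /readyz
        port: https
        scheme: HTTPS
      periodSeconds: 10
      failureThreshold: 3
```

Since the kubelet (not the apiserver) executes these probes, this is the only place the endpoints need to be wired up.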
Alternatives
No need to implement a Startup probe as we don't fall under https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-startup-probe

/cc @s-urbaniak