-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-33620: define *probes for KSM #2352
base: master
Are you sure you want to change the base?
Conversation
@rexagod: This pull request references Jira Issue OCPBUGS-33620, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
}, | ||
], | ||
local livenessProbePath = 'healthz', | ||
local readinessProbePath = '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rexagod The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/jira refresh |
@juzhao: This pull request references Jira Issue OCPBUGS-33620, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exec probes have lots of edge cases and I'd rather stay away from them.
Also are we convinced that a liveness probe would have caught the network issue?
There were a couple of asks in the original ticket.
I believe for the second case, exec probes may be the only way since kubelet pings are blocked off by KRP. In other cases, we've used HTTP-based probes for the outer KRP instance, but a non- For the first case, I believe we should keep the endpoints here as they are defined upstream, but make sure that IMHO I agree and would prefer HTTP-based approach over this given the edge cases, but this seems to check all the asks, unless I'm missing something here. |
AFAICT /healthz always return 200 OK. I'd be on the fence adding checks against Kubernetes API to the /healthz endpoint: strictly speaking kube-state-metrics is still alive even if it can't reach the API. But it's an upstream discussion anyway. |
Right, and to make this a non-breaking change, would it make sense to add outage checks to a |
Checking against what metrics-server does:
Even to avoid a breaking change, I'm not sure that you want to swap the semantics of /livez and /healthz? |
Hmm, there seems to be a similar pattern (eg., But for out-of-tree non- I believe this pattern would warrant for including outage checks within WDYT? |
(PS. I might be wrong here, but I wanted to confirm my deduction to devise my approach to resolve this upstream) |
0d8a1d8
to
312fa45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said above, I've been burnt too many times by exec probes. Can we configure kube-rbac-proxy to disable authn/authz for probes?
command: | ||
- sh | ||
- -c | ||
- if [ -x "$(command -v curl)" ]; then exec curl -s -I -f http://localhost:8081/livez; elif [ -x "$(command -v wget)" ]; then exec wget --quiet --tries=1 --spider http://localhost:8081/livez; else exit 1; fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/livez returns 200 OK currently because the HTTP server returns the home URL. I'd rather have the /livez endpoint implemented before having the change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
livenessProbe: | ||
httpGet: | ||
path: livez | ||
port: 8443 | ||
scheme: HTTPS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(point livenessProbe
for the KSM container to it's respective proxied port, this helps us actually restart the container on a probe failure)
7c801ea
to
2e35d00
Compare
/test e2e-aws-ovn-techpreview |
kubernetes/kube-state-metrics#2418 (comment) is in, and will be included in kubernetes/kube-state-metrics#2419 (v2.13.0). |
Pinging @simonpasquier for one last review here to make sure everything looks good before we release KSMv2.13.0, so all that will be left after that would be merging this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how the e2e tests pass given that the /livez endpoint doesn't exist.
@rexagod do we have other code change for this PR? |
@juzhao Since https://github.com/openshift/cluster-monitoring-operator/pull/2352/files#r1664732654 is in, I'll update the PR. :) |
'--ignore-paths=' + std.join(',', if std.endsWith(c.name, '-self') then ['/metrics'] else ['/livez']), | ||
'--ignore-paths=' + std.join(',', if std.endsWith(c.name, '-self') then ['/readyz'] else ['/livez']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
|
@rexagod: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we propagate the probes enhancement to kube-prometheus first?
Currently, |
Unfortunately prometheus-operator/kube-prometheus#436 lacks context about the issue being addressed. If CMO can deploy kube-state-metrics with liveness/readiness probes, I see no reason why it wouldn't be beneficial for kube-prometheus too. We shoould be able to find a way which is backward compatible (e.g. probes disabled by default but they can be opted in). |
Opened a PR here: prometheus-operator/kube-prometheus#2505. |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Will set a target version post-4.16 branching.