LivenessProbes not working #26398

Closed
sebidude opened this issue May 24, 2024 · 9 comments
Labels: etcd, solved, stale (15 days without activity), tech-issues (the user has a technical issue about an application)

Comments

@sebidude

sebidude commented May 24, 2024

Name and Version

bitnami/etcd 10.1.0

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Install the chart with TLS stuff enabled in the values.yaml:

      auth:
        rbac:
          create: false
        token:
          type: simple
        client:
          secureTransport: true
          existingSecret: "etcd-client-certs"
          enableAuthentication: true
          caFilename: "ca.crt"
        peer:
          secureTransport: true
          useAutoTLS: true
          caFilename: "ca.crt"

  2. Pods restart after some time and the etcd cluster is in a really bad state

What is the expected behavior?

The cluster should be up and running stably.

What do you see instead?

Pods restart after a short time because the HTTP livenessProbes on PodIP:2379/health fail.

Additional information

This seems to be related to #25984 where the livenessProbes changed.

@sebidude sebidude added the tech-issues The user has a technical issue about an application label May 24, 2024
@github-actions github-actions bot added the triage Triage is needed label May 24, 2024
@github-actions github-actions bot removed the triage Triage is needed label May 24, 2024
@github-actions github-actions bot assigned fmulero and unassigned carrodher May 24, 2024
@kaykhan

kaykhan commented May 24, 2024

I've also just encountered this. Both the readiness probe and the liveness probe are failing after installing nginx with: helm install my-nginx bitnami/nginx --version 17.2.1

I have a Kubernetes EKS cluster that is IPv6-only. Curious whether you have a similar setup; I wonder if it's because the liveness and readiness probes are not configured correctly for IPv6.

@sebidude
Author

> I have a Kubernetes EKS cluster that is IPv6-only. Curious whether you have a similar setup; I wonder if it's because the liveness and readiness probes are not configured correctly for IPv6.

We run self-managed K8s clusters on-prem and on cloud infrastructure. This was failing in a dev-stage cluster, IPv4 only.
The only thing that changed was the liveness probes. For now we have just rolled back to 10.0.11.

@danielb43

bitnami/etcd 10.1.1 is also affected by the original issue.

@ismaildem

I think the problem here is that the wrong port is being queried.
The /livez endpoint is available via the metrics port, and exclusively via HTTP.
So if you have metrics.useSeparateEndpoint enabled, the liveness probe must use the port defined by .Values.containerPorts.metrics.

@BobVanB

BobVanB commented Jun 6, 2024

I have the same problem when disabling RBAC and using only client authentication with certificates.
The probe to https://<>:2379/livez does not go through because no client certificate is passed.

There is a simple workaround until this is fixed inside the template:

customLivenessProbe:
  httpGet:
    port: 9090
    path: /livez
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5

metrics:
  useSeparateEndpoint: true
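
If the readiness probe fails for the same reason, the workaround above could be mirrored for it. The fragment below is only a sketch, meant to sit alongside the values above. It assumes that the chart accepts customReadinessProbe (not confirmed in this thread), that containerPorts.metrics is 9090 to match the port used above, and that the etcd version in the image serves /readyz over plain HTTP on the metrics listener. The timing values are simply copied from the liveness example.

```yaml
# Assumption: the metrics listener is plain HTTP on this port,
# matching the port used in the customLivenessProbe above.
containerPorts:
  metrics: 9090

# Assumption: the chart exposes customReadinessProbe and the
# bundled etcd version serves /readyz on the metrics port.
customReadinessProbe:
  httpGet:
    port: 9090
    path: /readyz
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
```

Keeping the probe port and containerPorts.metrics in one values file makes it harder for the two to drift apart.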

@fmulero
Collaborator

fmulero commented Jun 14, 2024

Thanks a lot @BobVanB

To be perfectly blunt, I don't see an easy solution if we keep the /livez endpoint, and that change was intentional. Do you have any proposal in mind? Please feel free to open a PR.

@BobVanB

BobVanB commented Jun 14, 2024

Hi @fmulero

I'm not going to touch this topic any further. There is enough discussion about this.
For example: etcd-io/etcd#16007
It would be nice if we got this: etcdctl endpoint live

With kind regards,
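
Since a Kubernetes httpGet probe cannot present a client certificate, an exec probe that runs etcdctl with the client certificates is one possible alternative until something like "etcdctl endpoint live" exists. The fragment below is a rough sketch, not the chart's method. The certificate directory and the cert.pem/key.pem filenames are assumptions (only ca.crt appears in the values from the original report), and the server certificate is assumed to be valid for 127.0.0.1. Note that "etcdctl endpoint health" is generally understood to be a stricter check than pure liveness, which is the concern raised in the linked etcd discussion.

```yaml
# Sketch only: exec-based probe that authenticates with client certs.
# Assumptions: paths and filenames below are hypothetical and must be
# adjusted to wherever the client secret is mounted in the container.
customLivenessProbe:
  exec:
    command:
      - etcdctl
      - endpoint
      - health
      - --endpoints=https://127.0.0.1:2379
      - --cacert=/opt/bitnami/etcd/certs/client/ca.crt   # assumed path
      - --cert=/opt/bitnami/etcd/certs/client/cert.pem   # assumed filename
      - --key=/opt/bitnami/etcd/certs/client/key.pem     # assumed filename
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
```

Because this check can fail when the member is alive but the cluster has lost quorum, a generous failureThreshold may be worth considering to avoid restart loops.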


github-actions bot commented Jul 2, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Jul 2, 2024

github-actions bot commented Jul 7, 2024

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@github-actions github-actions bot added the solved label Jul 7, 2024
@bitnami-bot bitnami-bot closed this as not planned Jul 7, 2024