
GCE health check does not pick up changes to pod readinessProbe #39

Closed
bowei opened this issue Oct 11, 2017 · 29 comments
Labels
backend/gce kind/feature Categorizes issue or PR as related to a new feature.

Comments

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 4:22

Importing this issue from kubernetes/kubernetes#43773 as requested.

Kubernetes version (use kubectl version): 1.4.9

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

I created an ingress with type GCE

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: admin-proxy
  labels:
    name: admin-proxy
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
    - secretName: ingress-admin-proxy
  backend:
    serviceName: admin-proxy
    servicePort: 80

This set up the health check on the Google Cloud backend endpoint to call "/", as documented.

Unfortunately my service doesn't return 200 on "/", so I added a readinessProbe to the pod as suggested by the documentation.
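
For reference, the kind of readiness probe the docs suggest looks roughly like this (a minimal sketch; the image, port, and /healthz path are placeholders, not the actual values from this deployment):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: admin-proxy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: admin-proxy
    spec:
      containers:
        - name: admin-proxy
          image: example/admin-proxy:latest  # placeholder image
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /healthz  # the path I want the GCE health check to use
              port: 80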

What you expected to happen:

I expected the health check to be automatically updated.

Instead I had to delete the ingress and re-create it in order for the health-check to update.

How to reproduce it (as minimally and precisely as possible):

  1. Create a deployment with no readiness probe.
  2. Create an ingress pointing to the pods created by that deployment.
  3. Add a readiness probe to the deployment.

Anything else we need to know:

Copied from original issue: kubernetes/ingress-nginx#582

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 4:22

Comment by @nicksardo:

There's actually a TODO for deciding what to do in this case (https://github.com/kubernetes/ingress/blob/6ed6475e87588494dd29ba5d33bac167ae8acfdd/controllers/gce/healthchecks/healthchecks.go#L71). The controller doesn't want to overwrite handmade changes. We could probably update the settings if they have the default value.

bowei commented Oct 11, 2017

From @porridge on April 11, 2017 7:45

But updating from the default value would lead to confusing behavior where only the first change is propagated, and it's not even possible to go back to the default by changing the ingress, right?

I think it's reasonable to assume that if the ingress controller knows how to manage a setting, it will overwrite manual changes.

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 16:23

I agree that it's reasonable and would be an easy solution. However, I worry about unsuspecting users who upgrade their master and find their load balancer throwing 502s. Would creating an event on health check update be sufficient? Thoughts from other users?

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 16:31

Is there any way that the ingress controller can track whether the value was set automatically?

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 16:56

The controller does not have persistent storage. One option would be to convert the health check settings to JSON and store the state in the HealthCheck's optional description field for tracking diffs, but this seems very ugly.

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 16:56

cc @ibawt

bowei commented Oct 11, 2017

From @porridge on April 11, 2017 16:57

What do you mean by "creating an event on health check update"?

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 17:10

If the controller sees a health check with settings different from the default/readinessProbe, it could update the check and then create a K8s event for each parent ingress. If a readiness probe exists, the event could say, "Health check for port 3XXXX updated to readinessProbe settings". If no probe exists, "Health check for port 3XXXX updated to default settings (see ingress docs for defining settings via readinessProbe)".
There's already an event being created at HC creation, but it doesn't seem very useful to me: https://github.com/kubernetes/ingress/blob/987540f8f61f86d07314cd776fe32e150724b48c/controllers/gce/controller/utils.go#L603

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 17:21

Just my 2c: As a Kubernetes user, I have no idea how to use events; they seem too transient to be useful (i.e. it's impossible to see when they actually happened, and I've never seen an event that's helped me to run my cluster better).

It might just be a UI problem: if there were a way of just tailing the event logs, then I might comprehend them better, so maybe it's OK to add one and assume that they'll be fixed upstream.


bowei commented Oct 11, 2017

From @kilianc on May 14, 2017 21:36

Is there any workaround to make it pick up the changes, even manually?

bowei commented Oct 11, 2017

From @kilianc on May 14, 2017 21:50

To answer my own question, and for everybody landing here with the same problem: rotate the name of the service in your rule or backend, and the controller will pick up the path correctly. I assume this could be done with zero downtime by creating a new rule and deleting the old one.
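
For example (a minimal sketch; admin-proxy and admin-proxy-v2 are placeholder names), duplicate the Service under a new name, point the ingress backend at it, and delete the old Service once traffic is healthy:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: admin-proxy
spec:
  backend:
    # was: serviceName: admin-proxy
    # renaming the backend service forces the controller to build a new
    # backend and health check from the current readinessProbe
    serviceName: admin-proxy-v2
    servicePort: 80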

bowei commented Oct 11, 2017

From @bviolier on June 8, 2017 7:26

We actually encountered an issue yesterday regarding readinessProbes and health checks on GCE. A while ago we had some difficulties setting up the health check correctly through Kubernetes; given that it doesn't pick up changes, we decided to manually update the health check on the GCE side, and everything worked.
Yesterday we updated our master to 1.6.4, and suddenly all manually set health-check updates were reset to the defaults, probably because Kubernetes (re)set them again.

But, reading this issue, it should never update health checks. I cannot re-trigger an update now, so it doesn't seem that this functionality has been added. Yet the master upgrade certainly did trigger this; is that expected behavior?

bowei commented Oct 11, 2017

From @bviolier on June 8, 2017 7:34

Another gotcha: when using the example provided (https://github.com/kubernetes/ingress/blob/6ed6475e87588494dd29ba5d33bac167ae8acfdd/controllers/gce/examples/health_checks/health_check_app.yaml), if you remove ports.containerPort, the GCE ingress health check falls back to the defaults and can't process the readinessProbe. This might be expected behavior, but at least for us it wasn't.

All our reset health checks didn't have a ports.containerPort set.
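
In other words (a sketch only; the container name, image, and port are illustrative), the declared containerPort has to be present and match the probe's port for the controller to translate the probe into health-check settings:

containers:
  - name: web
    image: example/web:latest  # illustrative image
    ports:
      - containerPort: 8080    # must be declared...
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080             # ...and match the probe port, otherwise the
                               # GCE ingress falls back to the default "/" check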

bowei commented Oct 11, 2017

From @nicksardo on June 8, 2017 15:34

@bviolier

Regarding your last comment, containerPort is a required attribute for defining a port of a container. How would you expect to serve network traffic without defining it?

Regarding your loss of manual health check settings, the latest version of the controller creates a newer type of health check (compute.HealthCheck instead of the legacy compute.HTTPHealthCheck). This change was made to make HTTP and HTTPS health checks more manageable by the controller and to make future features of the load balancer easier to consume.

What settings are you trying to override with the readiness probe? Just the path?

bowei commented Oct 11, 2017

From @bviolier on June 8, 2017 15:52

@nicksardo Thanks for the information!

I was not aware that containerPort was a required attribute. But the pods/services also worked fine without it. The health check was also working (although only after manually editing the path), so traffic could get into the pods through the port specified in the readinessProbe, even without the containerPort.

Makes sense that it created a new type of health check, thanks for the explanation. I do think more people who manually edited their health checks will run into this issue, but manually editing health checks is also not the best thing to do ;-)

The most important change we make with the readinessProbe is indeed the path, but occasionally we also change the timeout, interval, and failureThreshold.

bowei commented Oct 11, 2017

From @nicksardo on June 9, 2017 20:56

@bviolier
In case you aren't aware, if your service is using externalTraffic=Global (default), then the other settings aren't very useful. Traffic from the load balancer hits any of the nodes and kube-proxy will route the traffic to a random node. This makes health checking close to useless as you won't have consistent data for a node. readinessProbes/livenessProbes are the way to go for pod health.
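
For reference, that setting lives on the Service; in later Kubernetes versions the equivalent knob is the spec.externalTrafficPolicy field (a sketch; the service name, selector, and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: admin-proxy
spec:
  type: NodePort
  externalTrafficPolicy: Cluster  # the "Global" behaviour: any node may proxy to any pod
  # externalTrafficPolicy: Local  # only nodes with a ready local pod pass the LB health check
  selector:
    name: admin-proxy
  ports:
    - port: 80
      targetPort: 80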

magnushiie commented Oct 24, 2017

I've been bitten by this issue more than any other issue regarding k8s and GKE. I think it happens due to an incorrect abstraction level. Looking at this from 30,000 ft, it seems wrong that the ingress controller looks through services into pods to see their probe configuration. What if there are different pods with different probe configurations, or even non-pod backends? A service-level health check should be defined at the service level. Ingress-level configuration would work too but would be suboptimal (services can belong to multiple ingresses and it would be bad to duplicate).

Furthermore, the ingress is actually checking whether the service is reachable via a node, not whether a pod is alive; it might make sense to have totally separate health checks for these. Also, this would be useful service metadata for any service client, not just ingress.

@magnushiie

Relevant issue: kubernetes/kubernetes#28924

bbzg commented Dec 14, 2017

Today, when updating an ingress with a new host/service/deployment, a GCP health check was created for / even though the deployment only has one pod and that pod has /healthz as its HTTP readinessProbe.

It took a LONG while to figure out why...

Bug or feature? =)

@nicksardo nicksardo removed their assignment Jan 11, 2018
@abourget

Adding to the docs that we need to deploy the pods first and then roll out an ingress would help, with a warning. Wasted quite a bit of time on this one.

irvifa commented Feb 28, 2018

Will this be solved in Kubernetes 1.9, especially for GLBC 7?

axot commented Mar 3, 2018

        "containers": [
          {
            "name": "spin-gate",
            "image": "gcr.io/spinnaker-marketplace/gate:0.10.0-20180221133510",
            "ports": [
              {
                "containerPort": 8084,
                "protocol": "TCP"
              }
            ],
            "readinessProbe": {
              "tcpSocket": {
                "port": 8084
              },
              "timeoutSeconds": 1,
              "periodSeconds": 10,
              "successThreshold": 1,
              "failureThreshold": 3
            }
          }
        ],

It seems that when I use tcpSocket in the readinessProbe, GCP will create an HTTP health check for /. If I change tcpSocket to httpGet it works fine. Is this a bug? I'm using 1.9.2-gke.1.
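
For comparison, the httpGet form that does get translated looks roughly like this (a sketch; the /healthz path is a placeholder):

readinessProbe:
  httpGet:
    path: /healthz  # placeholder path
    port: 8084
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3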

irvifa commented Mar 8, 2018

Agree with @axot. Is this expected behaviour or a bug?

@euan-reid

2 cents as a user: I can appreciate the desire not to break things on upgrade, but the ingress not updating the health check is a real nuisance.

Would annotations be a solution to this? We already use them for specifying a static IP for the load balancer (for example), so we could extend that to other resources. For upgrade compatibility we could assign whatever backend/loadbalancer/healthcheck/etc. is currently in use to said annotation. Then, in future, if the annotation isn't set, the ingress knows it has carte blanche to do whatever it wants; but if it is set, it knows the value was human-specified. This would also be future-proof if we decide to allow specifying an existing load balancer on creation, to balance between clusters.
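
To make the idea concrete (a sketch only: the static-IP annotation is the existing one, while the health-check annotation below is purely hypothetical and shown commented out):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: admin-proxy
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "my-static-ip"  # existing annotation
    # Hypothetical, not implemented: a human-specified health check the
    # controller would treat as authoritative and never overwrite.
    # ingress.gcp.example.com/health-check: "my-handmade-health-check"
spec:
  backend:
    serviceName: admin-proxy
    servicePort: 80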

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed enhancement labels Jun 5, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2018
bgetsug commented Sep 3, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2018
@justinsb
Member

I believe the path forward here is to declare the health check explicitly on the BackendConfig CRD: #42 (comment)
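
For anyone landing here later, a rough sketch of what that could look like (field names follow the later cloud.google.com BackendConfig API; the resource and service names are placeholders):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: admin-proxy-backendconfig
spec:
  healthCheck:
    type: HTTP
    requestPath: /healthz
    port: 80
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: admin-proxy
  annotations:
    # attaches the BackendConfig above to this service's backend
    cloud.google.com/backend-config: '{"default": "admin-proxy-backendconfig"}'
spec:
  type: NodePort
  selector:
    name: admin-proxy
  ports:
    - port: 80
      targetPort: 80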

@rramkumar1
Contributor

Agreed that BackendConfig is the way to go here.

I am going to close this bug given that we are most likely not going to support updates to readiness probes. We will happily accept community contributions for #42 :)

/close

@k8s-ci-robot
Contributor

@rramkumar1: Closing this issue.

In response to this:

Agreed that BackendConfig is the way to go here.

I am going to close this bug given that we are most likely not going to support updates to readiness probes. We will happily accept community contributions for #42 :)

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
