
GCE health check does not pick up changes to pod readinessProbe #39

Closed
bowei opened this issue Oct 11, 2017 · 29 comments
Labels
backend/gce kind/feature Categorizes issue or PR as related to a new feature.

Comments

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 4:22

Importing this issue from kubernetes/kubernetes#43773 as requested.

Kubernetes version (use kubectl version): 1.4.9

Environment:

  • Cloud provider or hardware configuration: GKE
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened:

I created an ingress with type GCE

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: admin-proxy
  labels:
    name: admin-proxy
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
    - secretName: ingress-admin-proxy
  backend:
    serviceName: admin-proxy
    servicePort: 80

This set up the health check on the Google Cloud backend endpoint to call "/", as documented.

Unfortunately my service doesn't return 200 on "/", so I added a readinessProbe to the pod as suggested by the documentation.
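
For reference, the kind of readiness probe the docs suggest looks roughly like this (a minimal sketch; the image, port, and /healthz path are placeholders, not the actual values from this deployment):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: admin-proxy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: admin-proxy
    spec:
      containers:
        - name: admin-proxy
          image: example/admin-proxy:latest  # placeholder image
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /healthz  # the path I want the GCE health check to use
              port: 80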

What you expected to happen:

I expected the health check to be automatically updated.

Instead I had to delete the ingress and re-create it in order for the health-check to update.

How to reproduce it (as minimally and precisely as possible):

  1. Create a deployment with no readiness probe.
  2. Create an ingress pointing to the pods created by that deployment.
  3. Add a readiness probe to the deployment.

Anything else we need to know:

Copied from original issue: kubernetes/ingress-nginx#582

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 4:22

Comment by @nicksardo:

There's actually a TODO for deciding what to do in this case (https://github.com/kubernetes/ingress/blob/6ed6475e87588494dd29ba5d33bac167ae8acfdd/controllers/gce/healthchecks/healthchecks.go#L71). The controller doesn't want to overwrite handmade changes. We could probably update the settings if they have the default value.

bowei commented Oct 11, 2017

From @porridge on April 11, 2017 7:45

But updating from the default value would lead to confusing behavior where only the first change is propagated, and it's not even possible to go back to the default by changing the ingress, right?

I think it's reasonable to assume that if the ingress controller knows how to manage a setting, it will overwrite manual changes.

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 16:23

I agree that it's reasonable and would be an easy solution. However, I worry about unsuspecting users who upgrade their master and find their load balancer throwing 502s. Would creating an event on health check update be sufficient? Thoughts from other users?

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 16:31

Is there any way that the ingress controller can track whether the value was set automatically?

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 16:56

The controller does not have persistent storage. One option would be to convert the health check settings to JSON and store the state in the HealthCheck's optional description field for tracking diffs, but this seems very ugly.

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 16:56

cc @ibawt

bowei commented Oct 11, 2017

From @porridge on April 11, 2017 16:57

What do you mean by "creating an event on health check update"?

bowei commented Oct 11, 2017

From @nicksardo on April 11, 2017 17:10

If the controller sees a health check with settings different from the default/readinessProbe, it could update the check and then create a K8s event for each parent ingress. If a readiness probe exists, the event could say, "Health check for port 3XXXX updated to readinessProbe settings". If no probe exists, "Health check for port 3XXXX updated to default settings (see ingress docs for defining settings via readinessProbe)".
There's already an event being created at HC creation, but it doesn't seem very useful to me: https://github.com/kubernetes/ingress/blob/987540f8f61f86d07314cd776fe32e150724b48c/controllers/gce/controller/utils.go#L603

bowei commented Oct 11, 2017

From @ConradIrwin on April 11, 2017 17:21

Just my 2c: As a Kubernetes user, I have no idea how to use events; they seem too transient to be useful (i.e. it's impossible to see when they actually happened, and I've never seen an event that's helped me to run my cluster better).

It might just be a UI problem: if there were a way of just tailing the event logs, then I might comprehend them better, so maybe it's OK to add one and assume that they'll be fixed upstream.


bowei commented Oct 11, 2017

From @kilianc on May 14, 2017 21:36

Is there any workaround to make it pick up the changes, even manually?

bowei commented Oct 11, 2017

From @kilianc on May 14, 2017 21:50

To answer my own question, and for everybody landing here with the same problem: rotate the name of the service in your rule or backend, and the controller will pick up the path correctly. I assume this could be done with zero downtime by creating a new rule and deleting the old one.
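
For example (a minimal sketch; admin-proxy and admin-proxy-v2 are placeholder names), duplicate the Service under a new name, point the ingress backend at it, and delete the old Service once traffic is healthy:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: admin-proxy
spec:
  backend:
    # was: serviceName: admin-proxy
    # renaming the backend service forces the controller to build a new
    # backend and health check from the current readinessProbe
    serviceName: admin-proxy-v2
    servicePort: 80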

bowei commented Oct 11, 2017

From @bviolier on June 8, 2017 7:26

We actually encountered an issue yesterday regarding readinessProbes and health checks on GCE. A while ago we had some difficulties setting up the health check correctly through Kubernetes; given that it doesn't pick up changes, we decided to manually update the health check on the GCE side, and everything worked.
Yesterday we updated our master to 1.6.4, and suddenly all manually set health-check updates were reset to the defaults, probably because Kubernetes (re)set them again.

But, reading this issue, it should never update health checks. I cannot re-trigger an update now, so it doesn't seem that this functionality has been added. Yet the master upgrade certainly did trigger this; is that expected behavior?

bowei commented Oct 11, 2017

From @bviolier on June 8, 2017 7:34

Another gotcha: when using the example provided (https://github.com/kubernetes/ingress/blob/6ed6475e87588494dd29ba5d33bac167ae8acfdd/controllers/gce/examples/health_checks/health_check_app.yaml), if you remove ports.containerPort, the GCE ingress health check falls back to the defaults and can't process the readinessProbe. This might be expected behavior, but at least for us it wasn't.

All our reset health checks didn't have a ports.containerPort set.
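
In other words (a sketch only; the container name, image, and port are illustrative), the declared containerPort has to be present and match the probe's port for the controller to translate the probe into health-check settings:

containers:
  - name: web
    image: example/web:latest  # illustrative image
    ports:
      - containerPort: 8080    # must be declared...
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080             # ...and match the probe port, otherwise the
                               # GCE ingress falls back to the default "/" check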

bowei commented Oct 11, 2017

From @nicksardo on June 8, 2017 15:34

@bviolier

Regarding your last comment, containerPort is a required attribute for defining a port of a container. How would you expect to serve network traffic without defining it?

Regarding your loss of manual health check settings, the latest version of the controller creates a newer type of health check (compute.HealthCheck instead of the legacy compute.HTTPHealthCheck). This change was made to make HTTP and HTTPS health checks more manageable by the controller and to make future features of the load balancer easier to consume.

What settings are you trying to override with the readiness probe? Just the path?

bowei commented Oct 11, 2017

From @bviolier on June 8, 2017 15:52

@nicksardo Thanks for the information!

I was not aware that containerPort was a required attribute. But the pods/services also worked fine without it. The health check was also working (although only after manually editing the path), so traffic could get into the pods through the port specified in the readinessProbe, even without the containerPort.

Makes sense that it created a new type of health check, thanks for the explanation. I do think more people who manually edited their health checks will run into this issue, but manually editing health checks is also not the best thing to do ;-)

The most important change we make with the readinessProbe is indeed the path, but occasionally we also change the timeout, interval, and failureThreshold.

bowei commented Oct 11, 2017

From @nicksardo on June 9, 2017 20:56

@bviolier
In case you aren't aware, if your service is using externalTraffic=Global (default), then the other settings aren't very useful. Traffic from the load balancer hits any of the nodes and kube-proxy will route the traffic to a random node. This makes health checking close to useless as you won't have consistent data for a node. readinessProbes/livenessProbes are the way to go for pod health.
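
For reference, that setting lives on the Service; in later Kubernetes versions the equivalent knob is the spec.externalTrafficPolicy field (a sketch; the service name, selector, and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: admin-proxy
spec:
  type: NodePort
  externalTrafficPolicy: Cluster  # the "Global" behaviour: any node may proxy to any pod
  # externalTrafficPolicy: Local  # only nodes with a ready local pod pass the LB health check
  selector:
    name: admin-proxy
  ports:
    - port: 80
      targetPort: 80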

magnushiie commented Oct 24, 2017

I've been bitten by this issue more than any other issue regarding k8s and GKE. I think it happens due to an incorrect abstraction level. Looking at this from 30,000 ft, it seems wrong that the ingress controller looks through services into pods to see their probe configuration. What if there are different pods with different probe configurations, or even non-pod backends? A service-level health check should be defined at the service level. Ingress-level configuration would work too but would be suboptimal (services can belong to multiple ingresses and it would be bad to duplicate).

Furthermore, the ingress is actually checking whether the service is reachable via a node, not whether a pod is alive; it might make sense to have totally separate health checks for these. Also, this would be useful service metadata for any service client, not just ingress.

@magnushiie

Relevant issue: kubernetes/kubernetes#28924

bbzg commented Dec 14, 2017

Today, when updating an ingress with a new host/service/deployment, a GCP health check was created for / even though the deployment only has one pod and that pod has /healthz as its HTTP readinessProbe.

It took a LONG while to figure out why...

Bug or feature? =)

@nicksardo nicksardo removed their assignment Jan 11, 2018
@abourget

Adding to the docs that we need to deploy the pods first and then roll out an ingress would help, with a warning. Wasted quite a bit of time on this one.

irvifa commented Feb 28, 2018

Will this be solved in Kubernetes 1.9, especially for GLBC 7?

axot commented Mar 3, 2018

        "containers": [
          {
            "name": "spin-gate",
            "image": "gcr.io/spinnaker-marketplace/gate:0.10.0-20180221133510",
            "ports": [
              {
                "containerPort": 8084,
                "protocol": "TCP"
              }
            ],
            "readinessProbe": {
              "tcpSocket": {
                "port": 8084
              },
              "timeoutSeconds": 1,
              "periodSeconds": 10,
              "successThreshold": 1,
              "failureThreshold": 3
            }
          }
        ],

It seems that when I use tcpSocket in the readinessProbe, GCP will create an HTTP health check for /. If I change tcpSocket to httpGet it works fine. Is this a bug? I'm using 1.9.2-gke.1.
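
For comparison, the httpGet form that does get translated looks roughly like this (a sketch; the /healthz path is a placeholder):

readinessProbe:
  httpGet:
    path: /healthz  # placeholder path
    port: 8084
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3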

irvifa commented Mar 8, 2018

Agree with @axot. Is this expected behaviour or a bug?

@euan-reid

2 cents as a user: I can appreciate the desire not to break things on upgrade, but the ingress not updating the health check is a real nuisance.

Would annotations be a solution to this? We already use them for specifying a static IP for the load balancer (for example), so we could extend that to other resources. For upgrade compatibility we could assign whatever backend/loadbalancer/healthcheck/etc. is currently in use to said annotation. Then, in future, if the annotation isn't set, the ingress knows it has carte blanche to do whatever it wants; but if it is set, it knows the value was human-specified. This would also be future-proof if we decide to allow specifying an existing load balancer on creation, to balance between clusters.
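
To make the idea concrete (a sketch only: the static-IP annotation is the existing one, while the health-check annotation below is purely hypothetical and shown commented out):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: admin-proxy
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "my-static-ip"  # existing annotation
    # Hypothetical, not implemented: a human-specified health check the
    # controller would treat as authoritative and never overwrite.
    # ingress.gcp.example.com/health-check: "my-handmade-health-check"
spec:
  backend:
    serviceName: admin-proxy
    servicePort: 80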

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed enhancement labels Jun 5, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2018
bgetsug commented Sep 3, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2018
@justinsb
Member

I believe the path forward here is to declare the health check explicitly on the BackendConfig CRD: #42 (comment)
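
For anyone landing here later, a rough sketch of what that could look like (field names follow the later cloud.google.com BackendConfig API; the resource and service names are placeholders):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: admin-proxy-backendconfig
spec:
  healthCheck:
    type: HTTP
    requestPath: /healthz
    port: 80
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: admin-proxy
  annotations:
    # attaches the BackendConfig above to this service's backend
    cloud.google.com/backend-config: '{"default": "admin-proxy-backendconfig"}'
spec:
  type: NodePort
  selector:
    name: admin-proxy
  ports:
    - port: 80
      targetPort: 80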

@rramkumar1
Contributor

Agreed that BackendConfig is the way to go here.

I am going to close this bug given that we are most likely not going to support updates to readiness probes. We will happily accept community contributions for #42 :)

/close

@k8s-ci-robot
Contributor

@rramkumar1: Closing this issue.

In response to this:

Agreed that BackendConfig is the way to go here.

I am going to close this bug given that we are most likely not going to support updates to readiness probes. We will happily accept community contributions for #42 :)

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
