Firewall rule not updated properly with NEG if service uses name in targetPort or does not name its port #703

Closed
pdecat opened this issue Mar 26, 2019 · 6 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

pdecat commented Mar 26, 2019

Under some conditions, the k8s-fw-l7--<uid> firewall rule managed by the ingress-gce controller does not include the target pod's port in its list of allowed ports.
When this happens, health checks do not reach the pods and all requests fail with HTTP 502 errors.

For example, with the following service configuration:

apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: default
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
  - nodePort: 30742
    port: 80
    protocol: TCP
    targetPort: http
  selector:
    app: test
  sessionAffinity: None
  type: NodePort

The pods selected by this service have one container with a corresponding port named http and httpGet readiness/liveness probes referencing that port:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: test
  name: test
  namespace: default
spec:
  containers:
  - image: nginx:latest
    name: nginx
    ports:
    - containerPort: 80
      name: http
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /healthz
        port: http
        scheme: HTTP
    readinessProbe:
      httpGet:
        path: /healthz
        port: http
        scheme: HTTP
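
To make the expected behaviour concrete: with NEGs, health checks target the pod's container port directly, so the firewall rule needs to allow whatever number the service's targetPort resolves to. Below is a minimal Python sketch of that name-to-number resolution, using the manifests above as plain dictionaries. The helper `resolve_target_port` is hypothetical and written only for illustration; it is not how ingress-gce implements the lookup.

```python
from typing import Optional, Union


def resolve_target_port(target_port: Union[int, str], pod_spec: dict) -> Optional[int]:
    """Return the numeric container port that a Service targetPort refers to.

    Hypothetical helper for illustration only; not part of ingress-gce.
    """
    # A numeric targetPort already is the port to allow.
    if isinstance(target_port, int):
        return target_port
    # A named targetPort has to be matched against the pod's container port names.
    for container in pod_spec.get("containers", []):
        for port in container.get("ports", []):
            if port.get("name") == target_port:
                return port["containerPort"]
    # No container declares a port with that name.
    return None


# The pod spec from this report, reduced to the fields that matter here.
pod_spec = {
    "containers": [
        {
            "name": "nginx",
            "ports": [{"containerPort": 80, "name": "http", "protocol": "TCP"}],
        }
    ]
}

print(resolve_target_port("http", pod_spec))  # named targetPort, as in the Service above
print(resolve_target_port(80, pod_spec))      # numeric targetPort
```

Either way, the port that should end up in the firewall rule for this service is the container port, not the node port.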

And FWIW, the ingress:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.allow-http: "true"
spec:
  backend:
    serviceName: test
    servicePort: 80

I've identified two workarounds for now:

  1. adding a name to the service port:
@@ -11,7 +11,8 @@ metadata:
     cloud.google.com/neg: '{"ingress": true}'
 spec:
   ports:
-  - nodePort: 30742
+  - name: http
+    nodePort: 30742
     port: 80
     protocol: TCP
     targetPort: http
  2. using the port number in the targetPort instead of the port name:
@@ -14,7 +14,7 @@ spec:
   - nodePort: 30742
     port: 80
     protocol: TCP
-    targetPort: http
+    targetPort: 80
   selector:
     app: test
   sessionAffinity: None

When either of these two changes is applied on its own, the corresponding port is added to the firewall rule almost instantly.

Health checks then reach the pods and all requests return HTTP 200.

Reverting those changes restores the original situation: the port is missing from the firewall rule, health checks fail, and requests return 502 errors.
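
For anyone wanting to confirm the symptom without the screenshots, here is a rough Python sketch that checks whether a port is covered by a firewall rule. It assumes the rule has been exported as JSON (for example with `gcloud compute firewall-rules describe <rule> --format=json`) and loaded into a dict with the usual `allowed` / `IPProtocol` / `ports` fields. The function `port_allowed` and the sample rule contents are made up for illustration and are not taken from the affected cluster.

```python
def port_allowed(rule: dict, port: int, protocol: str = "tcp") -> bool:
    """Return True if the firewall rule allows `port` for `protocol`.

    Hypothetical helper; `rule` is assumed to look like the JSON form of a
    GCE firewall rule, where `ports` entries are strings such as "80" or
    "30000-32767".
    """
    for entry in rule.get("allowed", []):
        if entry.get("IPProtocol") not in (protocol, "all"):
            continue
        ports = entry.get("ports")
        if not ports:
            # No port list means every port of this protocol is allowed.
            return True
        for spec in ports:
            low, _, high = spec.partition("-")
            if int(low) <= port <= int(high or low):
                return True
    return False


# Made-up sample: a rule that only allows the node port range.
sample_rule = {
    "name": "k8s-fw-l7--<uid>",
    "allowed": [{"IPProtocol": "tcp", "ports": ["30000-32767"]}],
}

print(port_allowed(sample_rule, 80))     # container port used by the NEG health check
print(port_allowed(sample_rule, 30742))  # nodePort of the service
```

In the failing state described above, the check for the container port would be expected to fail until one of the two workarounds is applied.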

Tested on GKE master version 1.11.7-gke.12, which should be running ingress-gce v1.4.3 according to https://github.com/kubernetes/ingress-gce/blob/master/README.md#gke-version-mapping.
I have yet to check whether the issue still occurs with ingress-gce v1.5.0 on GKE 1.12.5-gke.10+.

Having access to the GKE-managed ingress-gce logs would greatly help in troubleshooting this kind of error. I did not hit this issue in our preproduction environment because the same port was already allowed by another service that named its port.

PS: I've learned from reading the GCE ingress controller code that NEGs do not require services to be of type NodePort, but I'm still in the process of converting ingresses to container-native load balancing by adding the cloud.google.com/neg: '{"ingress": true}' annotation. I'll convert those services back to ClusterIP once done.

I believe this issue should be referenced by #583.

@pdecat changed the title from "Firewall rule not updated properly with NEG if service with undefined port name and" to "Firewall rule not updated properly with NEG if service uses name in targetPort or does not name its port" on Mar 27, 2019
@rramkumar1
Contributor

/assign @freehan

@strideynet

It appears this also affects the creation of Endpoints in Endpoint Groups. I found that they were only created when the targetPort was a number rather than a name.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Aug 12, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Sep 11, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

