GCE Ingress creates a Network Endpoint Group with 0 configured #832

Closed
giladsh1 opened this issue Aug 21, 2019 · 5 comments · Fixed by #917
@giladsh1

GCE Ingress creates a network endpoint group with 0 endpoints configured when the Service's targetPort is set to the port name instead of the actual port number.
This config does not work: it essentially creates a useless network endpoint group that never recognises the pods.

apiVersion: v1
kind: Service
metadata:
  name: mgmt
  namespace: riscale-test
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    beta.cloud.google.com/backend-config: '{"ports": {"8080":"mgmt-service-backend"}}'
spec:
  selector:
    app: mgmt
  type: NodePort
  ports:
    - port: 8080
      targetPort: mgmt-port
      protocol: TCP

However, when the targetPort is changed to 8080, the network endpoint group recognises the running pods.
This was a tough bug to catch :-(
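
For anyone hitting the same thing, a minimal sketch of the workaround, i.e. the same Service as above with the named targetPort replaced by the numeric container port (everything else unchanged):

apiVersion: v1
kind: Service
metadata:
  name: mgmt
  namespace: riscale-test
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    beta.cloud.google.com/backend-config: '{"ports": {"8080":"mgmt-service-backend"}}'
spec:
  selector:
    app: mgmt
  type: NodePort
  ports:
    - port: 8080
      targetPort: 8080   # numeric instead of "mgmt-port" -- with this the NEG picks up the pods
      protocol: TCP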

@rramkumar1
Contributor

@giladsh1 It would be helpful if you could post the specification for your Deployment / Pods.

/assign @freehan

@giladsh1
Author

@rramkumar1 Adding the deployment config as requested:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mgmt
  namespace: riscale-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mgmt
  template:
    metadata:
      labels:
        app: mgmt
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cloud.google.com/gke-nodepool
                    operator: In
                    values:
                      - default-pool
      restartPolicy: Always
      containers:
        - name: mgmt
          image: eu.gcr.io/riscale/mgmt
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 50m
              memory: 10Mi
            limits:
              cpu: 150m
              memory: 50Mi
          ports:
            - name: mgmt-port
              containerPort: 8080
          readinessProbe:
            httpGet:
              path: /m/health
              port: mgmt-port
            initialDelaySeconds: 2
            periodSeconds: 15
            successThreshold: 2
            failureThreshold: 4
          livenessProbe:
            httpGet:
              path: /m/health
              port: mgmt-port
            periodSeconds: 15
            failureThreshold: 4
          envFrom:
            - configMapRef:
                name: common-config
            - configMapRef:
                name: service-discovery
            - configMapRef:
                name: postgres-config
            - configMapRef:
                name: mongo-config
            - secretRef:
                name: rethinkdb-secrets
            - secretRef:
                name: postgres-secrets
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /var/secrets/google/mgmt-kms-encrypt.json
          volumeMounts:
            - name: google-cloud-key
              mountPath: /var/secrets/google
              readOnly: true
      volumes:
        - name: google-cloud-key
          secret:
            secretName: mgmt-kms-encrypt-secret
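
For reference, the two pieces the NEG controller has to connect here are the Service's named targetPort and the container port definition above. A trimmed sketch of the relevant lines from the two manifests:

# From the Service
ports:
  - port: 8080
    targetPort: mgmt-port   # port name, expected to resolve against the container ports
# From the Deployment's container
ports:
  - name: mgmt-port
    containerPort: 8080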

@axot

axot commented Sep 3, 2019

We are suffering from exactly the same issue.

@axot

axot commented Sep 3, 2019

After changing targetPort to the port number, the ingress works now, but we found that pods in one zone never become ready. We deployed our pods across zones a, b, and c; in our case the pod in zone b had the issue. READINESS GATES shows 0/1, the endpoint never gets added to the service, and the deployment rolling update never completes.

$ kgp -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP             NODE                                                  NOMINATED NODE   READINESS GATES
haproxy-56f8cbc54f-96vqh   1/1     Running   0          4h36m   10.123.131.3   gke-done-production-proxy-1-haproxy-0-4ea1ef33-r08b   <none>           1/1
haproxy-56f8cbc54f-vt2c8   1/1     Running   0          52m     10.123.129.6   gke-done-production-proxy-1-haproxy-0-ec94ed76-hpkn   <none>           1/1
haproxy-67d4497588-8rtdr   1/1     Running   0          12m     10.123.140.3   gke-done-production-proxy-1-haproxy-0-ec94ed76-q2zw   <none>           1/1
haproxy-67d4497588-tg9zt   1/1     Running   0          12m     10.123.141.2   gke-done-production-proxy-1-haproxy-0-97c8c492-949s   <none>           0/1

UPDATE:
We recreated all resources in that namespace, and the issue went away.
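
For context, the 0/1 under READINESS GATES should correspond to the NEG readiness gate condition on the pod, which on the stuck pod presumably never flips to True. A rough sketch of what the pod status would look like in that state (the condition type cloud.google.com/load-balancer-neg-ready is the GKE NEG readiness gate as we understand it; the exact values here are illustrative, not copied from our cluster):

status:
  conditions:
    - type: cloud.google.com/load-balancer-neg-ready   # NEG readiness gate (assumed condition type)
      status: "False"                                   # never becomes True, so READINESS GATES stays 0/1
    - type: ContainersReady
      status: "True"                                    # containers themselves are ready (READY 1/1)
    - type: Ready
      status: "False"                                   # pod not Ready while the readiness gate is unsatisfied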

@freehan
Contributor

freehan commented Sep 20, 2019

Okay, I think I uncovered the problem. Will add a fix and an e2e test for this.
