Failing Kubernetes Tests in the CI #412

Closed
aktech opened this issue Jul 26, 2021 · 3 comments · Fixed by #413

Comments

@aktech
Contributor

aktech commented Jul 26, 2021

What happened:
The Kubernetes tests are failing in CI (GitHub Actections):
https://github.com/dask/dask-gateway/runs/3161766008

What you expected to happen:
The Kubernetes tests pass.

I think the first step is to add more logging to show why it is failing. Some of the GitHub Actions recipes suggested by @consideRatio might be useful: #408 (comment)
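A minimal sketch of what such extra logging could look like, assuming the tests deploy into the `default` namespace of the CI cluster: a step that runs only after an earlier step has failed and dumps pod state and events. The step name and the choice of commands are illustrative and are not taken from the dask-gateway workflow.

```yaml
      # Hypothetical diagnostic step, not part of the existing workflow.
      # Runs only when a previous step in the job has failed.
      - name: Dump Kubernetes diagnostics on failure
        if: failure()
        run: |
          # Overview of every pod and its status (Pending, ImagePullBackOff, ...)
          kubectl get pods --all-namespaces -o wide
          # Per-pod details, including the Events table at the bottom
          kubectl describe pods
          # Cluster events in chronological order
          kubectl get events --sort-by=.metadata.creationTimestamp
```

The `kubectl describe pods` output is usually the most useful part, since the per-pod events tend to show scheduling and image pull problems directly.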

Current logs (from the CI):

Run kill 3210
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + trap exit TERM INT
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 /usr/bin/entry: line 6: can't create /proc/sys/net/ipv4/ip_forward: Read-only file system
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + echo 1
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + true
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + cat /proc/sys/net/ipv4/ip_forward
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + '[' 1 '!=' 1 ]
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + iptables -t nat -I PREROUTING '!' -s 10.43.200.195/32 -p TCP --dport 30200 -j DNAT --to 10.43.200.195:30200
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + iptables -t nat -I POSTROUTING -d 10.43.200.195/32 -p TCP -j MASQUERADE
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + '[' '!' -e /pause ]
svclb-traefik-test-dask-gateway-q47nw lb-port-30200 + mkfifo /pause
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + trap exit TERM INT
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 /usr/bin/entry: line 6: can't create /proc/sys/net/ipv4/ip_forward: Read-only file system
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + echo 1
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + true
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + cat /proc/sys/net/ipv4/ip_forward
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + '[' 1 '!=' 1 ]
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + iptables -t nat -I PREROUTING '!' -s 10.43.200.195/32 -p TCP --dport 30200 -j DNAT --to 10.43.200.195:30200
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + iptables -t nat -I POSTROUTING -d 10.43.200.195/32 -p TCP -j MASQUERADE
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + '[' '!' -e /pause ]
svclb-traefik-test-dask-gateway-xx5xd lb-port-30200 + mkfifo /pause
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="Configuration loaded from flags."
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="Traefik version 2.1.3 built on 2020-01-21T17:30:29Z"
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/v2.0/contributing/data-collection/\n"
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="Starting provider *crd.Provider {\"labelSelector\":\"gateway.dask.org/instance=test-dask-gateway\",\"throttleDuration\":2000000000}"
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="label selector is: \"gateway.dask.org/instance=test-dask-gateway\"" providerName=kubernetescrd
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="Creating in-cluster Provider client" providerName=kubernetescrd
traefik-test-dask-gateway-68796c7d9c-2dlb4 traefik time="2021-07-26T13:11:04Z" level=info msg="Starting provider *traefik.Provider {}"
@consideRatio
Collaborator

consideRatio commented Jul 26, 2021

To debug a failure more freely, provided it occurs in a PR run where no sensitive secrets can be exposed, I suggest using this GitHub Action to open an SSH session so you can debug it live: https://github.com/yuvipanda/jupyterhub-ssh/blob/a61098162014f5a8c8ad3759273345838873619c/.github/workflows/test-chart.yaml#L154-L160

      # WARNING: Only allow this for pull_request runs that doesn't contain
      #          sensitive information.
      #
      # action reference: https://github.com/mxschmitt/action-tmate@v3
      - name: To enter a SSH debugging session, read these logs
        if: failure() && github.event_name == 'pull_request' && matrix.debuggable == 'debuggable'
        uses: mxschmitt/action-tmate@v3

I see that the current workflow isn't triggered on pull_request events, but that can be enabled. This is how I trigger tests for another Helm chart: https://github.com/yuvipanda/jupyterhub-ssh/blob/a61098162014f5a8c8ad3759273345838873619c/.github/workflows/test-chart.yaml#L6-L24
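As a rough sketch of the two pieces the tmate step relies on, assuming a workflow with a single job: the `on:` block has to include `pull_request`, and the job matrix has to define a `debuggable` key, because the step's `if:` condition checks both. The job name, runner, and matrix entry below are assumptions for illustration and are not copied from either repository.

```yaml
# Hypothetical workflow skeleton; job and matrix names are illustrative.
on:
  pull_request:
  push:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        include:
          # The tmate step only runs for matrix entries carrying this value.
          - debuggable: debuggable
    steps:
      - uses: actions/checkout@v2
      # ... cluster setup and test steps would go here ...
      - name: To enter a SSH debugging session, read these logs
        if: failure() && github.event_name == 'pull_request' && matrix.debuggable == 'debuggable'
        uses: mxschmitt/action-tmate@v3
```

With `github.event_name == 'pull_request'` in the condition, the SSH session is never opened for `push` runs, which is what keeps it away from runs that may have access to secrets.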

@aktech
Contributor Author

aktech commented Jul 27, 2021

Thanks @consideRatio

Just debugged this. The issue is with the Docker image: the pods are pulling an image that doesn't exist in k3d (maybe it wasn't imported properly):

Events:
  Type     Reason     Age                    From                              Message
  ----     ------     ----                   ----                              -------
  Normal   Scheduled  <unknown>                                                Successfully assigned default/controller-test-dask-gateway-6db654747-65h2m to k3d-k3s-default-agent-0
  Normal   Pulling    5m14s (x4 over 6m49s)  kubelet, k3d-k3s-default-agent-0  Pulling image "daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc"
  Warning  Failed     5m14s (x4 over 6m49s)  kubelet, k3d-k3s-default-agent-0  Failed to pull image "daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc": failed to resolve reference "docker.io/daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc": docker.io/daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc: not found
  Warning  Failed     5m14s (x4 over 6m49s)  kubelet, k3d-k3s-default-agent-0  Error: ErrImagePull
  Warning  Failed     5m3s (x6 over 6m48s)   kubelet, k3d-k3s-default-agent-0  Error: ImagePullBackOff
  Normal   BackOff    107s (x20 over 6m48s)  kubelet, k3d-k3s-default-agent-0  Back-off pulling image "daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc"
Events:
  Type     Reason     Age                     From                              Message
  ----     ------     ----                    ----                              -------
  Normal   Scheduled  <unknown>                                                 Successfully assigned default/api-test-dask-gateway-85fdb7f5bf-62p8g to k3d-k3s-default-agent-0
  Normal   Pulling    7m23s (x4 over 8m47s)   kubelet, k3d-k3s-default-agent-0  Pulling image "daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc"
  Warning  Failed     7m23s (x4 over 8m47s)   kubelet, k3d-k3s-default-agent-0  Failed to pull image "daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc": failed to resolve reference "docker.io/daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc": docker.io/daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc: not found
  Warning  Failed     7m23s (x4 over 8m47s)   kubelet, k3d-k3s-default-agent-0  Error: ErrImagePull
  Warning  Failed     7m9s (x6 over 8m46s)    kubelet, k3d-k3s-default-agent-0  Error: ImagePullBackOff
  Normal   BackOff    3m38s (x21 over 8m46s)  kubelet, k3d-k3s-default-agent-0  Back-off pulling image "daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc"
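The events above show the kubelet falling back to docker.io because the tag is not present on the node. One possible way to address that, sketched under assumptions, is to import the locally built image into the k3d cluster before installing the chart. `IMAGE` is a hypothetical variable, the cluster name `k3s-default` is inferred from the node name `k3d-k3s-default-agent-0`, and `k3d image import` assumes k3d v3 or later. This is not necessarily the fix that was eventually merged.

```yaml
      # Hypothetical step; IMAGE must match the tag referenced by the chart.
      - name: Import the locally built image into k3d
        env:
          IMAGE: daskgateway/dask-gateway-server:1ad3c3ad386ed4c7b61d3ff296b2a67b3371ea093d0159fa9179c889dc2d29cc
        run: |
          # Fail early if the image was never built on the runner
          docker image inspect "$IMAGE" > /dev/null
          # Copy the image from the local Docker daemon into the cluster nodes
          k3d image import "$IMAGE" --cluster k3s-default
```

Because the tag is not `latest`, the default `imagePullPolicy` is `IfNotPresent`, so once the image is on the node the kubelet should use it instead of contacting the registry.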

@aktech
Contributor Author

aktech commented Jul 28, 2021

I think I understand the problem now. I'll create a PR soon.
