
Control plane access review requests fail when proxy is unavailable #2407

Closed
klingerf opened this issue Feb 27, 2019 · 2 comments

@klingerf
Member

When installing the control plane from latest master, I see that some components have restarted:

NAME                                  READY     STATUS    RESTARTS   AGE
linkerd-ca-fd5cc5c6f-ftgjz            2/2       Running   1          10m
linkerd-controller-5dd7c644fd-x4h45   4/4       Running   3          10m
linkerd-grafana-84bb646d96-vfx66      2/2       Running   0          10m
linkerd-prometheus-57c67549f5-qvfp6   2/2       Running   0          10m
linkerd-web-5d4445f5dd-rqz9w          2/2       Running   0          10m

Looking at the logs for a container that restarted I see this printed immediately prior to exiting:

time="2019-02-27T20:36:42Z" level=fatal msg="Failed to initialize K8s API: Post https://10.96.0.1:443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: dial tcp 10.96.0.1:443: connect: connection refused"

This appears to be a result of the RBAC checks that were added in #2349. Rather than exiting on failure, we need to retry those requests until they succeed, to account for the fact that the proxy won't be immediately available on container startup.
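The retry behavior described above could be sketched as a small backoff loop around the access-review request. This is a hypothetical helper for illustration, not the actual Linkerd implementation; the function and variable names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff calls fn until it succeeds or maxAttempts is
// exhausted, sleeping between attempts. (Hypothetical helper sketching
// the retry idea; not Linkerd's actual startup code.)
func retryWithBackoff(maxAttempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	attempts := 0
	// Simulate the SelfSubjectAccessReview check failing with
	// "connection refused" until the proxy sidecar is ready.
	err := retryWithBackoff(5, 10*time.Millisecond, func() error {
		attempts++
		if attempts < 3 {
			return errors.New("connect: connection refused")
		}
		return nil
	})
	fmt.Println(err, attempts)
}
```

With this shape, a transiently unreachable proxy delays startup instead of crashing the container.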

@siggy siggy self-assigned this Feb 27, 2019
@siggy siggy added the priority/P0 Release Blocker label Feb 27, 2019
@olix0r
Member

olix0r commented Feb 27, 2019

As @siggy and I discussed, one potential quick-and-dirty (maybe temporary) solution to this is to simply add 443 to the set of outbound skip ports. We don't get any real value from forwarding the TLS stream through the proxy, and it introduces a set of annoying operational issues that are difficult to fully eliminate (until kubernetes offers better ordering primitives).
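The workaround described above would amount to configuring the proxy-init container to exempt port 443 from outbound iptables redirection, so control-plane traffic to the Kubernetes API bypasses `linkerd-proxy`. A hedged sketch, assuming the `--outbound-ports-to-ignore` flag and manifest layout shown here (flag names and image may differ from the actual change):

```yaml
# Illustrative pod spec fragment, not the actual Linkerd manifest:
# skip outbound port 443 so control-plane -> Kubernetes API traffic
# is not routed through linkerd-proxy.
initContainers:
- name: linkerd-init
  image: gcr.io/linkerd-io/proxy-init
  args:
  - --incoming-proxy-port
  - "4143"
  - --outgoing-proxy-port
  - "4140"
  - --outbound-ports-to-ignore
  - "443"
```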

siggy added a commit that referenced this issue Feb 27, 2019
#2349 introduced a `SelfSubjectAccessReview` check at
startup, to determine whether each control-plane component should
establish Kubernetes watches cluster-wide or namespace-wide. If this
check occurs before the linkerd-proxy sidecar is ready, it fails, and
the control-plane component restarts.

This change configures each control-plane pod to skip outbound port 443
when injecting the proxy, allowing the control-plane to connect to
Kubernetes regardless of the `linkerd-proxy` state.

A longer-term fix should involve a more robust control-plane startup,
that is resilient to failed Kubernetes API requests. An even longer-term
fix could involve injecting `linkerd-proxy` as a Kubernetes "sidecar"
container, when that becomes available.

Workaround for #2407

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
siggy added a commit that referenced this issue Feb 27, 2019
@klingerf
Member Author

This was fixed by #2411.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021