
Control plane access review requests fail when proxy is unavailable #2407

Closed
klingerf opened this issue Feb 27, 2019 · 2 comments

@klingerf
Member

When installing the control plane from latest master, I see that some components have restarted:

NAME                                  READY     STATUS    RESTARTS   AGE
linkerd-ca-fd5cc5c6f-ftgjz            2/2       Running   1          10m
linkerd-controller-5dd7c644fd-x4h45   4/4       Running   3          10m
linkerd-grafana-84bb646d96-vfx66      2/2       Running   0          10m
linkerd-prometheus-57c67549f5-qvfp6   2/2       Running   0          10m
linkerd-web-5d4445f5dd-rqz9w          2/2       Running   0          10m

Looking at the logs for a container that restarted I see this printed immediately prior to exiting:

time="2019-02-27T20:36:42Z" level=fatal msg="Failed to initialize K8s API: Post https://10.96.0.1:443/apis/authorization.k8s.io/v1/selfsubjectaccessreviews: dial tcp 10.96.0.1:443: connect: connection refused"

This appears to be a result of the RBAC checks that were added in #2349. Rather than exiting on failure, we need to retry those requests until they succeed, to account for the fact that the proxy won't be immediately available on container startup.
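The retry behavior described above could be sketched as a small backoff loop around the access-review request. This is a hypothetical helper for illustration, not the actual Linkerd implementation; the function and variable names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff calls fn until it succeeds or maxAttempts is
// exhausted, sleeping between attempts. (Hypothetical helper sketching
// the retry idea; not Linkerd's actual startup code.)
func retryWithBackoff(maxAttempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	attempts := 0
	// Simulate the SelfSubjectAccessReview check failing with
	// "connection refused" until the proxy sidecar is ready.
	err := retryWithBackoff(5, 10*time.Millisecond, func() error {
		attempts++
		if attempts < 3 {
			return errors.New("connect: connection refused")
		}
		return nil
	})
	fmt.Println(err, attempts)
}
```

With this shape, a transiently unreachable proxy delays startup instead of crashing the container.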

@siggy siggy self-assigned this Feb 27, 2019
@siggy siggy added the priority/P0 Release Blocker label Feb 27, 2019
@olix0r
Member

olix0r commented Feb 27, 2019

As @siggy and I discussed, one potential quick-and-dirty (maybe temporary) solution to this is to simply add 443 to the set of outbound skip ports. We don't get any real value from forwarding the TLS stream through the proxy, and it introduces a set of annoying operational issues that are difficult to fully eliminate (until kubernetes offers better ordering primitives).
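The workaround described above would amount to configuring the proxy-init container to exempt port 443 from outbound iptables redirection, so control-plane traffic to the Kubernetes API bypasses `linkerd-proxy`. A hedged sketch, assuming the `--outbound-ports-to-ignore` flag and manifest layout shown here (flag names and image may differ from the actual change):

```yaml
# Illustrative pod spec fragment, not the actual Linkerd manifest:
# skip outbound port 443 so control-plane -> Kubernetes API traffic
# is not routed through linkerd-proxy.
initContainers:
- name: linkerd-init
  image: gcr.io/linkerd-io/proxy-init
  args:
  - --incoming-proxy-port
  - "4143"
  - --outgoing-proxy-port
  - "4140"
  - --outbound-ports-to-ignore
  - "443"
```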

siggy added a commit that referenced this issue Feb 27, 2019
#2349 introduced a `SelfSubjectAccessReview` check at
startup, to determine whether each control-plane component should
establish Kubernetes watches cluster-wide or namespace-wide. If this
check occurs before the linkerd-proxy sidecar is ready, it fails, and
the control-plane component restarts.

This change configures each control-plane pod to skip outbound port 443
when injecting the proxy, allowing the control-plane to connect to
Kubernetes regardless of the `linkerd-proxy` state.

A longer-term fix should involve a more robust control-plane startup,
that is resilient to failed Kubernetes API requests. An even longer-term
fix could involve injecting `linkerd-proxy` as a Kubernetes "sidecar"
container, when that becomes available.

Workaround for #2407

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
siggy added a commit that referenced this issue Feb 27, 2019
@klingerf
Member Author

This was fixed by #2411.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2021