Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cluster-role for cilium to prevent errors in agent startup #11466

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

foobaar
Copy link
Contributor

@foobaar foobaar commented Aug 20, 2024

ciliumloadbalancerippools permissions exists in the cilium helm chart for version 1.13.0 https://github.com/cilium/cilium/blob/v1.13.0/install/kubernetes/cilium/templates/cilium-agent/clusterrole.yaml#L71

The agent also needs permissions to read/watch secrets for bgp auth secrets when using CiliumBGPPeeringPolicy with a secret.
Without the permissions for secrets, we see the following warns in the logs when cilium starts up on a node:

is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" in the namespace \"kube-system\"" subsys=k8s
time="2024-08-20T14:18:16Z" level=warning msg="github.com/cilium/cilium/pkg/k8s/resource/resource.go:808: failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" in the namespace \"kube-system\"" subsys=klog
time="2024-08-20T14:18:16Z" level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/resource/resource.go:808: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" in the namespace \"kube-system\"" subsys=k8s

What type of PR is this?
/kind bug

What this PR does / why we need it:
Gives the cilium agent required permissions to watch loadbalancerippools and secrets

Does this PR introduce a user-facing change?:
NONE

Release Note:

Fix Cilium agent permission can't read loadbalancerippools and secrets

** Cilium Config

apiVersion: v1
data:
  agent-health-port: "9879"
  auto-direct-node-routes: "False"
  bgp-secrets-namespace: kube-system
  bpf-ct-global-any-max: "262144"
  bpf-ct-global-tcp-max: "524288"
  bpf-lb-acceleration: disabled
  bpf-lb-mode: dsr
  bpf-map-dynamic-size-ratio: "0.0025"
  cgroup-root: /run/cilium/cgroupv2
  clean-cilium-bpf-state: "false"
  clean-cilium-state: "false"
  cluster-name: default
  cluster-pool-ipv4-cidr: 10.Y.0.0/14
  cluster-pool-ipv4-mask-size: "25"
  cni-exclusive: "True"
  cni-log-file: /var/run/cilium/cilium-cni.log
  custom-cni-conf: "false"
  debug: "False"
  disable-cnp-status-updates: "True"
  enable-bgp-control-plane: "true"
  enable-bpf-clock-probe: "True"
  enable-bpf-masquerade: "False"
  enable-host-legacy-routing: "True"
  enable-hubble: "true"
  enable-ip-masq-agent: "False"
  enable-ipv4: "True"
  enable-ipv4-masquerade: "True"
  enable-ipv6: "False"
  enable-ipv6-masquerade: "True"
  enable-l2-announcements: "False"
  enable-metrics: "true"
  enable-remote-node-identity: "True"
  enable-well-known-identities: "False"
  etcd-config: |-
    ---
    endpoints:
      - https://10.X.0.36:2379
      - https://10.X.0.37:2379
      - https://10.X.0.38:2379
      - https://10.X.0.39:2379
      - https://10.X.0.40:2379

    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: "/etc/cilium/certs/ca_cert.crt"

    # In case you want client to server authentication, uncomment the following
    # lines and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    key-file: "/etc/cilium/certs/key.pem"
    cert-file: "/etc/cilium/certs/cert.crt"
  hubble-disable-tls: "false"
  hubble-listen-address: :4244
  hubble-metrics: dns drop tcp flow icmp http
  hubble-metrics-server: :9965
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  identity-allocation-mode: kvstore
  ipam: cluster-pool
  ipv4-native-routing-cidr: 10.0.0.0/8
  kube-proxy-replacement: strict
  kvstore: etcd
  kvstore-opt: '{"etcd.config": "/var/lib/etcd-config/etcd.config"}'
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  operator-api-serve-addr: 127.0.0.1:9234
  operator-prometheus-serve-addr: :9963
  preallocate-bpf-maps: "False"
  prometheus-serve-addr: :9962
  routing-mode: native
  sidecar-istio-proxy-image: cilium/istio_proxy
  write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist

ciliumloadbalancerippools permissions exists in the cilium helm chart for version 1.13.0
https://github.com/cilium/cilium/blob/v1.13.0/install/kubernetes/cilium/templates/cilium-agent/clusterrole.yaml#L71

The agent also needs permissions to read/watch secrets for bgp auth secrets when using CiliumBGPPeeringPolicy with a secret.
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Aug 20, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @foobaar. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tico88612
Copy link
Member

@foobaar Please add release note block.

Fix Cilium agent permission can't read loadbalancerippools and secrets

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 21, 2024
@tico88612
Copy link
Member

Please fix release note block.

Like this:

```release-note
```

@tico88612
Copy link
Member

/retest

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 21, 2024
@foobaar
Copy link
Contributor Author

foobaar commented Aug 21, 2024

Updated release note

@tico88612
Copy link
Member

/retest

Copy link
Member

@tico88612 tico88612 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 22, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: foobaar, tico88612
Once this PR has been reviewed and has the lgtm label, please assign cristicalin for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tico88612
Copy link
Member

/retest-failed

@foobaar
Copy link
Contributor Author

foobaar commented Sep 5, 2024

Hey is there anything additional you want from me on this MR?

@tico88612
Copy link
Member

@yankay This PR is about cilium agent fix. Please take a look, thanks.

@@ -28,6 +28,7 @@ rules:
- pods
- endpoints
- nodes
- secrets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the secrets permission in the helm charts is only get for secrets, should this be as tight here ?

Copy link
Contributor Author

@foobaar foobaar Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the following errors in Cilium when i dont give watch/list

When I only have get permissions for secrets, I get the following Warn:

time="2024-09-05T18:09:04Z" level=warning msg="github.com/cilium/cilium/pkg/k8s/resource/resource.go:808: failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" in the namespace \"kube-system\"" subsys=klog
time="2024-09-05T18:09:04Z" level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/resource/resource.go:808: Failed to watch *v1.Secret: failed to list *v1.Secret: secrets is forbidden: User \"system:serviceaccount:kube-system:cilium\" cannot list resource \"secrets\" in API group \"\" in the namespace \"kube-system\"" subsys=k8s

When I only have get,list permissions for secrets, I get the following Warn:

time="2024-09-05T18:11:59Z" level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/resource/resource.go:808: Failed to watch *v1.Secret: unknown (get secrets)" subsys=k8s

Copy link
Contributor

@cyclinder cyclinder Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @foobaar, Did you see the errors in cilium-agent pod? Look at https://github.com/cilium/cilium/blob/c9723a8df3cfa336da1f8457a864105d8349acfe/install/kubernetes/cilium/templates/cilium-agent/clusterrole.yaml#L66, I don't see that these extra list/watch permissions are needed cilium upstream, maybe we need to investigate further. which cilium version you used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did/do see these errors in the cilium-agent pod.
The log messages I posted are from the cilium agent pod without list/watch permissions.
The version of cilium I am using is the current default for kubespray which is: 1.15.4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove the list/watch secrets from the MR if this is delaying/blocking this MR.
Get secrets and ciliumloadbalancerippools are required by the agent though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to only get secrets and ciliumloadbalancerippools.
These two are missing in cluster role for cilium in kubespray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the list/watch, i will report that to cilium in a separate issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to only get secrets and ciliumloadbalancerippools.
These two are missing in cluster role for cilium in kubespray

Yes, It's better to report that to cilium in a separate issue, we should keep consites with the upstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I will so that. Thanks @cyclinder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the MR

@yankay
Copy link
Member

yankay commented Sep 12, 2024

Hi @cyclinder

would you please help to review it

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 13, 2024
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 13, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 13, 2024
@tico88612
Copy link
Member

/retest-failed

1 similar comment
@tico88612
Copy link
Member

/retest-failed

@foobaar
Copy link
Contributor Author

foobaar commented Sep 18, 2024

Hey! Is there anything else you want from me for this PR?

@tico88612
Copy link
Member

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants