
Implement RuntimeConfig CRI call #8722

Merged

3 commits merged into containerd:main from devel/cgroup-driver-autoconfig, Aug 4, 2023

Conversation

marquiz
Contributor

@marquiz marquiz commented Jun 22, 2023

The RPC reports only one field, i.e. the cgroup driver, to the kubelet.
Containerd determines the effective cgroup driver by looking at all
runtime handlers, starting from the default runtime handler (the rest in
alphabetical order), and returning the cgroup driver setting of the
first runtime handler that supports one. If no runtime handler supports
a cgroup driver setting (i.e. has a config option for it), systemd is
returned as the fallback default.

This patch implements the CRI server side of Kubernetes KEP-4033:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4033-group-driver-detection-over-cri

Refs kubernetes/kubernetes#118770
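The lookup order described above can be sketched in Go. This is an illustrative sketch, not containerd's actual code: `pickCgroupDriver`, the `handlers` map, and the string-typed driver values are all hypothetical stand-ins (the real implementation works over containerd's runtime handler configs and the CRI `CgroupDriver` enum).

```go
package main

import (
	"fmt"
	"sort"
)

// Illustrative stand-ins for the CRI CgroupDriver enum values.
type cgroupDriver string

const (
	driverSystemd  cgroupDriver = "SYSTEMD"
	driverCgroupfs cgroupDriver = "CGROUPFS"
)

// pickCgroupDriver sketches the order described above: consult the default
// runtime handler first, then the remaining handlers in alphabetical order,
// and return the first cgroup driver setting found. A nil map value models a
// handler that has no cgroup driver config option. If no handler supports
// the setting, systemd is returned as the fallback default.
func pickCgroupDriver(defaultHandler string, handlers map[string]*cgroupDriver) cgroupDriver {
	names := make([]string, 0, len(handlers))
	for name := range handlers {
		if name != defaultHandler {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	ordered := append([]string{defaultHandler}, names...)

	for _, name := range ordered {
		if driver, ok := handlers[name]; ok && driver != nil {
			return *driver // first handler that supports a cgroup driver setting
		}
	}
	return driverSystemd // fallback default
}

func main() {
	cgroupfs := driverCgroupfs
	handlers := map[string]*cgroupDriver{
		"kata": nil, // no cgroup driver option
		"runc": &cgroupfs,
	}
	// Default handler "kata" has no setting, so "runc" (next alphabetically) wins.
	fmt.Println(pickCgroupDriver("kata", handlers)) // prints "CGROUPFS"
}
```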

@k8s-ci-robot

Hi @marquiz. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@marquiz
Contributor Author

marquiz commented Jun 22, 2023

Early peek at an implementation
/cc @mikebrow

@marquiz marquiz force-pushed the devel/cgroup-driver-autoconfig branch from 5715945 to 2163191 on June 22, 2023 14:29
@marquiz
Contributor Author

marquiz commented Jun 22, 2023

/cc @haircommander

@k8s-ci-robot

@marquiz: GitHub didn't allow me to request PR reviews from the following users: haircommander.

Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @haircommander


Required to support upcoming Kubernetes (v1.28) features.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
@marquiz marquiz force-pushed the devel/cgroup-driver-autoconfig branch from 2163191 to 276462d on July 24, 2023 11:52
@marquiz marquiz marked this pull request as ready for review July 24, 2023 18:26
@marquiz
Contributor Author

marquiz commented Jul 24, 2023

Ready for review

@mikebrow
Member

/ok-to-test

Member

@mikebrow mikebrow left a comment

Looking good! A couple comments / questions, request for a chat. Cheers

pkg/cri/server/runtime_config.go (Outdated)

func (c *criService) getCgroupDriver(ctx context.Context) runtime.CgroupDriver {
// Go through the runtime handlers in a predictable order, starting from the
// default handler, others sorted in alphabetical order
Member

@mikebrow mikebrow Jul 25, 2023

let's have a conversation about defaulting... see if there isn't a better way than alphabetical, and / or see if there is an issue here:

Noting for the conversation that we probably need to consider adding validation against our config, and talk about other considerations if a runtime is requested that is not the same (different cgroup driver) than as reported via RuntimeConfig.

Contributor Author

> let's have a conversation about defaulting... see if there isn't a better way than alphabetical, and / or see if there is an issue here:

Let's have a chat.

> Noting for the conversation that we probably need to consider adding validation against our config, and talk about other considerations if a runtime is requested that is not the same (different cgroup driver) than as reported via RuntimeConfig.

Sounds pretty much like converging back to the old faithful "global" systemd_cgroup setting, albeit in a very complicated fashion

Member

that might be the correct path until we can report said global and per/runtime ... and the validation would let admins know..

Member

@mikebrow mikebrow left a comment

need to edit the config description...

https://github.com/containerd/containerd/blame/main/docs/cri/config.md#L13-L33

and possibly explain some down in the related runtime option fields for ShimCgroup and SystemdCgroup

log.G(ctx).Debugf("runtime handler %q does not provide cgroup driver information", handler)
}

// If no runtime handlers have a setting, return systemd as the default
Member

Needs to check whether systemd is running

Contributor Author

Thanks @AkihiroSuda, I think this is the right thing to do. And, now that I think of it, maybe we should just drop the SystemdCgroup setting from the runc handler and just rely on auto-detection everywhere. Using cgroupfs when system.slice and folks are around feels like a mis-configuration to me 🤔

Member

When SystemdCgroup is unspecified it can just detect the driver automatically.
Explicit configuration of SystemdCgroup should still be supported for compatibility reasons.

Member

> Using cgroupfs when system.slice and folks are around feels like a mis-configuration to me 🤔

Still legit when the systemd unit is Delegate-d.

Contributor Author

> Still legit when the systemd unit is Delegate-d.

Hmm, that's true, in principle. But afaiu this scenario is not supported by kubelet at least. And, I think it's still a mis-configuration if the unit has Delegate=yes but you have SystemdCgroup=true (or vice versa, if Delegate=off but you'd have SystemdCgroup=false). So, even in this scenario the right path would seem to be auto-detection (another level of auto-detection, i.e. auto-detection on the pod-sandbox unit level). WDYT @AkihiroSuda?

Contributor Author

I implemented systemd autodetection in case no runc plugins are configured (i.e. we cannot get the setting from there).

Regarding whether SystemdCgroup is explicitly set in the configuration, I don't think there's any way to detect that. It's a bool that's false by default so 🤷‍♀️

pkg/cri/instrument/instrumented_service.go
func (c *criService) RuntimeConfig(ctx context.Context, r *runtime.RuntimeConfigRequest) (*runtime.RuntimeConfigResponse, error) {
resp := &runtime.RuntimeConfigResponse{
Linux: &runtime.LinuxRuntimeConfiguration{
CgroupDriver: c.getCgroupDriver(ctx),
Member

Out of topic: The kubelet takes the responsibility to create the pod cgroup and then the CRI fills the container in it. I think that is the reason why it is easy to cause a mismatch. IMO, the end goal should be that the kubelet just lets the CRI maintain the real lifecycle of containers and pods. Currently, it's split-brain.

Contributor Author

I fully agree with this ❤️

pkg/cri/sbserver/runtime_config.go (Outdated)
@marquiz
Contributor Author

marquiz commented Jul 28, 2023

I updated the PR with auto-detection in case the driver cannot be determined from the runtime-handler options. Regarding ordering (in case the default runtime-handler doesn't provide the setting), I left the alphabetical ordering as there are no other/better suggestions. TBH, I think it should not be allowed to have multiple runc-based runtime classes with conflicting cgroup driver settings (different runtime classes having different SystemdCgroup), and this could be rejected in config validation, but I think that's the subject of another PR.

PTAL @AkihiroSuda @mikebrow @fuweid

@marquiz marquiz force-pushed the devel/cgroup-driver-autoconfig branch from 0f21904 to 9302091 on July 28, 2023 10:42
@marquiz
Contributor Author

marquiz commented Jul 28, 2023

I made one more change: made the cgroup driver detection code Linux-only, with other OSes resorting to a stub.

The RPC reports only one field, i.e. the cgroup driver, to the kubelet.
Containerd determines the effective cgroup driver by looking at all
runtime handlers, starting from the default runtime handler (the rest in
alphabetical order), and returning the cgroup driver setting of the
first runtime handler that supports one. If no runtime handler supports
a cgroup driver setting (i.e. has a config option for it), containerd
falls back to auto-detection, returning systemd if systemd is running
and cgroupfs otherwise.

This patch implements the CRI server side of Kubernetes KEP-4033:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4033-group-driver-detection-over-cri

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
>
> When determining the cgroup driver, containerd uses the `SystemdCgroup`
> setting from runc-based runtime classes, starting from the default runtime
> class. If no runc-based runtime classes have been configured containerd
Member

Suggested change:
- > class. If no runc-based runtime classes have been configured containerd
+ > class. Starting with containerd v2.0, if no runc-based runtime classes have been configured containerd

Contributor Author

This whole thing is "starting with containerd v2.0" (plus possibly later with v1.6.x as @mikebrow pointed out below). I could add e.g. a separate paragraph at the end of this note:

The automatic cgroup driver configuration for kubelet feature is supported in containerd v2.0 and later.

WDYT @AkihiroSuda @mikebrow ?

Member

Looks good, but I don't think we will backport this to v1.6 LTS

Contributor Author

Thx @AkihiroSuda for quick response. I now added that separate paragraph at the end of the note. @mikebrow PTAL

Member

> Looks good, but I don't think we will backport this to v1.6 LTS

My reasoning why a backport may be needed before the 1.6 LTS EOL:

The current alpha stage of this k8s feature has a feature gate that is disabled by default, and thus the API is not required to be there for supported container runtimes (1.6, 1.7, and possibly 2.0). The current proposed beta plan of this feature shows the feature gate enabled by default, with the API expected to be enabled by supported container runtimes, thus requiring clouds to take the special action of disabling the feature gate if they don't move up to a containerd that supports it. Then at GA stage the feature gate is expected to be removed, leaving the feature on, and container runtimes that don't have it would cause an error (with that current plan). Plans can always change.

our 1.6 EOL is currently set at: max(February 15, 2025 or next LTS + 6 months) ...

Same LTS issue with many CRI features...

Contributor Author

True, the current plan states that the kubelet fallback behavior (basically kubelet's cgroupDriver setting) would be dropped at GA. However, elsewhere it's stated that kubelet's cgroupDriver setting will only be dropped when support for the feature is adopted by CRI-O and containerd. So if we stick to this plan we wait for containerd LTS (and other non-EOL'd versions) to support this – either backport to v1.6 or then wait for v1.6 EOL. But as @mikebrow stated, the plan can (and probably will) change as we get closer to GA and see how this plays out. E.g. we might still keep the cgroupDriver fallback for some time even in GA...

Anyway, when it comes to the PR now at hand, I think we cannot mention v1.6 LTS here because it hasn't been backported yet. We update the documentation if/when we backport this. Right @mikebrow @AkihiroSuda?

> When determining the cgroup driver, containerd uses the `SystemdCgroup`
> setting from runc-based runtime classes, starting from the default runtime
> class. If no runc-based runtime classes have been configured containerd
> relies on auto-detection based on determining if systemd is running.
Member

Suggested change:
  > relies on auto-detection based on determining if systemd is running.
+ > containerd v1.x always defaults to non-systemd cgroupfs driver.

Contributor Author

Hmm, I'm not sure about this 🧐 What I mean is that this PR only adds new functionality that instructs the kubelet which cgroup driver it should use (for pod sandbox-level cgroups). Containerd v1.x doesn't have this functionality. The PR doesn't change any internal configuration logic or defaults of containerd itself. There is no "containerd-wide" (or even "containerd.cri-wide") cgroup driver setting. The setting only exists in the runc runtime handler plugin options. For other runtime handlers containerd doesn't care (afaiu).

Member

containerd v1.x doesn't have this functionality... "yet". 1.6 is LTS and may need to support the API before EOL.

@marquiz marquiz force-pushed the devel/cgroup-driver-autoconfig branch 2 times, most recently from d60befd to 17ddeef on August 2, 2023 11:40
Member

@mikebrow mikebrow left a comment

LGTM

Member

@AkihiroSuda AkihiroSuda left a comment

LGTM

Backporting to v1.6 is probably fine if it is necessary for Kube && it doesn't break compatibility for existing clusters

@AkihiroSuda AkihiroSuda changed the title cri: implement RuntimeConfig rpc [Breaking change: cgroup driver now defaults to systemd] cri: implement RuntimeConfig rpc [cgroup driver now defaults to systemd] Aug 2, 2023
@marquiz marquiz changed the title cri: implement RuntimeConfig rpc [cgroup driver now defaults to systemd] cri: implement RuntimeConfig rpc Aug 3, 2023
@marquiz
Contributor Author

marquiz commented Aug 3, 2023

Thanks for the reviews @mikebrow @AkihiroSuda and @fuweid.

I think the PR title "postfix" ([cgroup driver now defaults to systemd]) was wrong/misleading so I removed it. This PR does not change the default cgroup driver (a setting which only exists in the runc runtime handler plugin). With the new RuntimeConfig RPC we need to tell the kubelet something, and in case no runc-based runtime classes are configured (from which we could get the cgroup driver setting) we resort to detecting whether systemd is running or not (this doesn't change any containerd behavior, we just report this information to the kubelet).

@mikebrow
Member

mikebrow commented Aug 3, 2023

@marquiz missing dco signature on the docs commit..

Document the user visible changes related to the new RuntimeConfig RPC.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
@marquiz marquiz force-pushed the devel/cgroup-driver-autoconfig branch from 17ddeef to c62abcb on August 4, 2023 06:48
@marquiz
Contributor Author

marquiz commented Aug 4, 2023

@marquiz missing dco signature on the docs commit..

Argh, thanks for pointing this out @mikebrow. Fixed now

Member

@fuweid fuweid left a comment

LGTM

@fuweid fuweid merged commit 2b2195c into containerd:main Aug 4, 2023
43 checks passed
@marquiz marquiz deleted the devel/cgroup-driver-autoconfig branch August 4, 2023 09:14
@dmcgowan dmcgowan added the area/cri Container Runtime Interface (CRI) label Nov 1, 2023
@dmcgowan dmcgowan changed the title cri: implement RuntimeConfig rpc Implement RuntimeConfig CRI call Nov 1, 2023
@dmcgowan
Member

dmcgowan commented Nov 6, 2023

I'm going to remove the breaking label for now since there isn't enough explanation in the title to indicate what is breaking. It just appears as an addition. We can add a note to the 2.0 release notes explaining the cgroup change if there is user action required in some cases.

> When determining the cgroup driver, containerd uses the `SystemdCgroup`
> setting from runc-based runtime classes, starting from the default runtime
> class. If no runc-based runtime classes have been configured containerd
> relies on auto-detection based on determining if systemd is running.
Member

@dmcgowan

I added the impact/breaking label because this seems to break an existing node that is (mis)using a non-systemd cgroup driver on a systemd distro

Member

I think we can add a note about it, it just looked confusing in the change log. It implies that the RuntimeConfig CRI call itself is a breaking change. #9340

We can clean up the language in future betas

Member

Sorry, maybe I have misread the paragraph

#9340 (comment)

Labels: area/cri (Container Runtime Interface (CRI)), impact/changelog, ok-to-test

7 participants