Implement RuntimeConfig CRI call #8722
Conversation
Hi @marquiz. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Early peek to an implementation

Force-pushed from 5715945 to 2163191
/cc @haircommander

@marquiz: GitHub didn't allow me to request PR reviews from the following users: haircommander. Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Required to support upcoming Kubernetes (v1.28) features. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Force-pushed from 2163191 to 276462d
Ready for review

/ok-to-test
Looking good! A couple comments / questions, request for a chat. Cheers
pkg/cri/server/runtime_config.go (outdated)

    func (c *criService) getCgroupDriver(ctx context.Context) runtime.CgroupDriver {
    	// Go through the runtime handlers in a predictable order, starting from the
    	// default handler, others sorted in alphabetical order
let's have a conversation about defaulting... see if there isn't a better way than alphabetical, and/or see if there is an issue here:

Noting for the conversation that we probably need to consider adding validation against our config, and talk about other considerations if a runtime is requested that uses a different cgroup driver than the one reported via RuntimeConfig.
> let's have a conversation about defaulting... see if there isn't a better way than alphabetical, and/or see if there is an issue here:

Let's have a chat.

> Noting for the conversation that we probably need to consider adding validation against our config, and talk about other considerations if a runtime is requested that uses a different cgroup driver than the one reported via RuntimeConfig.

Sounds pretty much like converging back to the old faithful "global" `systemd_cgroup` setting, albeit in a very complicated fashion
that might be the correct path until we can report said global and per-runtime settings... and the validation would let admins know..
Force-pushed from e6b7fbc to 3fa3834
need to edit the config description...

https://github.com/containerd/containerd/blame/main/docs/cri/config.md#L13-L33

and possibly explain some down in the related runtime option fields for `ShimCgroup` and `SystemdCgroup`
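For context, the `SystemdCgroup` option under discussion lives in the runc runtime handler options of the CRI plugin config. In containerd 1.x style TOML it is typically set along these lines (a sketch of a common configuration, not text from this PR):

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```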
pkg/cri/sbserver/runtime_config.go (outdated)

    	log.G(ctx).Debugf("runtime handler %q does not provide cgroup driver information", handler)
    }

    // If no runtime handlers have a setting, return systemd as the default
Needs to check whether systemd is running
Thanks @AkihiroSuda, I think this is the right thing to do. And, now that I think of it, maybe we should just drop the `SystemdCgroup` setting from the runc handler and just rely on auto-detection everywhere. Using `cgroupfs` when `system.slice` and folks are around feels like a mis-configuration to me 🤔
When `SystemdCgroup` is unspecified it can just detect the driver automatically. Explicit configuration of `SystemdCgroup` should still be supported for compatibility reasons.
> Using cgroupfs when system.slice and folks are around feels like a mis-configuration to me 🤔

Still legit when the systemd unit is `Delegate`-d.
> Still legit when the systemd unit is `Delegate`-d.

Hmm, that's true, in principle. But afaiu this scenario is not supported by kubelet at least. And I think it's still a mis-configuration if the unit has `Delegate=yes` but you have `SystemdCgroup=true` (or vice versa, if `Delegate=off` but you'd have `SystemdCgroup=false`). So, even in this scenario the right path would seem to be auto-detection (another level of auto-detection, i.e. auto-detection on the pod-sandbox unit level). WDYT @AkihiroSuda?
I implemented systemd autodetection in case no runc plugins are configured (i.e. we cannot get the setting from there).

Regarding whether `SystemdCgroup` is explicitly set in the configuration, I don't think there's any way to detect that. It's a bool that's false by default so 🤷‍♀️
pkg/cri/sbserver/runtime_config.go (outdated)

    func (c *criService) RuntimeConfig(ctx context.Context, r *runtime.RuntimeConfigRequest) (*runtime.RuntimeConfigResponse, error) {
    	resp := &runtime.RuntimeConfigResponse{
    		Linux: &runtime.LinuxRuntimeConfiguration{
    			CgroupDriver: c.getCgroupDriver(ctx),
Off topic: The kubelet takes the responsibility of creating the pod cgroup, and then the CRI puts the container in it. I think that is the reason why it is easy to cause a mismatch. IMO, the end goal should be that the kubelet just lets the CRI maintain the real lifecycle of containers and pods. Currently, it's split-brain.
I fully agree with this ❤️
Force-pushed from 3fa3834 to 0f21904
I updated the PR with auto-detection in case the driver cannot be determined from the runtime-handler options. Regarding ordering (in case the default runtime-handler doesn't provide the setting) I left the alphabetical ordering as there are no other/better suggestions. TBH, I think it should not be allowed to have multiple runc-based runtime classes with conflicting cgroup driver settings (different runtime classes having different `SystemdCgroup` values).

PTAL @AkihiroSuda @mikebrow @fuweid
Force-pushed from 0f21904 to 9302091
I made one more change: made the cgroup driver detection code Linux-only, with other OSes resorting to a stub
The rpc only reports one field, i.e. the cgroup driver, to kubelet. Containerd determines the effective cgroup driver by looking at all runtime handlers, starting from the default runtime handler (the rest in alphabetical order), and returning the cgroup driver setting of the first runtime handler that supports one. If no runtime handler supports cgroup driver (i.e. has a config option for it) containerd falls back to auto-detection, returning systemd if systemd is running and cgroupfs otherwise.

This patch implements the CRI server side of Kubernetes KEP-4033: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4033-group-driver-detection-over-cri

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
> When determining the cgroup driver, containerd uses the `SystemdCgroup`
> setting from runc-based runtime classes, starting from the default runtime
> class. If no runc-based runtime classes have been configured containerd
Suggested change:
- > class. If no runc-based runtime classes have been configured containerd
+ > class. Starting with containerd v2.0, if no runc-based runtime classes have been configured containerd
This whole thing is "starting with containerd v2.0" (plus possibly later with v1.6.x as @mikebrow pointed out below). I could add e.g. a separate paragraph at the end of this note:

The automatic cgroup driver configuration for kubelet feature is supported in containerd v2.0 and later.

WDYT @AkihiroSuda @mikebrow ?
Looks good, but I don't think we will backport this to v1.6 LTS
Thx @AkihiroSuda for the quick response. I now added that separate paragraph at the end of the note. @mikebrow PTAL
> Looks good, but I don't think we will backport this to v1.6 LTS
My reasoning for mentioning that a backport may be needed before the 1.6 LTS EOL:

The current alpha stage of this k8s feature has a feature gate that is disabled by default, and thus the API is not required to be there for supported container runtimes (1.6, 1.7, and possibly 2.0). The current proposed beta plan of this feature shows the feature gate enabled by default, with the API expected to be enabled by supported container runtimes, thus requiring clouds to take the special action of disabling the feature gate if they don't move up to a containerd that supports it. Then at GA stage the feature gate is expected to be removed, leaving the feature on, and container runtimes that don't have it would cause an error (with the current plan). Plans can always change.

Our 1.6 EOL is currently set at: max(February 15, 2025 or next LTS + 6 months) ...

Same LTS issue with many CRI features...
True, the current plan states that the kubelet fallback behavior (basically kubelet's `cgroupDriver` setting) would be dropped at GA. However, elsewhere it's stated that kubelet's `cgroupDriver` setting will only be dropped when support for the feature is adopted by CRI-O and containerd. So if we stick to this plan we wait for containerd LTS (and other non-EOL'd versions) to support this – either backport to v1.6 or then wait for v1.6 EOL. But as @mikebrow stated, the plan can (and probably will) change as we get closer to GA and see how this plays out. E.g. we might still keep the `cgroupDriver` fallback for some time even in GA...

Anyway, when it comes to this PR now at hand, I think we cannot mention v1.6 LTS here because it hasn't been backported yet. We update the documentation if/when we backport this. Right @mikebrow @AkihiroSuda ?
> When determining the cgroup driver, containerd uses the `SystemdCgroup`
> setting from runc-based runtime classes, starting from the default runtime
> class. If no runc-based runtime classes have been configured containerd
> relies on auto-detection based on determining if systemd is running.
Suggested change:
- > relies on auto-detection based on determining if systemd is running.
+ > relies on auto-detection based on determining if systemd is running.
+ > containerd v1.x always defaults to non-systemd cgroupfs driver.
Hmm, I'm not sure about this 🧐 What I mean is that this PR only adds new functionality that instructs the kubelet which cgroup driver it should use (for pod sandbox-level cgroups). Containerd v1.x doesn't have this functionality. The PR doesn't change any internal configuration logic or defaults of containerd itself. There is no "containerd-wide" (or even "`containerd.cri`-wide") cgroup driver setting. The setting only exists in the runc runtime handler plugin options. For other runtime handlers containerd doesn't care (afaiu).
containerd v1.x doesn't have this functionality... "yet". 1.6 is LTS and may need to support the API before EOL.
Force-pushed from d60befd to 17ddeef
LGTM
LGTM
Backporting to v1.6 is probably fine if it is necessary for Kube && it doesn't break compatibility for existing clusters
Thanks for the reviews @mikebrow @AkihiroSuda and @fuweid. I think the PR title "postfix" (
@marquiz missing dco signature on the docs commit..
Document the user visible changes related to the new RuntimeConfig RPC. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
Force-pushed from 17ddeef to c62abcb
LGTM
I'm going to remove the breaking label for now since there isn't enough explanation in the title to indicate what is breaking. It just appears as an addition. We can add a note to the 2.0 release notes explaining the cgroup change if there is user action required in some cases.
> When determining the cgroup driver, containerd uses the `SystemdCgroup`
> setting from runc-based runtime classes, starting from the default runtime
> class. If no runc-based runtime classes have been configured containerd
> relies on auto-detection based on determining if systemd is running.
I added the `impact/breaking` label because this seems to break an existing node that is (mis)using a non-systemd cgroup driver on a systemd distro
I think we can add a note about it, it just looked confusing in the change log. It implies that the RuntimeConfig CRI call itself is a breaking change. #9340
We can clean up the language in future betas
Sorry, maybe I have misread the paragraph:

> The rpc only reports one field, i.e. the cgroup driver, to kubelet.
> Containerd determines the effective cgroup driver by looking at all
> runtime handlers, starting from the default runtime handler (the rest in
> alphabetical order), and returning the cgroup driver setting of the
> first runtime handler that supports one. If no runtime handler supports
> cgroup driver (i.e. has a config option for it) systemd is returned as a
> fallback default.
This patch implements the CRI server side of Kubernetes KEP-4033:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4033-group-driver-detection-over-cri

Refs kubernetes/kubernetes#118770