Design proposal: Support Dynamic Primary Pod NIC Name #324

Draft
wants to merge 1 commit into base: main

Conversation

orelmisan
Member

What this PR does / why we need it:
This proposal describes a mechanism to dynamically determine the primary pod interface for KubeVirt virtual machines
(VMs).
The goal is to move away from the hard-coded eth0 interface and allow flexible, environment-driven interface
selection.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist

This checklist is not enforced, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note:

None

@kubevirt-bot

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot added the do-not-merge/work-in-progress and dco-signoff: yes labels on Sep 15, 2024
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign alonakaplan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Member Author

@orelmisan left a comment

Moved unanswered comments from another document here.


## User Stories

- As a cluster administrator I want to use a CNI that configures pods’ primary interface to a name other than `eth0`.
Member Author

@EdDev commented: On its own, this story has nothing to do with KubeVirt.
Please try to tell a full story in one paragraph if all the details are related.
Please provide stories for core bindings and plugin bindings, to make it clear we intend to cover them all.

Also, please elaborate on the correctness of the interface status reporting, e.g. that the information is collected from the correct pod interface.

I would also mention a story that assures existing setups with "eth0" continue working as expected.

Also, please provide a story for migration: the source uses "eth0" and the target uses something else. An upgrade path may introduce this scenario, so let's see whether we have a problem with it or not.


- https://github.com/kubevirt/kubevirt

# Design
Member Author

@ormergi commented: From what I understand, we need to pass yet another piece of information from a pod annotation down to virt-handler and virt-launcher.
virt-controller has access to the pod and to the subject information,
and there is already a mechanism that enables passing info from virt-controller to virt-handler: the kubevirt.io/network-info annotation.
Can we use this mechanism to pass the primary network interface name as well?
If so, I think it will save us from extending the status API.

Member Author

@ormergi I've added your suggestion as an alternative; it will be visible in the next push.

Since the kubevirt.io/network-info annotation is set on the virt-launcher pod (and not on the VMI object), virt-handler doesn't have access to it.

Since this information is needed in virt-handler as well (for network setup and network status reporting), a new mechanism would have to be created in order to share it.

Another con of this suggestion is the need to use the downward API for every virt-launcher pod, as opposed to only the ones derived from VMIs with SR-IOV NICs or network binding plugins that require it.
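For reference, a minimal Go sketch (using the Kubernetes core/v1 types) of the downward API volume that exposes a pod annotation such as kubevirt.io/network-info to the container. The volume name and file path here are illustrative assumptions, not KubeVirt's actual definitions:

```go
package example

import corev1 "k8s.io/api/core/v1"

// networkInfoDownwardAPIVolume is a hedged sketch of a downward API volume
// exposing the kubevirt.io/network-info pod annotation inside the
// virt-launcher container. Volume and file names are illustrative.
func networkInfoDownwardAPIVolume() corev1.Volume {
	return corev1.Volume{
		Name: "network-info-annotation",
		VolumeSource: corev1.VolumeSource{
			DownwardAPI: &corev1.DownwardAPIVolumeSource{
				Items: []corev1.DownwardAPIVolumeFile{{
					// File inside the mounted volume that will contain the
					// annotation's value.
					Path: "network-info",
					FieldRef: &corev1.ObjectFieldSelector{
						FieldPath: "metadata.annotations['kubevirt.io/network-info']",
					},
				}},
			},
		},
	}
}
```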

]
```

This annotation could be parsed to infer the default (primary) network interface instead of relying on a hard coded
Member Author

@ormergi commented: Does the check rely on default=true?
If so, I think it should be noted.

Member Author

Yes, the primary interface will be the first one that has default == true.
The Kubernetes Network Custom Resource Definition De-facto Standard states that there should be only one, but a bug was recently discovered where multiple interfaces could be marked with default == true.

I've added it to the document; it will be visible in the next push.
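For illustration, a minimal Go sketch of this selection logic, assuming a standalone struct that mirrors the relevant fields of the de-facto standard's network-status entry. The helper and constant names are made up for the example and are not KubeVirt's actual implementation:

```go
package network

import "encoding/json"

const (
	networkStatusAnnotation = "k8s.v1.cni.cncf.io/network-status"
	fallbackPrimaryIface    = "eth0"
)

// networkStatusEntry mirrors the fields of a network-status entry relevant
// here, as defined by the Kubernetes Network CRD de-facto standard.
type networkStatusEntry struct {
	Name      string `json:"name"`
	Interface string `json:"interface"`
	Default   bool   `json:"default"`
}

// primaryPodInterfaceName returns the pod interface of the first entry marked
// default=true, falling back to eth0 when the annotation is missing, cannot
// be parsed, or contains no default entry.
func primaryPodInterfaceName(podAnnotations map[string]string) string {
	raw, exists := podAnnotations[networkStatusAnnotation]
	if !exists {
		return fallbackPrimaryIface
	}

	var entries []networkStatusEntry
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		// Unparsable annotation: keep today's behavior.
		return fallbackPrimaryIface
	}

	for _, entry := range entries {
		if entry.Default && entry.Interface != "" {
			return entry.Interface
		}
	}
	return fallbackPrimaryIface
}
```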


Contributor

This was fixed already, and is available in multus 4.1.1.

Member Author

Thank you for fixing it @maiqueb :)

This annotation could be parsed to infer the default (primary) network interface instead of relying on a hard-coded
interface name (like eth0).

If the network-status annotation is missing or doesn’t specify a default interface, KubeVirt will fall back to using
Member Author

@ormergi commented: Would KubeVirt's network setup require changes? (e.g., be executed again in case the annotation is missing)

Member Author

AFAIU, the network setup is agnostic to the interface it configures - meaning it doesn't care whether the interface being configured is bound to the primary pod interface (connected to the cluster network) or to a secondary interface (connected to some other network).

The logic that would probably have to change in virt-handler is the one calculating the pod interfaces' names:
https://github.com/kubevirt/kubevirt/blob/d50cc6c7cc02c5f5c33ffb77fc6710828370376c/pkg/network/setup/netpod/netpod.go#L240

If we go with the alternative that adds each pod interface's name to VMI.status.interfaces[x], we will not have to calculate it and can take the information directly from the VMI object.

This will enable us to change the naming scheme more easily in the future, in case we need to.
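To make the per-interface alternative concrete, here is a simplified, illustrative Go sketch of what such a status field could look like; it is not the actual KubeVirt API type, which carries many more fields:

```go
package v1 // illustrative placement; the real type lives in KubeVirt's API package

// VirtualMachineInstanceNetworkInterface is a simplified, illustrative
// version of the VMI interface status entry.
type VirtualMachineInstanceNetworkInterface struct {
	// Name of the interface as specified in the VMI spec.
	Name string `json:"name,omitempty"`
	// InfoSource lists the sources the reported data was collected from.
	InfoSource string `json:"infoSource,omitempty"`
	// PodInterfaceName is the name of the pod interface backing this VMI
	// interface (e.g. "eth0", or a CNI-specific name such as "net1").
	PodInterfaceName string `json:"podInterfaceName,omitempty"`
}
```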

- No configuration is required in KubeVirt
- Provides backward compatibility

### Cons
Member Author

@EdDev commented:

  • The added information is probably not useful for a user.
  • The added information is not useful for anything beyond this primary network, and it opens a big question: what is the primary network, and when is it even relevant?


- https://github.com/kubevirt/kubevirt

# Design
Member Author

@ormergi commented: Can we have KubeVirt determine which interface is primary (e.g., by the default route) as an alternative?
Would that work?

Member Author

I don't think that it should be KubeVirt's responsibility.
Please note that a secondary network could also have a default route.

@orelmisan
Member Author

/uncc @aburdenthehand @cwilkers
/cc @EdDev

@kubevirt-bot requested review from EdDev and removed the request for aburdenthehand and cwilkers on September 15, 2024 16:58
Contributor

@maiqueb left a comment

Would it make sense to configure a cluster-wide preferred interface to latch on to in the KubeVirt CR?

I.e., the admin would indicate the preferred pod interface name to attach to; when not present, default to eth0, as done today.

It's a bit tied to the use case, but it has the advantage of not requiring new API (... in KubeVirt) and doesn't expose pointless info in the status.
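For illustration, a hedged Go sketch of what such a cluster-wide knob could look like; the PrimaryPodInterfaceName field and the helper are hypothetical and only demonstrate the alternative being suggested, not an existing KubeVirt API:

```go
package config

// NetworkConfiguration is a simplified sketch of a cluster-wide network
// configuration section in the KubeVirt CR. PrimaryPodInterfaceName is a
// hypothetical field used only to illustrate the suggestion above.
type NetworkConfiguration struct {
	// PrimaryPodInterfaceName is the pod interface name KubeVirt would bind
	// the VM's primary NIC to. When empty, KubeVirt keeps today's behavior
	// and uses "eth0".
	PrimaryPodInterfaceName string `json:"primaryPodInterfaceName,omitempty"`
}

// primaryPodInterfaceOrDefault returns the configured name, or "eth0" when
// nothing is configured.
func primaryPodInterfaceOrDefault(cfg *NetworkConfiguration) string {
	if cfg != nil && cfg.PrimaryPodInterfaceName != "" {
		return cfg.PrimaryPodInterfaceName
	}
	return "eth0"
}
```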

@Acedus

Acedus commented Sep 16, 2024

The proposal assumes that KubeVirt is used with a multi-homing CNI plugin, necessitating a clear distinction between interfaces and their intended roles. For instance, if a VM uses a single interface from a CNI where the interface name is different from the standard eth0, it should be straightforward to identify this interface as primary by selecting the only available one. Consequently, the proposal is relevant only if the Pod uses n >= 2 non-standard interfaces (where "non-standard" refers to interfaces not defined by the user or CNI).

A few questions arise:

  1. How is the primary interface defined? Is it the interface used to connect to the default cluster network as specified by Multus? What if a VM only uses bridge-CNI and does not connect to the cluster network (e.g., using two bridge interfaces)? Determining the primary interface of a Pod seems challenging without a clear definition, especially with complex scenarios like OVN-Kubernetes' UDNs. It seems overly context-specific to determine this dynamically with sufficient accuracy.

  2. Regarding the n >= 2 remark, if we rely on Multus for inference, other multi-homing methods will need to adhere to the fallback logic implemented (e.g., defaulting to eth0). Are there plans to address this?

  3. Instead of relying on interface names to manage network plumbing, could we use other, more easily inferred parameters? Would it be more advantageous to let users specify the primary interface directly, as deviations from the standard eth0 are exceptions rather than the norm?

@qinqon left a comment

Some nits while the doc is being finished.


## Alternative 1 - Report Pod Interface Name Per Interface

Currently, the `VirtualMachineInstanceStatus` struct holds a slice of interface status objects:

Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

Also, do we really need to configure this per VM or per network? I doubt per VM makes much sense.

Member Author

Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

AFAIK, ATM the VirtualMachineInstance CRD does not have a /status sub-resource [1].
Could you please elaborate on the scenarios in which updating the VMI status could be ignored?

Also, do we really need to configure this per VM or per network? I doubt per VM makes much sense.

Currently, both per-interface and once-per-VMI options are described.
Both have their pros and cons.

Could you please elaborate on why you think having the information per VM does not make sense?

[1] kubevirt/kubevirt#3889

Member

Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

I do not understand this. The status of any resource is always reconciled; that is the idea of having a status that represents the current state.
Whether it is a subresource or not is an implementation detail that is supposed to improve safety.

AFAIK, ATM the VirtualMachineInstance CRD does not have a /status sub-resource [1].

The VMI has a status subresource.

@qinqon Sep 17, 2024

Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

I do not understand this. The status of any resource is always reconciled; that is the idea of having a status that represents the current state. Whether it is a subresource or not is an implementation detail that is supposed to improve safety.

If you have a CRD with a status where you put a condition about "success" and a reconcile cycle that updates it, you don't want to re-enter that reconcile when you update it, since it would loop; so depending on the implementation, that update is skipped, but maybe that is not the case for KubeVirt.
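For context, this is the generic controller-runtime pattern being described: filtering out update events that do not bump metadata.generation (which status-only writes do not), so a controller does not re-enter its reconcile loop because of its own status updates. The sketch below is illustrative only and is not taken from KubeVirt's controller code:

```go
package controller

import (
	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// setupController wires a reconciler so that update events caused by
// status-only writes are filtered out, since they do not change
// metadata.generation.
func setupController(mgr manager.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}). // illustrative resource; a real controller watches its own kind
		WithEventFilter(predicate.GenerationChangedPredicate{}).
		Complete(r)
}
```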

Member Author

Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

I do not understand this. The status of any resource is always reconciled; that is the idea of having a status that represents the current state. Whether it is a subresource or not is an implementation detail that is supposed to improve safety.

AFAIK, ATM the VirtualMachineInstance CRD does not have a /status sub-resource [1].

The VMI has a status subresource.

The VMI CRD does not have a status sub-resource.
Please see:

https://github.com/kubevirt/kubevirt/blob/dc6b76f494ffdb95305fed5ce038963539d469a1/pkg/virt-operator/resource/generate/components/crds.go#L167

in comparison to the VirtualMachine CRD that does have it:
https://github.com/kubevirt/kubevirt/blob/dc6b76f494ffdb95305fed5ce038963539d469a1/pkg/virt-operator/resource/generate/components/crds.go#L205


Currently, both per-interface and once-per-VMI options are described. Both have their pros and cons.

Could you please elaborate on why you think having the information per VM does not make sense?

[1] kubevirt/kubevirt#3889

As far as I have seen, the primary interface is common between all the VMs on the same network, so it does not make sense to configure it per VM.

Member

The VMI CRD does not have a status sub-resource.

This is surprising.

As far as I have seen, the primary interface is common between all the VMs on the same network, so it does not make sense to configure it per VM.

I am not familiar with an entity that describes a VM collection per namespace.
The KubeVirt CR is at the cluster level, and creating a new resource kind for this does not make sense.

But I also do not understand how this is even relevant to the line this comment thread refers to.
Are you suggesting another alternative? Could you add one at a proper location with all the details?

design-proposals/dynamic-primary-pod-nic-name.md (outdated; resolved)
@oshoval left a comment

Thanks

## Using Multus’ network-status annotation

When KubeVirt is used in a cluster that has Multus as its default CNI, Multus adds the
`k8s.v1.cni.cncf.io/network-status` annotation

Is it worth considering looking at k8s.ovn.org/pod-networks?
Or is k8s.v1.cni.cncf.io/network-status preferable, because it is the CNI's answer?

Member Author

Since KubeVirt could be used with an arbitrary CNI, I don't think it should look for CNI-specific annotations.
Having said that, Multus is an exception since KubeVirt is currently coupled to it.

design-proposals/dynamic-primary-pod-nic-name.md (outdated; resolved)
}
```

VMI controller will set this field's value from Multus' network-status annotation, or fallback to `eth0` in case Multus

Please consider removing the common part from the alternatives, as the alternatives differ in what to report,
but the how is common.


Comment on lines 17 to 15
However, users might require custom network setups where the pod’s primary interface is not named `eth0`.
Currently, KubeVirt attaches the VM’s interface destined to the pod network to the pod’s `eth0` interface.
Member

These two sentences probably should be swapped. First describe how it is now and then what is missing.

Member Author

Good catch.
Done.

Comment on lines 20 to 22
When creating a VM with secondary networks, KubeVirt specifies the pod interfaces’ names for all secondary interfaces.
It is currently not possible to specify the primary interface’s name when templating the virt-launcher pod, since there
is no K8s API that enables it.
Member

This looks like a solution to me. Defining the name of the interface in advance is what we do for secondary interfaces, and it could have been an option to solve this in the same way, but we cannot. Perhaps you can add this as a note or as part of an alternative that is not chosen due to the lack of an ability to implement it.
(I must admit, it sounds strange that this cannot be done using an env variable on the node, but this is not really something we need to explore further at this stage.)


- Allow the primary pod interface name to be dynamically determined based on the environment.
- Ensure backward compatibility, so existing VMs with `eth0` as the primary interface do not break.
- Handle the scenario of a non-eth0 primary interface with a binding plugin.
Member

I do not understand this goal. If we can consume a non-standard name, why is this needed?

Comment on lines 32 to 33
- Change Kubernetes' API.
- Change Multus' API.
Member

These are not goals, these are implementations.

A non-goal would be, for example: support dynamic consumption of a primary pod interface name on deployments without Multus.
But it may be better not to mention anything.

...
interfaces:
  - infoSource: 'domain, guest-agent, multus-status'
    podInterfaceName: eth0
Member

I suggest you use a different name in the examples to emphasize the need. Same for the other places.
Perhaps sync the name with the network-status dump used before.


### Pros

- Less API than the first alternative
Member

I would either drop this or move it to the bottom. I see no special pro here... on the contrary.

Comment on lines 238 to 237
## Alternative 3 - Use annotations instead of K8s API

<!-- TODO: Complete this section -->
Member

Please do not add it unless you have content.


## Network Binding Plugins

A Virtual Machine's NIC could be bound to the pod's primary NIC using
Member

I do not understand this section.
It already does, so repeating this here adds no special value.

If you want to express that it should consume a non-standard name, then please elaborate and explain whether or not you need to change something.

A Virtual Machine's NIC could be bound to the pod's primary NIC using
a [network binding plugin](https://kubevirt.io/user-guide/network/network_binding_plugins/).

## CNI
Member

@EdDev Sep 17, 2024

Looks like this belongs under the Network Binding Plugins section, not at the same level.

But in continuation of the previous comment, I do not understand why there is a need for such a long intro here about what the CNI of a binding plugin is. You can just leave the reference to the user guide or the design proposal.

In general, we are not providing any information about this as input to the binding plugin today.
I see no option to do so, so that should be the statement here.

Adding a recommendation on what a binding plugin author can do to mitigate this is possible, but we do not expect anything more than that.

@orelmisan
Member Author

Would it make sense to configure a cluster-wide preferred interface to latch on to in the KubeVirt CR?

I.e., the admin would indicate the preferred pod interface name to attach to; when not present, default to eth0, as done today.

It's a bit tied to the use case, but it has the advantage of not requiring new API (... in KubeVirt) and doesn't expose pointless info in the status.

@maiqueb I've added this alternative here.

Signed-off-by: Orel Misan <omisan@redhat.com>
Labels
dco-signoff: yes · do-not-merge/work-in-progress · size/L
Projects
None yet
Development


7 participants