Design proposal: Support Dynamic Primary Pod NIC Name #324
base: main
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Moved unanswered comments from another document here.
## User Stories
- As a cluster administrator I want to use a CNI that configures pods’ primary interface to a name other than `eth0`.
@EdDev commented: On its own, this story has nothing to do with KubeVirt.
Please try to tell a full story in one paragraph if all the details are related.
Please provide stories for core bindings and plugin bindings, to make it clear we are interested in covering them all.
Also, please elaborate about the correctness of the interface status reporting, e.g. that the information is collected from the correct pod interface.
I would also mention a story that assures existing setups with "eth0" continue working as expected.
Also, please provide a story for a migration: at the source it uses "eth0" and at the target it uses something else. An upgrade path may introduce this scenario, so let's see whether we have a problem with it or not.
- https://github.com/kubevirt/kubevirt

# Design
@ormergi commented: From what I understand, we need to pass yet another piece of information from a pod annotation down to virt-handler and virt-launcher.
Given that virt-controller has access to the pod and to the subject information,
and that there is a mechanism that enables passing info from virt-controller to virt-handler, namely the kubevirt.io/network-info annotation:
can we use this mechanism to pass the primary network interface name as well?
If so, I think it will save us from extending the status API.
@ormergi I've added your suggestion as an alternative; it will be visible in the next push.
Since the kubevirt.io/network-info annotation is set on the virt-launcher pod (and not on the VMI object), virt-handler doesn't have access to it.
Since this information is needed in virt-handler as well (for network setup and network status reporting), a new mechanism would have to be created in order to share it.
Another con of this suggestion is the need to have the downward API for every virt-launcher pod, in comparison to just the ones derived from VMIs with SR-IOV NICs or network binding plugins that require it.
]
```

This annotation could be parsed to infer the default (primary) network interface instead of relying on a hard coded interface name (like eth0).
@ormergi commented: Does the check rely on default=true?
If so, I think it should be noted.
Yes, the primary interface will be the first one which has default == true.
The Kubernetes Network Custom Resource Definition De-facto Standard
states that there should be just one, but recently a bug was discovered where multiple interfaces could be marked as having default == true.
I've added it into the document; it will be visible in the next push.
This was fixed already, and is available in multus 4.1.1.
Thank you for fixing it @maiqueb :)
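The default == true lookup discussed in this thread can be sketched as follows. This is an illustrative sketch only: the `networkStatus` struct and function name are placeholders, and an actual implementation would use the upstream k8snetworkplumbingwg network-attachment-definition client types rather than a hand-rolled struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkStatus mirrors only the fields of a k8s.v1.cni.cncf.io/network-status
// entry that this check needs (placeholder type, not the real client type).
type networkStatus struct {
	Name      string `json:"name"`
	Interface string `json:"interface"`
	Default   bool   `json:"default"`
}

// primaryPodInterfaceName returns the pod interface name of the first entry
// marked default == true, or "" when no entry is marked as default.
func primaryPodInterfaceName(annotation string) (string, error) {
	var statuses []networkStatus
	if err := json.Unmarshal([]byte(annotation), &statuses); err != nil {
		return "", err
	}
	for _, s := range statuses {
		if s.Default {
			return s.Interface, nil
		}
	}
	return "", nil
}

func main() {
	const annotation = `[
  {"name": "ovn-kubernetes", "interface": "ovn-udn1", "default": true},
  {"name": "default/net1", "interface": "net1", "default": false}
]`
	name, err := primaryPodInterfaceName(annotation)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // prints "ovn-udn1"
}
```

Returning "" when no entry carries default == true leaves room for the fallback-to-eth0 behavior described later in the proposal.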
This annotation could be parsed to infer the default (primary) network interface instead of relying on a hard coded interface name (like eth0).

If the network-status annotation is missing or doesn’t specify a default interface, KubeVirt will fall back to using `eth0`.
@ormergi commented: Would KubeVirt's network setup require a change? (e.g. be executed again in case the annotation is missing)
AFAIU, the network setup is agnostic to the interface it configures - meaning it doesn't care whether the interface being configured is bound to the primary pod interface (connected to the cluster network) or to a secondary interface (connected to some other network).
The logic that would probably have to change in the virt-handler is the one calculating pod interfaces' names:
https://github.com/kubevirt/kubevirt/blob/d50cc6c7cc02c5f5c33ffb77fc6710828370376c/pkg/network/setup/netpod/netpod.go#L240
If we go with the alternative that adds each pod interface's name to VMI.status.interfaces[x], we will not have to calculate it, and can just take the information directly from the VMI object.
This will enable us to change the naming scheme more easily in the future, in case we need to.
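The "read it from the VMI status instead of recomputing it" idea could look roughly like this. Everything here is hypothetical: the `InterfaceStatus` type and the `PodInterfaceName` field name are stand-ins for whatever the final KubeVirt API would be, not the actual types.

```go
package main

import "fmt"

// InterfaceStatus is an illustrative stand-in for a VMI interface status
// entry, extended with the hypothetical PodInterfaceName field discussed above.
type InterfaceStatus struct {
	Name             string // logical network name from the VMI spec
	PodInterfaceName string // name of the backing pod interface
}

// podInterfaceNameFor reads the pod interface name straight from the reported
// status entries instead of recomputing it from a naming scheme.
func podInterfaceNameFor(statuses []InterfaceStatus, network string) (string, bool) {
	for _, s := range statuses {
		if s.Name == network {
			return s.PodInterfaceName, true
		}
	}
	return "", false
}

func main() {
	statuses := []InterfaceStatus{
		{Name: "default", PodInterfaceName: "ovn-udn1"},
		{Name: "blue-net", PodInterfaceName: "pod1b2f3d4"},
	}
	if name, ok := podInterfaceNameFor(statuses, "default"); ok {
		fmt.Println(name) // prints "ovn-udn1"
	}
}
```

The point of the sketch is the decoupling: virt-handler would only do a lookup, so a future change to the naming scheme would not touch this code path.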
- No configuration is required in KubeVirt
- Provides backward compatibility

### Cons
@EdDev commented:
- The added information is probably not useful for a user.
- The added information is not useful for anything beyond this primary network, and it opens a big question: what is the primary network, and when is it even relevant?
- https://github.com/kubevirt/kubevirt

# Design
@ormergi commented: Can we have KubeVirt realize which interface is primary (e.g. by default route) as an alternative?
Would that work?
I don't think that it should be KubeVirt's responsibility.
Please note that a secondary network could also have a default route.
/uncc @aburdenthehand @cwilkers
Would it make sense to configure a cluster-wide preferred interface to latch on to in the KubeVirt CR?
I.e. the admin would indicate what the preferred pod interface name to attach to is; when not present, default to eth0, as done today.
It's a bit tied to the use-case, but has the advantage of not requiring new API (... in KubeVirt), and doesn't expose pointless info in the status.
The proposal assumes that KubeVirt is used with a multi-homing CNI plugin, necessitating a clear distinction between interfaces and their intended roles. For instance, a VM may use a single interface from a CNI where the interface name is different from the standard one. A few questions arise:
Some nits, while the doc is being finished.
## Alternative 1 - Report Pod Interface Name Per Interface

Currently, the `VirtualMachineInstanceStatus` struct holds a slice of interface status objects:
Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.
Also, do we really need to configure this per VM or per network? I doubt per VM makes much sense.
> Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

AFAIK, ATM the VirtualMachineInstance CRD does not have a /status sub-resource [1].
Could you please elaborate on the scenarios in which updating the VMI status could be ignored?

> Also, do we really need to configure this per VM or per network? I doubt per VM makes much sense.

Currently both per-interface and once-per-VMI options are described.
Both have their pros and cons.
Could you please elaborate on why you think having the information per VM does not make sense?
> Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

I do not understand this. The status of any resource is always reconciled; that is the idea of having a status that represents the current state.
Whether it is a subresource or not is an implementation detail that is supposed to improve safety.

> AFAIK, ATM the VirtualMachineInstance CRD does not have a /status sub-resource [1].

The VMI has a status subresource.
> Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

> I do not understand this. The status of any resource is always reconciled; that is the idea of having a status that represents the current state. Whether it is a subresource or not is an implementation detail that is supposed to improve safety.

If you have a CRD with a status, put a "success" condition in it, and have a reconcile cycle that updates it, you don't want to re-enter that reconcile on your own status update, since that would loop. So depending on the implementation, that update is skipped; but maybe that is not the case for KubeVirt.
> Using the k8s subresource to define something is not a good idea, since depending on how it's implemented it may be ignored by reconcile.

> I do not understand this. The status of any resource is always reconciled; that is the idea of having a status that represents the current state. Whether it is a subresource or not is an implementation detail that is supposed to improve safety.

> AFAIK, ATM the VirtualMachineInstance CRD does not have a /status sub-resource [1].

> The VMI has a status subresource.

The VMI CRD does not have a status sub-resource.
Please see:
in comparison to the VirtualMachine CRD, which does have it:
https://github.com/kubevirt/kubevirt/blob/dc6b76f494ffdb95305fed5ce038963539d469a1/pkg/virt-operator/resource/generate/components/crds.go#L205
> Currently both per-interface and once-per-VMI options are described. Both have their pros and cons.
> Could you please elaborate on why you think having the information per VM does not make sense?

As far as I have seen, the primary interface is common to all the VMs on the same network, so it does not make sense to configure it per VM.
> The VMI CRD does not have a status sub-resource.

This is surprising.

> As far as I have seen, the primary interface is common to all the VMs on the same network, so it does not make sense to configure it per VM.

I am not familiar with an entity that describes a VM collection per namespace.
The KubeVirt CR is at the cluster level, and creating a new resource kind for this does not make sense.
But I also do not understand how this is even relevant for the line this comment thread refers to.
Are you suggesting another alternative? Can you add one at a proper location with all the details?
Thanks
## Using Multus’ network-status annotation

When KubeVirt is used in a cluster that has Multus as its default CNI, Multus adds the
`k8s.v1.cni.cncf.io/network-status` annotation
Is it worth considering looking at k8s.ovn.org/pod-networks?
Or is k8s.v1.cni.cncf.io/network-status preferable, because it is the CNI's answer?
Since KubeVirt could be used with an arbitrary CNI, I don't think it should look for CNI-specific annotations.
Having said that, Multus is an exception since KubeVirt is currently coupled to it.
}
```

The VMI controller will set this field's value from Multus' network-status annotation, or fall back to `eth0` in case Multus' annotation is missing or doesn’t specify a default interface.
It's worth removing the common part from the alternatives, please, as the alternatives differ in what to report,
but the how is common.
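The common "how" across the alternatives, pick the default interface from the network-status annotation and otherwise fall back to eth0, can be sketched like this. The function and constant names are illustrative placeholders, not actual KubeVirt identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const networkStatusAnnotation = "k8s.v1.cni.cncf.io/network-status"

// networkStatus is a minimal placeholder for a network-status entry.
type networkStatus struct {
	Interface string `json:"interface"`
	Default   bool   `json:"default"`
}

// resolvePrimaryPodInterfaceName picks the default interface from the pod's
// network-status annotation, and falls back to "eth0" when the annotation is
// absent, malformed, or marks no interface as the default.
func resolvePrimaryPodInterfaceName(podAnnotations map[string]string) string {
	raw, exists := podAnnotations[networkStatusAnnotation]
	if !exists {
		return "eth0"
	}
	var statuses []networkStatus
	if err := json.Unmarshal([]byte(raw), &statuses); err != nil {
		return "eth0"
	}
	for _, s := range statuses {
		if s.Default {
			return s.Interface
		}
	}
	return "eth0"
}

func main() {
	// No annotations at all: the legacy behavior is preserved.
	fmt.Println(resolvePrimaryPodInterfaceName(nil)) // prints "eth0"
}
```

Keeping all fallback paths funneled to "eth0" is what preserves backward compatibility for existing setups, as the goals section requires.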
## Alternative 1 - Report Pod Interface Name Per Interface

Currently, the `VirtualMachineInstanceStatus` struct holds a slice of interface status objects:
However, users might require custom network setups where the pod’s primary interface is not named `eth0`.
Currently, KubeVirt attaches the VM’s interface destined to the pod network to the pod’s `eth0` interface.
These two sentences probably should be swapped. First describe how it is now and then what is missing.
Good catch.
Done.
When creating a VM with secondary networks, KubeVirt specifies the pod interfaces’ names for all secondary interfaces.
It is currently not possible to specify the primary interface’s name when templating the virt-launcher pod, since there
is no K8s API that enables it.
This looks like a solution to me. Defining the name of the interface in advance is what we do for secondary interfaces, and it could have been an option to solve this in the same way, but we cannot. Perhaps you can add this as a note, or as part of an alternative that is not chosen due to the lack of an ability to implement it.
(I must admit, it sounds strange that this cannot be done using an env variable on the node, but this is not really something we need to explore further at this stage)
- Allow the primary pod interface name to be dynamically determined based on the environment.
- Ensure backward compatibility, so existing VMs with `eth0` as the primary interface do not break.
- Handle the scenario of a non-eth0 primary interface with a binding plugin.
I do not understand this goal. If we can consume a non standard name, why is this needed?
- Change Kubernetes' API.
- Change Multus' API.
These are not goals, these are implementations.
A non-goal will be for example: Support dynamic consumption of a primary pod interface name on deployments without Multus.
But it may be better not to mention anything.
...
interfaces:
  - infoSource: 'domain, guest-agent, multus-status'
    podInterfaceName: eth0
I suggest you use a different name in the examples to emphasize the need. Same for the other places.
Perhaps sync the name with the network-status dump used before.
### Pros

- Less API than the first alternative
I would either drop this or move it to the bottom. I see no special pro here... on the contrary.
## Alternative 3 - Use annotations instead of K8s API

<!-- TODO: Complete this section -->
Please do not add it unless you have content.
## Network Binding Plugins

A Virtual Machine's NIC could be bound to the pod's primary NIC using
I do not understand this section.
It already does, so repeating this here adds no special value.
If you want to express that it should consume a non standard name, then please elaborate and explain if you need or do not need to change something.
A Virtual Machine's NIC could be bound to the pod's primary NIC using
a [network binding plugin](https://kubevirt.io/user-guide/network/network_binding_plugins/).

## CNI
Looks like this belongs under the Network Binding Plugins section, not at the same level.
But continuing from the previous comment, I do not understand why there is a need to give such a long intro here about what the CNI of a binding plugin is. You can just leave the reference to the user-guide or the design-proposal.
In general, we are not providing the binding plugin any information about this as input today.
I see no option to do it, so that should be the statement here.
Adding a recommendation on what a binding plugin author can do to mitigate this is possible, but we do not expect anything more than that.
Signed-off-by: Orel Misan <omisan@redhat.com>
What this PR does / why we need it:
This proposal describes a mechanism to dynamically determine the primary pod interface for KubeVirt virtual machines (VMs).
The goal is to move away from the hard-coded eth0 interface and allow flexible, environment-driven interface selection.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: