Cloud Hypervisor is a Virtual Machine Monitor meant for running modern Cloud workloads. It is written in Rust and free of legacy devices to provide a smaller attack surface to the guest, which makes it a more secure alternative when it comes to running virtual machines.
This document describes a design proposal for integrating Cloud Hypervisor with KubeVirt, providing KubeVirt's users the possibility to rely on Cloud Hypervisor to create virtual machines as an alternative to the default libvirt/QEMU.
Since Cloud Hypervisor aims at running virtual machines more securely, it is important to offer KubeVirt's users this choice.
Cloud Hypervisor has been designed for Cloud workloads, which makes it a natural fit for the Cloud Native ecosystem, and that is the reason why it is already integrated as part of the Kata Containers project.
To extend its overall Cloud Native support, it seems logical to integrate it with KubeVirt.
One other reason for going through this effort is to identify if the abstraction layers are correctly defined to support another VMM. This will help improve the existing code by defining cleaner interfaces if needed.
The goal is to provide users a way to choose Cloud Hypervisor over libvirt/QEMU to run their virtual machines.
Supporting every feature available through KubeVirt is explicitly not a goal: since Cloud Hypervisor has a much narrower scope than libvirt/QEMU, it doesn't support as many features, and only a subset of KubeVirt's features can be expected to work with it.
This feature is directed at KubeVirt's users who want to run virtual machines more securely by choosing Cloud Hypervisor over libvirt/QEMU.
A user recently tried Cloud Hypervisor and wants to use it for running virtual machines on their Kubernetes/KubeVirt cluster.
Looking at KubeVirt's architecture, each `virt-launcher` instance runs in its own pod and manages a single virtual machine. This is the abstraction layer we need to replace: instead of using libvirt to spawn QEMU virtual machines, it will run and communicate with Cloud Hypervisor directly.
A new launcher, `ch-launcher`, will be created so that it fully replaces the existing `virt-launcher` component when needed.
Example of how a user could request Cloud Hypervisor as the underlying hypervisor through the VMI spec:
```yaml
spec:
  hypervisor: cloud-hypervisor
```
- Introduction of a new field `Hypervisor` in `VirtualMachineInstanceSpec`.
- By default, if no `Hypervisor` is provided, it would default to `libvirt`.
- The two acceptable entries would be either `cloud-hypervisor` or `libvirt`.
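As a rough sketch of what this API change could look like (type, constant, and JSON tag names below are illustrative assumptions, not a final API definition):

```go
package v1

// HypervisorName is a hypothetical type for the new field; naming is not final.
type HypervisorName string

const (
	// Libvirt stays the default when the field is omitted.
	HypervisorLibvirt         HypervisorName = "libvirt"
	HypervisorCloudHypervisor HypervisorName = "cloud-hypervisor"
)

type VirtualMachineInstanceSpec struct {
	// ... existing fields ...

	// Hypervisor selects the VMM backing this VMI.
	// Only "libvirt" and "cloud-hypervisor" are accepted; defaults to "libvirt".
	// +optional
	Hypervisor HypervisorName `json:"hypervisor,omitempty"`
}
```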
The `virt-operator` can inform all other components about the hypervisor type based on the information from the VMI spec. A different `virt-launcher` image would be picked instead of the default one, so that it contains `ch-launcher`.
Here is a list of features expected to be available with Cloud Hypervisor.
- Create a VM
- Start a VM
- Pause/Resume a VM
- Snapshot/Restore a VM
- Stop a VM
- OVMF support (EFI)
- CPU topology
- CPU model is limited to the equivalent of QEMU's `host` model (no emulation of specific CPU models)
- RNG using virtio-rng
- Only headless VMs, as there is no graphics or video device emulation
- CPU constraints + hotplug
- Memory constraints + hotplug
- Hugepages
- Host NUMA to select specific host CPUs and make sure memory is allocated on expected NUMA node
- Guest NUMA to expose any NUMA configuration to the guest
- Disk support with virtio-block
- Filesystem support with virtio-fs
- Volume hotplug
- Offline snapshot
- Online snapshot (will come later with the guest agent support for freezing the filesystem)
- Support based on virtio-net or vhost-user-net
- Support for tap and macvtap
- VFIO supported for passing through PCI devices
- Serial port (0x3f8) and virtio-console are supported
- Create a PTY so that an external process can later connect to it
- Support for SSH as it directly depends on virtio-net support
- Support for SGX
- Experimental support for TDX
- x86_64
- AArch64
This should be supported eventually but it still requires some assessment of how to achieve it. Therefore we might not see this feature being supported for some time.
Here is the list of what will be missing compared to what libvirt/QEMU supports:
- No CPU model emulation
- No way to pick a type of clock
- No way to pick a type of timer
- No support for emulated video and graphics devices
- No way to pick between different features like `acpi` and `apic`. We can select `hyperv` though, which enables KVM Hyper-V enlightenments
- No support for emulated input devices
- No support for CD-ROM, floppy disk, or LUNs
- No support for resizable disk
- No support for emulated NICs such as e1000, e1000e, ... (which means no SLIRP)
- No support for VNC
Features related to the ability of running a dedicated agent in the guest have not been tested yet. The existing QEMU agent must be evaluated to see if it could work and be reused directly with Cloud Hypervisor. If that's not the case, an agent program would have to be developed for operations like `GuestPing`, `ListInterfaces`, etc.
This new feature should not impact updates moving forward since it doesn't remove anything.
Create an additional CI entry to run Cloud Hypervisor dedicated testing. The set of tests that will be run would, of course, be a subset of what is already available.
As a reference, a PoC can be found through the following pull request.
It modifies the existing `virt-launcher` component so that it manages Cloud Hypervisor VMs instead of libvirt ones.
It adds support for the following features:
- `containerDisk`: Since there's no support for compressed QCOW2 in Cloud Hypervisor, I've simply converted the image to a RAW version. That means we don't get the COW benefit, but it works fine.
- `emptyDisk`: I've added a way to create a RAW image instead of QCOW2, since Cloud Hypervisor doesn't support compressed QCOW2 images.
- `cloudInitNoCloud`: Pretty straightforward; I reused most of the code provided by the repository.
- Console: I had to run two extra goroutines to redirect input/output between the PTY device that is created by Cloud Hypervisor and the socket located at /var/run/kubevirt-private//virt-serial0 that is expected by `virt-handler` (see the sketch after this list).
- Network: Added support for both bridge and masquerade modes. This is done through the existing code, with minimal changes, as I used the `api.Domain` reference that is being modified to retrieve both the TAP interface name and the expected MAC address.
- Kernel boot + initramfs: This is "supposedly" working, but when I used the vmi-kernel-boot example I ran into some issues because the kernel binary `vmlinuz` does not have a PVH ELF header. I didn't spend time creating a dedicated Docker image containing the right type of kernel binary, but I expect this to work as long as the user provides a proper image.
- VM lifecycle:
  - `SyncVMI` creates and boots the VM based on the configuration that has been generated from the `VirtualMachineInstanceSpec`. The support for updating the VM, and especially hotplugging devices, hasn't been implemented through this PoC.
  - Pausing and resuming the VM is supported through `virtctl`.
  - Stopping and deleting the VM is also supported through `kubectl delete`.
- Lifecycle events: Listen to the events reported by Cloud Hypervisor through the event-monitor socket, and transform them into domain events, setting the appropriate status and reason for a state change.
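To illustrate the console redirection mentioned in the list above, the PoC essentially runs two copy loops between the PTY created by Cloud Hypervisor and the serial socket, roughly along these lines (paths, names, and the listen-versus-dial choice are simplified assumptions, not the exact PoC code):

```go
package console

import (
	"io"
	"net"
	"os"
)

// relayConsole bridges the PTY device created by Cloud Hypervisor and the
// unix socket that virt-handler uses for the serial console. Illustrative
// sketch only: error handling and path resolution are simplified.
func relayConsole(ptyPath, socketPath string) error {
	pty, err := os.OpenFile(ptyPath, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer pty.Close()

	// Assume the launcher side creates and listens on the serial socket.
	ln, err := net.Listen("unix", socketPath)
	if err != nil {
		return err
	}
	defer ln.Close()

	conn, err := ln.Accept()
	if err != nil {
		return err
	}
	defer conn.Close()

	// One goroutine per direction, mirroring the two goroutines used in the PoC.
	go io.Copy(conn, pty)       // guest output -> virt-handler
	_, err = io.Copy(pty, conn) // virt-handler -> guest input
	return err
}
```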
It has been tested with the following VMI examples:
- examples/vmi-fedora
- examples/vmi-masquerade
Note the Bazel workspace had to be updated so that the `virt-launcher` container image would be generated with both the `CLOUDHV.fd` firmware and the Cloud Hypervisor binary.
Create a minimal `ch-launcher` binary based off the `virt-launcher` one, just enough to launch Cloud Hypervisor and connect to it, but with the domain manager implementation providing empty shells.
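An "empty shell" here can be pictured as a type that satisfies the domain manager interface consumed by the launcher, with every method returning a not-implemented error. The method set below is a reduced, hypothetical subset used only for illustration; the real `ch-launcher` would have to implement KubeVirt's actual `DomainManager` interface:

```go
package chlauncher

import "errors"

var errNotImplemented = errors.New("not implemented by ch-launcher yet")

// cloudHypervisorDomainManager is the placeholder the minimal ch-launcher
// would ship with; each method gets a real implementation later, one feature
// at a time.
type cloudHypervisorDomainManager struct{}

func (m *cloudHypervisorDomainManager) SyncVMI() error    { return errNotImplemented }
func (m *cloudHypervisorDomainManager) PauseVMI() error   { return errNotImplemented }
func (m *cloudHypervisorDomainManager) UnpauseVMI() error { return errNotImplemented }
func (m *cloudHypervisorDomainManager) KillVMI() error    { return errNotImplemented }
```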
The first thing to do is to update the Bazel workspace to be able to generate a new `ch-launcher` image dedicated to Cloud Hypervisor. This image should contain what is needed to start a Cloud Hypervisor virtual machine, that is, the `cloud-hypervisor` binary pulled from the Cloud Hypervisor release and the associated OVMF firmware called `CLOUDHV.fd`. It must also contain the `ch-launcher` binary instead of the `virt-launcher` one.
Add a new field `Hypervisor` to the `VirtualMachineInstanceSpec` structure to carry information about which hypervisor should be used.
Update all the components that currently assume the `virt-launcher` image is always used, so that the image is chosen dynamically based on the `Hypervisor` value.
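Concretely, wherever the launcher image is currently hard-coded, the lookup would turn into a small switch on the new field; a hedged sketch (the function name and image references are placeholders, not real registry paths):

```go
package operator

// launcherImageFor picks the launcher container image for a given hypervisor
// value. Image references below are placeholders for illustration only.
func launcherImageFor(hypervisor string) string {
	switch hypervisor {
	case "cloud-hypervisor":
		return "registry.example/kubevirt/ch-launcher:latest"
	default:
		// Empty or "libvirt" keeps today's behaviour.
		return "registry.example/kubevirt/virt-launcher:latest"
	}
}
```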
At this point we must extend the minimal `ch-launcher` into a functional implementation so that some testing can be performed.
Define a new entry in the CI to perform the testing of KubeVirt with Cloud Hypervisor. The number of tests that can be run will depend directly on the number of features supported by this first version of `ch-launcher`.
At this point, it makes sense to submit one pull request per new feature that we want to support as part of the Cloud Hypervisor integration effort.