Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge peer-pods and CoCo operator/payloads together #561

Closed
stevenhorsman opened this issue Jan 24, 2023 · 13 comments
Closed

Merge peer-pods and CoCo operator/payloads together #561

stevenhorsman opened this issue Jan 24, 2023 · 13 comments
Assignees

Comments

@stevenhorsman
Copy link
Member

At the moment there is a fair bit of duplication between the peer pod and coco operator, as we look to make peer pods part of the coco release ideally we'd reduce this and maybe even get to the point that we can get rid of the separate peer pod operator completely?

This is my completely non-expert take, so please update this description as necessary to correct things

Runtime-payload

When the peer pods controller was first created we had to have a custom kata-shim as it was built on the remote hypervisor support just in Yohei's branch, but it's since been merged into CCv0, so do we have an opportunity to merge the runtime-payloads closer together now (and perhaps entirely?)

Currently

ARG IMAGE
FROM ${IMAGE:-quay.io/confidential-containers/runtime-payload-ci:kata-containers-latest}
ARG ARCH
ARG BINARIES=./bin
ARG SCRIPTS=./scripts
ARG CONFIG=./config
#Copy containerd-shim to /opt/confidential-containers/bin
ADD ${BINARIES}/containerd-shim-kata-v2-${ARCH} /opt/kata-artifacts/opt/confidential-containers/bin/containerd-shim-kata-v2
#Copy configuration file
ADD ${CONFIG}/configuration-remote.toml /opt/kata-artifacts/opt/confidential-containers/share/defaults/kata-containers/configuration.toml
#Copy kata-deploy.sh
ADD ${SCRIPTS}/kata-deploy.sh /opt/kata-artifacts/scripts/

the extras in the peer pods runtime-payload are:

  • copying over the containerd-shim-kata-v2 - I think this is the same binary as in CoCo now so can be removed?
  • Copying the remote kata config file - There is a question here of whether we introduce a new kata-remote runtime class and then this config to the CoCo operator for 0.4 as we are planning to be part of the release
  • An updated kata-deploy script - In my non-expert view it looks like there is a little bit new relating to crio and k3s, but otherwise the differences are just due to refactors in CCv0?

Pre-install payload

The peer pod pre-install has some systemd artifacts. It's not clear from #545 (comment) whether these are even needed anymore?

Peer-pod-controller

Could we add extra controllers e.g. peer-pod controller to the CoCo operator?

@stevenhorsman
Copy link
Member Author

\cc @fidencio @bpradipt @jensfr @snir911 @jtumber-ibm - sorry for my ignorance, but I thought it would be good to get this conversation started in public

@bpradipt
Copy link
Member

@stevenhorsman I think we really don't need the runtime-payload code anymore once we have the remote hypervisor config added to the Kata ccv0 release payload.

@bpradipt
Copy link
Member

Minor correction. We will still need the runtime-payload for cri-o which we'll base it on the ccv0 release payload.

@fidencio
Copy link
Member

@stevenhorsman, I'd go for:

  • Add the new peer-pods configuration as part of the default configs for CCv0
  • I'd like to take a at the kata-deploy script, as I'm pretty sure it can be merged to the current one without any problems (and hey, I can help here).

@stevenhorsman
Copy link
Member Author

FYI I've created
kata-containers/kata-containers#6349 - which I have a draft PR for and kata-containers/kata-containers#6351 to cover this work in kata-containers.

@stevenhorsman stevenhorsman self-assigned this Mar 1, 2023
@stevenhorsman
Copy link
Member Author

stevenhorsman commented Mar 21, 2023

FYI - kata-containers/kata-containers#6349 - which I have a draft PR for and kata-containers/kata-containers#6351 have closed, so I think we can start removing our own copy of the runtime-payload once we've tested the kata-cc runtime payload works as expected. One blocker for this is until confidential-containers/operator#180 is merged, I think until that merged the latest runtimePayload will be incompatibly with the latest operator, so we'll probably have to wait for that first.

@stevenhorsman
Copy link
Member Author

The operator blocker is now resolved, so I've created confidential-containers/operator#186 to cover updating the coco operator to add the runtime class. I have a question on how much of the custom peer pod cc-runtime we can remove and merge into the coco version in that though

@stevenhorsman
Copy link
Member Author

In the absence of any feedback in I confidential-containers/operator#186 I went for the (not as nice, but potentially less destructive) approach of just adding the kata-remote runtimeclass in the oeprator code and then trying to define the runtimeclass in the CAA version. My attempt of that is https://github.com/confidential-containers/cloud-api-adaptor/compare/staging...stevenhorsman:cloud-api-adaptor:add-remote-runtimeclass?expand=1
but it doesn't work probably for me. The runtimeclasses aren't created, but it has the error:

 kubectl logs cc-operator-daemon-install-lps9c -n confidential-containers-system
copying kata artifacts onto host
warning: /usr/local/bin/containerd-shim-kata-remote-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-qemu-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-v2 already exists
Creating the default shim-v2 binary
warning: /usr/local/bin/containerd-shim-kata-qemu-tdx-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-qemu-sev-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-qemu-se-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-clh-v2 already exists
warning: /usr/local/bin/containerd-shim-kata-clh-tdx-v2 already exists
Add Kata Containers as a supported runtime for containerd
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-remote, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-remote.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-tdx, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-tdx.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-sev, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-sev.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-se, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-qemu-se.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh.options, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh-tdx, overwriting
Configuration exists for plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh-tdx.options, overwriting
Failed to connect to bus: No data available

which was seen in confidential-containers/operator#183, so supposed to have been resolved, so I'm not sure if I've missed something there?

@stevenhorsman
Copy link
Member Author

I got Fabiano's help on it, who reminded me that the hostPaths have changed with the new base image (so another example where having the peer pods version of cc-runtime isn't ideal as we need to ensure we port across fixes). Re-testing now...

@stevenhorsman
Copy link
Member Author

stevenhorsman commented Mar 31, 2023

I've carried on working on this and tried to use https://github.com/stevenhorsman/cloud-api-adaptor/blob/8c5414e30c5cf59f4203cea7e68d6b80a16d4c94/install/yamls/kustomization.yaml to overwrite the values in the cc runtime, so that cc runtime is the same as the coco version and then we can get rid of it, but I can't get the kustomize to work and it's not pulling the image I specify in there, but instead the base image:

 Warning  Failed     61s (x4 over 2m24s)  kubelet            Failed to pull image "quay.io/confidential-containers/runtime-payload:kata-containers": rpc error: code = NotFound desc = failed to pull and unpack image "quay.io/confidential-containers/runtime-payload:kata-containers": failed to resolve reference "quay.io/confidential-containers/runtime-payload:kata-containers": quay.io/confidential-containers/runtime-payload:kata-containers: not found

so I think someone who understand the operator better than me needs to pick this up and move it along

@stevenhorsman
Copy link
Member Author

stevenhorsman commented Apr 17, 2023

With the changes that Pradipta made in confidential-containers/operator#202 I think the main piece of this is complete. What remains is:

  • Updating our documentation to reflect the new two stage process:
    • Using CoCo operator to deploy the base stack
    • Deploy the CAA pod and config
    • This involves updating all our docs/tests to use the new kata-remote runtimeclass
  • Remove the old code that isn't used anymore
    • pre-install and runtime payload directories
    • yamls/deploy.yaml and yamls/ccruntime-peer-pods.yaml

These could be split out under a separate issue, or done here.

@ariel-adam
Copy link
Member

@stevenhorsman should this be moved to 0.6.0 for the list of issues not complete yet?

@stevenhorsman
Copy link
Member Author

Thanks for the nudge Ariel, to have track these tasks better I broke them out into individual issues, so I think that can be close now:

I'll add them to the 0.6 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

4 participants