
tests/e2e: test changes to the pre-install-payload image #179

Merged (6 commits) on Aug 29, 2023

Conversation

wainersm (Member)

Currently, changes to the install/pre-install-payload directory aren't tested because the scripts don't rebuild the pre-install-payload image. With this change, the image will always be built and used.

Fixes #177
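
A minimal sketch of the idea, not the PR's exact code: always build the payload from the local tree, push it to a throwaway registry, and point the deployment at it. The registry address, image name, overlay path, and kustomize target name are assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

registry="localhost:5000"                       # assumed local registry
tag="$(git rev-parse --short HEAD)"
image="${registry}/container-engine-for-cc-payload:${tag}"

# Always build from the in-tree sources instead of pulling a released image.
docker build -t "${image}" install/pre-install-payload
docker push "${image}"

# Point the kustomization at the freshly built image so the operator uses it.
pushd install/overlays/default >/dev/null       # assumed overlay path
kustomize edit set image "pre-install-payload=${image}"   # target name assumed
popd >/dev/null
```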

@wainersm (Member Author)

I tested it locally on Ubuntu and CentOS 8 VMs. Let me see if it works fine on SEV and TDX:

/test

@wainersm (Member Author)

/test-kata-qemu-sev

@wainersm (Member Author)

Hi @arronwy! Could you please help me understand why it failed in the TDX jobs?

I added handling for CentOS in ea873b9#diff-4f68cb05ed27370659c120227cdc91b0db7a0d38cdcbde60582cfc94ae9b8f71R35, which I suspect caused the failures. It worked on a local CentOS Stream 8 VM, though.

@arronwy (Member)

arronwy commented Mar 14, 2023

> Hi @arronwy! Could you please help me understand why it failed in the TDX jobs?
>
> I added handling for CentOS in ea873b9#diff-4f68cb05ed27370659c120227cdc91b0db7a0d38cdcbde60582cfc94ae9b8f71R35, which I suspect caused the failures. It worked on a local CentOS Stream 8 VM, though.

Hi @wainersm, the tests seem to fail at the operator uninstall case:

09:31:08 ok 14 [cc][agent][kubernetes][containerd] Test cannot pull an encrypted image inside the guest without decryption key
09:51:08 Build timed out (after 20 minutes). Marking the build as aborted.
09:51:13 Build was aborted

Expected log:

23:49:55 ok 14 [cc][agent][kubernetes][containerd] Test cannot pull an encrypted image inside the guest without decryption key
23:50:52 ok 15 [cc][operator] Test can uninstall the operator
23:51:37 ok 16 [cc][operator] Test can reinstall the operator

PR #180 also has a similar issue; the failed cases seem unrelated to this PR.

@wainersm (Member Author)

> > Hi @arronwy! Could you please help me understand why it failed in the TDX jobs?
> > I added handling for CentOS in ea873b9#diff-4f68cb05ed27370659c120227cdc91b0db7a0d38cdcbde60582cfc94ae9b8f71R35, which I suspect caused the failures. It worked on a local CentOS Stream 8 VM, though.
>
> Hi @wainersm, the tests seem to fail at the operator uninstall case:
>
> 09:31:08 ok 14 [cc][agent][kubernetes][containerd] Test cannot pull an encrypted image inside the guest without decryption key
> 09:51:08 Build timed out (after 20 minutes). Marking the build as aborted.
> 09:51:13 Build was aborted

Hmm... so it got stuck on the uninstall test until it hit the job timeout ("Build timed out (after 20 minutes)"). I will open an RFE issue: it would be nice to have a timeout (<20 min) on the test itself so that it has the opportunity to fail cleanly and print debugging messages. The way it is today, we only see the job abort message and nothing more.
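
For the record, a hedged sketch of what such a per-test timeout could look like in bats, assuming GNU coreutils `timeout` is available; `uninstall_operator` is a stand-in helper, not the suite's real function:

```bash
# Bound the uninstall step to 15 minutes so the test fails with output
# instead of being silently aborted by the 20-minute job timeout.
@test "[cc][operator] Test can uninstall the operator" {
    run timeout 15m uninstall_operator   # uninstall_operator is hypothetical
    if [ "$status" -ne 0 ]; then
        echo "uninstall timed out or failed; dumping cluster state for debugging"
        kubectl get pods --all-namespaces
        kubectl get events --sort-by=.metadata.creationTimestamp | tail -n 30
        return 1
    fi
}
```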

Thanks for the help, @arronwy! Ah, would you mind reviewing the code? :D

@wainersm (Member Author)

The SEV job failed because it hit the docker.io pull rate limit:

13:27:03 TASK [Start a docker registry] *************************************************
13:27:10 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error pulling docker.io/library/registry - code: None message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"}
13:27:10
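
Hedged workarounds for that rate limit: authenticate before the Ansible task runs, or pre-pull the image through a mirror. The mirror host below is purely illustrative, and whether the pre-pulled tag is picked up depends on how the task is configured.

```bash
# Authenticating raises the docker.io pull rate limit for this host.
echo "${DOCKERHUB_TOKEN}" | docker login -u "${DOCKERHUB_USER}" --password-stdin

# Or pre-pull via a mirror and retag, so docker.io/library/registry is
# already present locally when the "Start a docker registry" task runs.
docker pull mirror.example.com/library/registry:2     # hypothetical mirror
docker tag mirror.example.com/library/registry:2 registry:latest
```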

@wainersm (Member Author)

Rebased to main. Let's see if the tests pass now:

/test

@wainersm (Member Author)

@mythi hi Mikko, is there a known issue with the enclave-cc CI?

In any case, these changes don't touch enclave-cc... can I go ahead and merge once I get the approvals?

@jepio (Member) left a comment

LGTM; I remember having to do similar things when testing the operator (building and pushing the pre-install-payload to a local registry, the --insecure flag, etc.).
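
For context, a rough sketch of that flow; the image name and the set of architectures are assumptions:

```bash
registry="localhost:5000"
img="${registry}/container-engine-for-cc-payload"

# Push the per-arch images first; the manifest list only references them.
docker push "${img}:latest-amd64"
docker push "${img}:latest-s390x"

# `docker manifest` refuses plain-HTTP registries unless --insecure is given.
docker manifest create --insecure "${img}:latest" \
    "${img}:latest-amd64" "${img}:latest-s390x"
docker manifest push --insecure "${img}:latest"
```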

@wainersm (Member Author)

> LGTM; I remember having to do similar things when testing the operator (building and pushing the pre-install-payload to a local registry, the --insecure flag, etc.).

Thanks Jeremi!

@BbolroC (Member) left a comment

LGTM, thanks! @wainersm

@mythi (Contributor)

mythi commented Mar 29, 2023

> @mythi hi Mikko, is there a known issue with the enclave-cc CI?
>
> In any case, these changes don't touch enclave-cc... can I go ahead and merge once I get the approvals?

I restarted the failing job to see if it was just a temporary error. FWIW, the failure was not in enclave-cc but when pushing the operator image to a "local" registry needed by the tests.

@fidencio (Member)

TDX failures:

15:33:51 + docker manifest push --insecure localhost:5000/container-engine-for-cc-payload:4fb11849089f1b6ff69cfbe3b995a7368b5d82fd
15:33:51 sha256:411a9bfb0a321a212cddc38e8f0e1b4bd25a2343bfb29ca83093b1a2efdd239f
15:33:51 + docker manifest push --insecure localhost:5000/container-engine-for-cc-payload:latest
15:33:51 failed to put manifest localhost:5000/container-engine-for-cc-payload:latest: errors:
15:33:51 manifest blob unknown: blob unknown to registry
15:33:51 manifest blob unknown: blob unknown to registry
15:33:51 
15:33:51 make: *** [Makefile:8: containerd-container-image] Error 1

@fidencio (Member)

/test-tdx

@jepio (Member)

jepio commented Mar 29, 2023

Might not be relevant, but I'll post it just in case: when I was debugging the uninstall hook a couple of weeks back, I had trouble updating multi-arch manifests in the local registry. Modifying and rebuilding the container image while keeping the same tag name for the multi-arch manifest somehow did not result in the updated images being used when pulled.

So let's make sure we're cleaning up the local registry between runs.
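
A hedged sketch of such a cleanup; the registry container name is an assumption, while the manifest cache path matches what a later commit in this PR removes:

```bash
# `docker manifest create` caches manifest lists under ~/.docker/manifests;
# a stale entry keeps referencing blobs from a previous registry instance.
rm -rf "${HOME}/.docker/manifests"

# Recreate the local registry from scratch so no stale blobs survive.
docker rm -f local-registry 2>/dev/null || true
docker run -d --name local-registry -p 5000:5000 registry:2
```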

@mythi (Contributor)

mythi commented Mar 29, 2023

> > @mythi hi Mikko, is there a known issue with the enclave-cc CI?
> >
> > In any case, these changes don't touch enclave-cc... can I go ahead and merge once I get the approvals?
>
> I restarted the failing job to see if it was just a temporary error. FWIW, the failure was not in enclave-cc but when pushing the operator image to a "local" registry needed by the tests.

enclave-cc is OK

@wainersm (Member Author)

> TDX failures:
>
> 15:33:51 + docker manifest push --insecure localhost:5000/container-engine-for-cc-payload:4fb11849089f1b6ff69cfbe3b995a7368b5d82fd
> 15:33:51 sha256:411a9bfb0a321a212cddc38e8f0e1b4bd25a2343bfb29ca83093b1a2efdd239f
> 15:33:51 + docker manifest push --insecure localhost:5000/container-engine-for-cc-payload:latest
> 15:33:51 failed to put manifest localhost:5000/container-engine-for-cc-payload:latest: errors:
> 15:33:51 manifest blob unknown: blob unknown to registry
> 15:33:51 manifest blob unknown: blob unknown to registry
>
> 15:33:51 make: *** [Makefile:8: containerd-container-image] Error 1

This error is related. :(

The push of the image's manifest to the local registry is failing. I'm looking into how to debug it without access to the test environment...
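
A few hedged probes that could be run from the CI node to narrow this down; these are standard Docker Registry v2 API endpoints plus `docker manifest inspect`:

```bash
# What does the local registry actually hold?
curl -s http://localhost:5000/v2/_catalog
curl -s http://localhost:5000/v2/container-engine-for-cc-payload/tags/list

# What does the locally cached manifest list reference before the push?
docker manifest inspect --insecure \
    localhost:5000/container-engine-for-cc-payload:latest
```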

@jepio (Member)

jepio commented May 12, 2023

@wainersm, can we revive this? I would like to make changes to the operator + pre-install-payload, and it would be great to have a way to test them.

@wainersm (Member Author)

I rebased this PR and found no conflicts. I checked the code and it seems up to date; let's see if it still works:

/test

@wainersm (Member Author)

Failed to add Google's Kubernetes repository while building the containerd image:

10:37:14 #9 9.663 Err:2 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
10:37:14 #9 9.663   The following signatures couldn't be verified because the public key is not available: NO_PUBKEY B53DC80D13EDEF05
10:37:14 #9 9.681 Hit:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease
10:37:15 #9 9.824 Reading package lists...
10:37:15 #9 10.81 W: GPG error: https://packages.cloud.google.com/apt kubernetes-xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY B53DC80D13EDEF05
10:37:15 #9 10.81 E: The repository 'https://apt.kubernetes.io/ kubernetes-xenial InRelease' is not signed.

Usually that kind of problem is intermittent: run it again and it works. Let's try:

/test-kata-qemu

@jepio (Member)

jepio commented Jul 14, 2023

@wainersm: we can switch the GPG key URL to https://dl.k8s.io/apt/doc/apt-key.gpg because the other one is unreliable; we've been doing this in all repos.
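
The suggested change amounts to something like this in the script or Dockerfile that sets up the repository; the keyring path is the conventional one for that Kubernetes apt repo, an assumption here:

```bash
# Fetch the Kubernetes apt key from the reliable mirror and register the repo.
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg \
    https://dl.k8s.io/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg]" \
     "https://apt.kubernetes.io/ kubernetes-xenial main" \
    > /etc/apt/sources.list.d/kubernetes.list
apt-get update
```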

@wainersm (Member Author)

> @wainersm: we can switch the GPG key URL to https://dl.k8s.io/apt/doc/apt-key.gpg because the other one is unreliable; we've been doing this in all repos.

Good to know, @jepio; let me append a commit to this PR to make that change. Thanks!

@wainersm (Member Author)

Switched the k8s repository's GPG key as suggested by @jepio. Let's try again:

/test-kata-qemu

@wainersm (Member Author)

/test-kata-qemu

@fidencio (Member)

> @fidencio cleaned up the k8s. Now it is initialized again successfully.

Thanks, @UnmeshDeodhar!

Well, no luck getting that working on a re-run, though. :-/

@fidencio (Member)

/test

@UnmeshDeodhar (Contributor)

Looks like we are not seeing the "Port 6443 is in use" error anymore.
BTW, we only started seeing that error on our system recently. Did something change in the script that could be causing this? @fidencio

@fidencio (Member)

/test

@fidencio (Member)

Nope, just Kubernetes not being properly uninstalled. I don't remember changing anything related to that.
@wainersm, do you know anything about it?

@fidencio (Member)

@UnmeshDeodhar, SNP: http://jenkins.katacontainers.io/job/confidential-containers-operator-main-ubuntu-20.04_snp-x86_64-containerd_kata-qemu-snp-PR/80/console

09:21:13 INFO: Bring up the test cluster
09:21:13 [init] Using Kubernetes version: v1.24.0
09:21:13 [preflight] Running pre-flight checks
09:21:13 error execution phase preflight: [preflight] Some fatal errors occurred:
09:21:13 	[ERROR Port-6443]: Port 6443 is in use
09:21:13 [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
09:21:13 To see the stack trace of this error execute with --v=5 or higher
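
A hedged cleanup for that preflight failure, run on the node before retrying, assuming the cluster was provisioned with kubeadm:

```bash
# Tear down whatever leftover control plane still owns port 6443.
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d        # stale CNI config can break the next init

# Confirm nothing is still listening on the apiserver port.
sudo ss -ltnp 'sport = :6443'
```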

@fidencio (Member)

SEV: http://jenkins.katacontainers.io/job/confidential-containers-operator-main-ubuntu-20.04_sev-x86_64-containerd_kata-qemu-sev-PR/336/console

09:26:09 INFO: Running kata-qemu-sev tests for kata-qemu-sev
09:26:09 1..5
09:26:42 not ok 1 [cc][kubernetes][containerd][sev] Test SEV unencrypted container launch success
09:26:42 # (from function `kubernetes_wait_for_pod_ready_state' in file /tmp/tmp.bDAo6vegUW/src/github.com/kata-containers/tests/integration/kubernetes/lib.sh, line 42,
09:26:42 #  in test file sev.bats, line 128)
09:26:42 #   `kubernetes_wait_for_pod_ready_state "$pod_name" 20' failed
09:26:42 # Deleting previous test services...
09:26:42 # ls: cannot access '/tmp/test-kata-sev.Doj3K9s8/*.yaml': No such file or directory
09:26:42 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:26:42 # service/sev-unencrypted created
09:26:42 # deployment.apps/sev-unencrypted created
09:26:42 # error: timed out waiting for the condition on pods/sev-unencrypted-67cd69484c-gzd9c
09:27:24 not ok 2 [cc][kubernetes][containerd][sev] Test SEV encrypted container launch failure with INVALID measurement
09:27:24 # (in test file sev.bats, line 177)
09:27:24 #   `return 1' failed
09:27:24 # Deleting previous test services...
09:27:24 # ls: cannot access '/tmp/test-kata-sev.Doj3K9s8/*.yaml': No such file or directory
09:27:24 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:27:24 # Firmware Measurement: vTBgtVLcAwiF5mY92fl0OslnOt8DISH5RTZ6C8jlpj8=
09:27:24 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:27:24 # service/sev-encrypted created
09:27:24 # deployment.apps/sev-encrypted created
09:27:24 # error: timed out waiting for the condition on pods/sev-encrypted-7c46f544b7-pvt99
09:27:24 # -------------------------------------------------------------------------------
09:27:24 # NAME                         STATUS   ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION          CONTAINER-RUNTIME
09:27:24 # amd-coco-ci-ubuntu2004-001   Ready    control-plane   5m47s   v1.24.0   10.216.91.122   <none>        Ubuntu 20.04.6 LTS   5.19.2-051902-generic   containerd://1.6.8.2
09:27:24 # -------------------------------------------------------------------------------
09:27:25 # NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE     SELECTOR
09:27:25 # kubernetes        ClusterIP   10.96.0.1       <none>        443/TCP   5m45s   <none>
09:27:25 # sev-encrypted     ClusterIP   10.98.8.46      <none>        22/TCP    21s     app=sev-encrypted
09:27:25 # sev-unencrypted   ClusterIP   10.104.60.145   <none>        22/TCP    42s     app=sev-unencrypted
09:27:25 # -------------------------------------------------------------------------------
09:27:25 # NAME              READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS        IMAGES                                                                SELECTOR
09:27:25 # sev-encrypted     0/1     1            0           21s   sev-encrypted     ghcr.io/confidential-containers/test-container:multi-arch-encrypted   app=sev-encrypted
09:27:25 # sev-unencrypted   0/1     1            0           42s   sev-unencrypted   ghcr.io/confidential-containers/test-container:unencrypted            app=sev-unencrypted
09:27:25 # -------------------------------------------------------------------------------
09:27:25 # NAME                               READY   STATUS              RESTARTS   AGE   IP       NODE                         NOMINATED NODE   READINESS GATES
09:27:25 # sev-encrypted-7c46f544b7-pvt99     0/1     ContainerCreating   0          21s   <none>   amd-coco-ci-ubuntu2004-001   <none>           <none>
09:27:25 # sev-unencrypted-67cd69484c-gzd9c   0/1     ContainerCreating   0          42s   <none>   amd-coco-ci-ubuntu2004-001   <none>           <none>
09:27:25 # -------------------------------------------------------------------------------
09:27:25 # Name:           sev-encrypted-7c46f544b7-pvt99
09:27:25 # Namespace:      default
09:27:25 # Priority:       0
09:27:25 # Node:           amd-coco-ci-ubuntu2004-001/10.216.91.122
09:27:25 # Start Time:     Fri, 25 Aug 2023 07:26:43 +0000
09:27:25 # Labels:         app=sev-encrypted
09:27:25 #                 pod-template-hash=7c46f544b7
09:27:25 # Annotations:    io.katacontainers.config.pre_attestation.uri: 10.216.91.122:44444
09:27:25 #                 io.katacontainers.config.sev.policy: 3
09:27:25 # Status:         Pending
09:27:25 # IP:
09:27:25 # IPs:            <none>
09:27:25 # Controlled By:  ReplicaSet/sev-encrypted-7c46f544b7
09:27:25 # Containers:
09:27:25 #   sev-encrypted:
09:27:25 #     Container ID:
09:27:25 #     Image:          ghcr.io/confidential-containers/test-container:multi-arch-encrypted
09:27:25 #     Image ID:
09:27:25 #     Port:           <none>
09:27:25 #     Host Port:      <none>
09:27:25 #     State:          Waiting
09:27:25 #       Reason:       ContainerCreating
09:27:25 #     Ready:          False
09:27:25 #     Restart Count:  0
09:27:25 #     Environment:    <none>
09:27:25 #     Mounts:
09:27:25 #       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qckkp (ro)
09:27:25 # Conditions:
09:27:25 #   Type              Status
09:27:25 #   Initialized       True
09:27:25 #   Ready             False
09:27:25 #   ContainersReady   False
09:27:25 #   PodScheduled      True
09:27:25 # Volumes:
09:27:25 #   kube-api-access-qckkp:
09:27:25 #     Type:                    Projected (a volume that contains injected data from multiple sources)
09:27:25 #     TokenExpirationSeconds:  3607
09:27:25 #     ConfigMapName:           kube-root-ca.crt
09:27:25 #     ConfigMapOptional:       <nil>
09:27:25 #     DownwardAPI:             true
09:27:25 # QoS Class:                   BestEffort
09:27:25 # Node-Selectors:              katacontainers.io/kata-runtime=true
09:27:25 # Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
09:27:25 #                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
09:27:25 # Events:
09:27:25 #   Type     Reason                  Age               From               Message
09:27:25 #   ----     ------                  ----              ----               -------
09:27:25 #   Normal   Scheduled               21s               default-scheduler  Successfully assigned default/sev-encrypted-7c46f544b7-pvt99 to amd-coco-ci-ubuntu2004-001
09:27:25 #   Warning  FailedCreatePodSandBox  9s (x2 over 20s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: Error receiving launch bundle from attestation proxy: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.216.91.122:44444: connect: connection refused": unknown
09:27:25 # -------------------------------------------------------------------------------
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Incorrect container folder path:
09:27:25 # Could not retrieve guest kernel append parameters
09:27:25 # Kernel Append Retrieved from QEMU Process:
09:27:25 # TEST - FAIL
09:27:45 not ok 3 [cc][kubernetes][containerd][sev] Test SEV encrypted container launch success with NO measurement
09:27:45 # (from function `kubernetes_wait_for_pod_ready_state' in file /tmp/tmp.bDAo6vegUW/src/github.com/kata-containers/tests/integration/kubernetes/lib.sh, line 42,
09:27:45 #  in test file sev.bats, line 193)
09:27:45 #   `kubernetes_wait_for_pod_ready_state "$pod_name" 20' failed
09:27:45 # Deleting previous test services...
09:27:45 # ls: cannot access '/tmp/test-kata-sev.Doj3K9s8/*.yaml': No such file or directory
09:27:45 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:27:45 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:27:45 # service/sev-encrypted unchanged
09:27:45 # deployment.apps/sev-encrypted unchanged
09:27:45 # error: timed out waiting for the condition on pods/sev-encrypted-7c46f544b7-pvt99
09:28:06 not ok 4 [cc][kubernetes][containerd][sev] Test SEV encrypted container launch success with VALID measurement
09:28:06 # (from function `kubernetes_wait_for_pod_ready_state' in file /tmp/tmp.bDAo6vegUW/src/github.com/kata-containers/tests/integration/kubernetes/lib.sh, line 42,
09:28:06 #  in test file sev.bats, line 228)
09:28:06 #   `kubernetes_wait_for_pod_ready_state "$pod_name" 20' failed
09:28:06 # Deleting previous test services...
09:28:06 # ls: cannot access '/tmp/test-kata-sev.Doj3K9s8/*.yaml': No such file or directory
09:28:06 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:28:06 # Kernel Append:
09:28:06 # Firmware Measurement: Ey7tBGkb7EzNno3GjyseMeYYX53qZhQfUx9Kc56J4Dk=
09:28:06 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:28:06 # service/sev-encrypted unchanged
09:28:06 # deployment.apps/sev-encrypted unchanged
09:28:06 # error: timed out waiting for the condition on pods/sev-encrypted-7c46f544b7-pvt99
09:28:28 not ok 5 [cc][kubernetes][containerd][sev] Test SEV-ES encrypted container launch success with VALID measurement
09:28:28 # (from function `kubernetes_wait_for_pod_ready_state' in file /tmp/tmp.bDAo6vegUW/src/github.com/kata-containers/tests/integration/kubernetes/lib.sh, line 42,
09:28:28 #  in test file sev.bats, line 263)
09:28:28 #   `kubernetes_wait_for_pod_ready_state "$pod_name" 20' failed
09:28:28 # Deleting previous test services...
09:28:28 # ls: cannot access '/tmp/test-kata-sev.Doj3K9s8/*.yaml': No such file or directory
09:28:28 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:28:28 # Kernel Append:
09:28:28 # Firmware Measurement: yOe4CshGnwiXrwK0yKRHqtpZAr8QXZ2xU+hEqUAhd/w=
09:28:28 # mysql: [Warning] Using a password on the command line interface can be insecure.
09:28:28 # service/sev-es-encrypted created
09:28:28 # deployment.apps/sev-es-encrypted created
09:28:28 # error: timed out waiting for the condition on pods/sev-es-encrypted-74488c7b8c-7n7xb
09:28:29 INFO: Uninstall the operator

As these tests took an approach orthogonal to the tests running on the other platforms, I have no idea where to even start debugging it. @UnmeshDeodhar, @ryansavino, would you mind taking a look at the failures and double-checking that everything on the cluster side is okay?

This PR only adds testing of the pre-install image, which makes me quite certain it is not the reason for the failures on the SEV machine.
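
Given the "connection refused" to the attestation proxy at 10.216.91.122:44444 in the events above, some hedged first checks on the SEV node; the systemd unit name is purely hypothetical:

```bash
# Is anything listening where the kata annotation points
# (io.katacontainers.config.pre_attestation.uri: 10.216.91.122:44444)?
sudo ss -ltnp 'sport = :44444'

# If the proxy runs as a systemd unit, inspect it (unit name is a stand-in).
systemctl status attestation-proxy || true
journalctl -u attestation-proxy --since "1 hour ago" | tail -n 50
```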

@fidencio (Member)

The enclave-cc test is failing due to:

ubuntu@enclave-cc:~/operator$ kubectl describe pod enclave-cc-pod-sim
Name:         enclave-cc-pod-sim
Namespace:    default
Priority:     0
Node:         enclave-cc/192.168.122.33
Start Time:   Fri, 25 Aug 2023 08:56:53 +0000
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:           
IPs:          <none>
Containers:
  hello-world:
    Container ID:  
    Image:         docker.io/huaijin20191223/scratch-base:v1.8
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /run/rune/boot_instance/build/bin/occlum-run
      /bin/hello_world
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      OCCLUM_RELEASE_ENCLAVE:  0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qp6r6 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-qp6r6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age               From               Message
  ----     ------                  ----              ----               -------
  Normal   Scheduled               19s               default-scheduler  Successfully assigned default/enclave-cc-pod-sim to enclave-cc
  Warning  FailedCreatePodSandBox  8s (x2 over 18s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/run/aesmd" to rootfs at "/var/run/aesmd": stat /var/run/aesmd: no such file or directory: unknown

fidencio and others added 6 commits August 25, 2023 11:14
That's a HostPath mount, and it cannot be removed from within the container.

This may cause issues like:
```
Removing the /opt/confidential-containers directory
rmdir: failed to remove '/opt/confidential-containers': Device or resource busy
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When building the pre-install-payload image for CI, it needs to pull/push the image from a local registry that is not protected. The `docker manifest` commands (e.g. create) refuse to connect to an insecure registry by default, so the pre-install-payload build fails. That can be solved by passing the --insecure flag to `docker manifest`; thus this change allows passing extra flags to that command.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Currently, changes to the install/pre-install-payload directory aren't tested because the scripts don't rebuild the pre-install-payload image. With this change the image will always be built and used.

Two more dependencies were added:
- kustomize: used to edit the kustomization file so as to update the pre-install-payload image
- qemu-user-static: used by docker buildx to build the pre-install-payload image for multiple architectures

It also needs to pass `--insecure` to the `docker manifest` commands because the image is pushed to/pulled from a local insecure registry; otherwise `docker manifest` fails.

Fixes confidential-containers#177
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
We've seen issues like the one shown below on the bare-metal machines:
```
```
09:44:31 failed to put manifest
   localhost:5000/container-engine-for-cc-payload:latest: errors:
09:44:31 manifest blob unknown: blob unknown to registry
09:44:31 manifest blob unknown: blob unknown to registry
09:44:31 manifest blob unknown: blob unknown to registry
09:44:31 manifest blob unknown: blob unknown to registry
```

Those can be avoided by removing previously created
${HOME}/.docker/manifests/$manifest

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of doing `[ ... ] && ...`, let's just expand it into a proper `if`, as the first condition could simply fail, making the whole script fail and then leading to a pod error.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
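
An illustrative sketch of the pitfall that commit describes, with a hypothetical hook script: when `[ ... ] && cmd` is the last command and the test is false, the script exits non-zero and the pod reports an error even though nothing actually failed.

```bash
#!/usr/bin/env bash
# Problematic form: if the directory is absent, the script's exit status is 1.
[ -d /opt/confidential-containers ] && rm -rf /opt/confidential-containers/*

# Expanded form: the script exits 0 when the directory simply isn't there,
# and only fails if the guarded command itself fails.
if [ -d /opt/confidential-containers ]; then
    rm -rf /opt/confidential-containers/*
fi
```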
Let's make sure that we also test the pre-install / post-uninstall images as part of the enclave-cc tests, so that any changes we make with Kata Containers in mind won't break enclave-cc.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
@fidencio (Member)

> The enclave-cc test is failing due to: ...

This was basically a bad copy-and-paste on my side, which led to the HW version of enclave-cc being used instead of the SIM one.

It should be fixed now; thanks @mythi for the help!

@fidencio (Member)

/test

@mythi (Contributor) left a comment

+1 from enclave-cc side

@wainersm (Member Author)

@fidencio, thanks for taking this over and getting it done!

@wainersm (Member Author)

/test-sev

@wainersm (Member Author)

/test-snp

@wainersm (Member Author)

/test-kata-qemu-sev

@wainersm (Member Author)

/test-kata-qemu-snp

@bpradipt (Member) left a comment

/lgtm
Thanks @fidencio

@fidencio (Member)

The SEV test is not passing, but we got a green light to proceed from @ryansavino in the following thread: https://cloud-native.slack.com/archives/C039JSH0807/p1693291312001459?thread_ts=1692956110.436209&cid=C039JSH0807

@fidencio merged commit d144296 into confidential-containers:main on Aug 29, 2023
9 of 12 checks passed
Successfully merging this pull request may close these issues:

CI should test changes to the pre-install-payload image