tests/e2e: Libvirt Env tests are unstable #1831

Open
stevenhorsman opened this issue May 3, 2024 · 6 comments
Comments

@stevenhorsman
Member

stevenhorsman commented May 3, 2024

We see occasional failures (anecdotally, less than 20% of the time) on the libvirt nightly CI. So far they have always passed on re-run, but we've now seen one on a PR test, so it's becoming more of an obstacle and we should investigate it when we get the chance.

=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test
    assessment_runner.go:262: timed out waiting for the condition
--- FAIL: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly (600.10s)
    --- FAIL: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test (600.10s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test
    assessment_runner.go:262: timed out waiting for the condition
--- FAIL: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment (600.04s)
    --- FAIL: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test (600.04s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test
    assessment_runner.go:262: timed out waiting for the condition
--- FAIL: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly (600.06s)
    --- FAIL: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test (600.06s)
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test
    assessment_runner.go:262: timed out waiting for the condition
--- FAIL: TestLibvirtCreatePeerPodAndCheckWorkDirLogs (600.16s)
    --- FAIL: TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test (600.16s)
@stevenhorsman
Member Author

This is getting worse, and we are now hitting it multiple times on each PR. I've tried running this test locally, and it passed in all of about 8 re-runs, so I'm not sure what causes the failure. In the short term, I think we need to skip it in the CI to stop it blocking PRs.
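
For illustration, here is a minimal Go sketch of how a test that only fails on CI could be skipped there while still running locally. This is not the code from the commits referenced in this issue: the `unstableTests` table and the `skipReason` helper are hypothetical names, and the sketch assumes the CI sets the `CI` environment variable to `true` (as GitHub Actions does).

```go
package main

import (
	"fmt"
	"os"
)

// unstableTests maps a test name to the issue that tracks its instability.
// Hypothetical table for illustration only.
var unstableTests = map[string]string{
	"TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly": "#1831",
}

// skipReason returns a non-empty message when the named test is listed as
// unstable and we are running on CI; otherwise it returns "".
func skipReason(testName string, onCI bool) string {
	issue, unstable := unstableTests[testName]
	if !unstable || !onCI {
		return ""
	}
	return fmt.Sprintf("skipping %s on CI: unstable, see %s", testName, issue)
}

func main() {
	// Assumption: the CI exports CI=true.
	onCI := os.Getenv("CI") == "true"
	name := "TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly"
	if reason := skipReason(name, onCI); reason != "" {
		fmt.Println(reason)
	} else {
		fmt.Println("running", name)
	}
}
```

Inside a real test function the same check would be `if r := skipReason(t.Name(), onCI); r != "" { t.Skip(r) }`, which keeps the test in the local run where it passes and records the tracking issue in the skip message.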

stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue May 9, 2024
The TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
test is failing semi-regularly on the CI, but seems to run okay
locally, so skip it until we have a chance to debug.
See confidential-containers#1831

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue May 9, 2024
The TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
and TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
tests are failing semi-regularly on the CI, but seems to run okay
locally, so skip it until we have a chance to debug.
See confidential-containers#1831

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
wainersm pushed a commit that referenced this issue May 14, 2024
The TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
and TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
tests are failing semi-regularly on the CI, but seems to run okay
locally, so skip it until we have a chance to debug.
See #1831

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
beraldoleal pushed a commit to beraldoleal/cloud-api-adaptor that referenced this issue May 27, 2024
The TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
and TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
tests are failing semi-regularly on the CI, but seems to run okay
locally, so skip it until we have a chance to debug.
See confidential-containers#1831

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@stevenhorsman
Member Author

It is possible that this is related to the image-pull changes, as Chengyu is touching the config merge code in kata-containers/kata-containers#9695, so after that, we should try re-testing this.

@stevenhorsman stevenhorsman changed the title tests/e2e: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly test unstable tests/e2e: Libvirt Env tests are unstable Jun 12, 2024
@stevenhorsman
Member Author

Hmm, this is suspicious: now that the env-related e2e tests are skipped, I've seen:

=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test
    assessment_runner.go:262: timed out waiting for the condition
--- FAIL: TestLibvirtCreatePeerPodAndCheckWorkDirLogs (600.16s)
    --- FAIL: TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test (600.16s)

start failing, so maybe it's related to something before now being cleaned up, or the workdir has the same issue?

@stevenhorsman
Member Author

> start failing, so maybe it's related to something before now being cleaned up, or the workdir has the same issue?

This has failed the last three nightlies, so I will raise a PR to skip this for now

stevenhorsman added a commit to stevenhorsman/cloud-api-adaptor that referenced this issue Jul 10, 2024
The TestLibvirtCreatePeerPodAndCheckWorkDirLogs test
has failed on a few PRs and the last three nightly test runs,
so skip it until we have a chance to debug.
See confidential-containers#1831

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@wainersm
Member

@stevenhorsman yesterday I ran TestLibvirtCreatePeerPodAndCheckWorkDirLogs a couple of times locally with the hope of reproducing the error but it always passed!

Then I started working on a golang equivalent of kubectl describe so we could print more info on CI, but ran out of time...

@stevenhorsman
Member Author

> @stevenhorsman yesterday I ran TestLibvirtCreatePeerPodAndCheckWorkDirLogs a couple of times locally with the hope of reproducing the error but it always passed!

Yeah - I have this experience with the other tests too. My hope is that a new version of the kata-agent and image-rs might have addressed some of these, so I will re-test after they've been bumped

stevenhorsman added a commit that referenced this issue Jul 11, 2024
The TestLibvirtCreatePeerPodAndCheckWorkDirLogs test
has failed on a few PRs and the last three nightly test runs,
so skip it until we have a chance to debug.
See #1831

Signed-off-by: stevenhorsman <steven@uk.ibm.com>