agent.TestKubernetesIntegrationRecipe is failing #4360

Closed
thbkrkr opened this issue Mar 18, 2021 · 9 comments · Fixed by #4362

Labels
>test Related to unit/integration/e2e tests

Comments

@thbkrkr
Contributor

thbkrkr commented Mar 18, 2021

test/e2e/agent.TestKubernetesIntegrationRecipe/ES_data_should_pass_validations failed in all nightly jobs:

https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-snapshot-versions/275/testReport/
https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-eks/316/testReport
https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-aks/648/testReport
https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-kind-k8s-versions/383/testReport
https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-stack-versions/373/testReport

cloud-on-k8s-e2e-tests-stack-versions shows that the failure only occurs with stack version 7.11.2, which was recently updated with #4355.
6.8.x failed because of another test: kb.TestVersionUpgradeAndRespecToLatest7x.

@thbkrkr thbkrkr added the >test Related to unit/integration/e2e tests label Mar 18, 2021
@pebrc
Collaborator

pebrc commented Mar 18, 2021

Looks like the agent is just sitting idle:

==== START logs for pod e2e-mlmzr-mercury/pod/elastic-agent-jwj6-agent-6p8lg ====
2021-03-18T13:04:39.693Z        INFO    warn/warn.go:18 The Elastic Agent is currently in BETA and should not be used in production
2021-03-18T13:04:39.694Z        INFO    application/application.go:59   Detecting execution mode
2021-03-18T13:04:39.694Z        INFO    application/application.go:68   Agent is managed locally
2021-03-18T13:04:42.000Z        INFO    [composable]    composable/controller.go:44     EXPERIMENTAL - Inputs with variables are currently experimental and should not be used in production
2021-03-18T13:04:42.093Z        INFO    [composable.providers.docker]   docker/docker.go:43     Docker provider skipped, unable to connect: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2021-03-18T13:04:42.094Z        INFO    [composable.providers.kubernetes]       kubernetes/kubernetes.go:64     Kubernetes provider started with node scope
2021-03-18T13:04:42.094Z        INFO    [composable.providers.kubernetes]       kubernetes/util.go:114  kubernetes: Using pod name elastic-agent-jwj6-agent-6p8lg and namespace e2e-mlmzr-mercury to discover kubernetes node
2021-03-18T13:04:42.111Z        INFO    [composable.providers.kubernetes]       kubernetes/util.go:120  kubernetes: Using node ip-192-168-90-34.eu-west-2.compute.internal discovered by in cluster pod node query
2021-03-18T13:04:42.311Z        INFO    [api]   api/server.go:62        Starting stats endpoint
2021-03-18T13:04:42.311Z        INFO    application/local_mode.go:156   Agent is starting
2021-03-18T13:04:42.311Z        INFO    [api]   api/server.go:64        Metrics endpoint listening on: /tmp/elastic-agent/elastic-agent.sock (configured: unix:///tmp/elastic-agent/elastic-agent.sock)
2021-03-18T13:04:42.312Z        INFO    application/local_mode.go:166   Agent is stopped
2021-03-18T13:04:42.312Z        INFO    application/periodic.go:76      Configuration changes detected
2021-03-18T13:04:42.313Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:04:52.313Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:05:02.313Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:05:12.313Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:05:22.313Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:05:32.314Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:05:42.314Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:05:52.314Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:06:02.314Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:06:12.314Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:06:22.315Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:06:32.315Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:06:42.315Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:06:52.315Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:07:02.315Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:07:12.316Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:07:22.316Z        INFO    application/periodic.go:98      No configuration change
2021-03-18T13:07:32.316Z        INFO    application/periodic.go:98      No configuration change
==== END logs for pod e2e-mlmzr-mercury/pod/elastic-agent-jwj6-agent-6p8lg ====

I am not sure what "2021-03-18T13:04:42.312Z INFO application/local_mode.go:166 Agent is stopped" means.

OK, maybe this is relevant:

// Local represents a standalone agents, that will read his configuration directly from disk.
// Some part of the configuration can be reloaded.

@pebrc
Collaborator

pebrc commented Mar 18, 2021

I think I know what it is: the workaround for the HostPath issue with Agent bit us for the first time.

@pebrc
Collaborator

pebrc commented Mar 22, 2021

This is still happening, with the same symptoms as above. My earlier assumption that these failures are related to the incorrect HostPath mount was wrong: even if the container writes its runtime state into the container filesystem, the test should succeed.

@pebrc
Collaborator

pebrc commented Mar 22, 2021

I disabled the test. Please note that the issue is not fixed, even though we won't see the error for now.

@thbkrkr
Contributor Author

thbkrkr commented Mar 29, 2021

Quick test: it also fails with 7.12.0.

@thbkrkr
Contributor Author

thbkrkr commented Mar 30, 2021

I forgot to test 7.11.1. It works.
So, recap:

  • 7.11.0 ✔️
  • 7.11.1 ✔️
  • 7.11.2 ❌
  • 7.12.0 ❌

This suggests that the failure is related to a change introduced between 7.11.1 and 7.11.2.

I suspect:

@thbkrkr
Contributor Author

thbkrkr commented Mar 30, 2021

I compared our manifest line by line with this one, elastic/beats/.../elastic-agent-standalone-daemonset-configmap.yaml, and found out why our manifest no longer works.

The cause is that we reference environment variables directly, without the env. prefix.

${NODE_NAME} is no longer correct. It is necessary to use ${env.NODE_NAME}.

Configuration loading now excludes inputs from the go-ucfg variable expansion (elastic/beats#24005).
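
To illustrate the syntax change, here is a minimal sketch of a helper that rewrites bare references to the env-prefixed form. The helper name `add_env_prefix`, the regular expression, and the sample config lines are assumptions made up for illustration; they are not taken from our manifest or from Beats. It only treats references without a dot as environment variables, so references that already carry a prefix (for example `${env.NODE_NAME}` or `${kubernetes.pod.name}`) are left untouched. It does not handle other expansion forms such as default values.

```python
import re

# Matches ${NAME} where NAME contains no dot, i.e. has no provider
# prefix such as "env." or "kubernetes." in front of it.
BARE_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def add_env_prefix(config_text: str) -> str:
    """Rewrite ${NAME} to ${env.NAME}; prefixed references are not changed."""
    return BARE_VAR.sub(r"${env.\1}", config_text)


if __name__ == "__main__":
    # Hypothetical config lines, used only to demonstrate the rewrite.
    sample = "\n".join(
        [
            "hosts: ['https://${NODE_NAME}:10250']",
            "node: ${env.NODE_NAME}",
            "pod: ${kubernetes.pod.name}",
        ]
    )
    print(add_env_prefix(sample))
```

Only the first sample line changes; the other two already use a prefixed variable and pass through as they are.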

@thbkrkr
Contributor Author

thbkrkr commented Apr 26, 2021

The Beats team looked into this issue:

The fact that 7.11.x worked with ${NODE_NAME} in standalone mode was a bug. The correct variable syntax is ${env.NODE_NAME} in 7.11.x.

So, let's reopen #4393 to use the correct variable syntax with the env prefix.

@thbkrkr
Contributor Author

thbkrkr commented Aug 4, 2021

Closed by #4393.
