Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable resource leak tests, change timeout, improve message #4454

Merged

Conversation

fearful-symmetry
Copy link
Contributor

@fearful-symmetry fearful-symmetry commented Mar 21, 2024

What does this PR do?

Attempt to fix #4447.

This increases the timeout on the health check, and improves the error message. As to why the test is flaky, my best guess right now is that the 3-minute timeout just isn't long enough on windows. Plan is just to re-run the test a bunch of times and...see what happens.

@fearful-symmetry fearful-symmetry added flaky-test Unstable or unreliable test cases. skip-changelog labels Mar 21, 2024
@fearful-symmetry fearful-symmetry self-assigned this Mar 21, 2024
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner March 21, 2024 15:23
Copy link
Contributor

mergify bot commented Mar 21, 2024

This pull request does not have a backport label. Could you fix it @fearful-symmetry? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Mar 21, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert pierrehilbert requested review from rdner and removed request for blakerouse March 21, 2024 16:41
@cmacknz
Copy link
Member

cmacknz commented Mar 21, 2024

In case you are missing the discussion in Slack, there was a change in 8.14.0-SNAPSHOT in Fleet that changed the output name from default that is probably what is causing this failure and the one in #4082

Copy link

Quality Gate passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No Coverage information No data about Coverage
0.0% 0.0% Duplication on New Code

See analysis details on SonarQube

juliaElastic added a commit to elastic/kibana that referenced this pull request Mar 22, 2024
…arch type (#179218)

## Summary

Related to #178857 and
#177927

It seems that using output id instead of "default" in full agent policy
had a higher impact than expected. There are a few places where agent
relies on the name "default".
([This](elastic/elastic-agent#4454) and
[this](elastic/elastic-agent#4453) pr)
Because of this, doing a partial revert, to keep using "default" for
elasticsearch output type to avoid breaking change.
However, for other types, using the output id. This will fix the
original issue of remote output health reporting.
I think it is a rarely used feature to use a non-elasticsearch output as
default, so it shouldn't have a big impact to not use "default" output
name for those.

To verify:
- create a remote es output and set as default (both data and
monitoring)
- create an agent policy that uses default output 
- enroll an agent
- expect that the agent sends system and elastic-agent metrics/logs to
remote es
- verify that the remote es health badge shows up on UI
- set elasticsearch output back as default
- verify that the agent policy has it as "default" in outputs section

<img width="704" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/ab46b00d-efc2-49e1-ad7f-9acd44b2a9e5">
<img width="1251" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/a07c0d78-9126-43d9-bd0e-a4df193d7e78">
<img width="1791" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/868a054b-2cae-42f3-8f60-f2bff3b29efd">

<img width="715" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/721cd809-5f97-47e5-bf99-19f542d8ff83">



### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
allHealthy = false
}
}
return allHealthy && foundApache && foundSystem
}, runner.healthCheckTime, runner.healthCheckRefreshTime, "install never became healthy")
}, runner.healthCheckTime, runner.healthCheckRefreshTime, "install never became healthy: components did not return a healthy state: %s", compDebugName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: printing out the entire last agent status (in a structured way) may be more useful for debugging

@fearful-symmetry fearful-symmetry merged commit 02c4118 into elastic:main Mar 22, 2024
9 checks passed
juliaElastic added a commit to juliaElastic/kibana that referenced this pull request Mar 26, 2024
…arch type (elastic#179218)

## Summary

Related to elastic#178857 and
elastic#177927

It seems that using output id instead of "default" in full agent policy
had a higher impact than expected. There are a few places where agent
relies on the name "default".
([This](elastic/elastic-agent#4454) and
[this](elastic/elastic-agent#4453) pr)
Because of this, doing a partial revert, to keep using "default" for
elasticsearch output type to avoid breaking change.
However, for other types, using the output id. This will fix the
original issue of remote output health reporting.
I think it is a rarely used feature to use a non-elasticsearch output as
default, so it shouldn't have a big impact to not use "default" output
name for those.

To verify:
- create a remote es output and set as default (both data and
monitoring)
- create an agent policy that uses default output 
- enroll an agent
- expect that the agent sends system and elastic-agent metrics/logs to
remote es
- verify that the remote es health badge shows up on UI
- set elasticsearch output back as default
- verify that the agent policy has it as "default" in outputs section

<img width="704" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/ab46b00d-efc2-49e1-ad7f-9acd44b2a9e5">
<img width="1251" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/a07c0d78-9126-43d9-bd0e-a4df193d7e78">
<img width="1791" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/868a054b-2cae-42f3-8f60-f2bff3b29efd">

<img width="715" alt="image"
src="https://github.com/elastic/kibana/assets/90178898/721cd809-5f97-47e5-bf99-19f542d8ff83">



### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip flaky-test Unstable or unreliable test cases. skip-changelog Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flaky Test]: TestLongRunningAgentForLeaks/TestHandleLeak – Condition never satisfied
6 participants