Fix/change the initialization of management layer #30694
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
This pull request does not have a backport label. Could you fix it @ph? 🙏
/package |
The failure looks valid to me, I will take a look.
1592ac9 to 550fc6e
I've looked at the issues. I am going to rebase this PR and have another go with the tests. I've added a changelog too.
I've tested this PR using one of our vagrant machines (vagrant up ubuntu2004), I've installed the |
@aleksmaus can you take a look at the osquerybeat part? |
60c2de1 to a244a9f
@simitt Thanks, I've fixed the typo and added more information for the |
This pull request is now in conflicts. Could you fix it? 🙏
* Ensure that libbeat manager is instantiated after the hooks.

This fixes an issue in Filebeat that made the start sequence of Filebeat non-deterministic. It was possible that not all the hooks were configured correctly before the manager received a configuration from the Elastic Agent.

This causes an inconsistency between the expected configuration state and the actual running state, with the following symptoms:

- Filebeat running but not sending any data to Elasticsearch.
- Filebeat partially configured, with only some inputs sending data.
- Missing logs from the Filebeat collector.
- Only Metricbeat running and sending logs.

This solves the issue by moving the `Start` and `Stop` of the manager into the Beats initialization process; each Beat needs to be adjusted to support this. This is indeed a breaking change for Beats authors, but the bootstrap process of Beats and libbeat cannot easily be extended to make the change in a single place.

(cherry picked from commit 4c14f03)
It seems I've missed the alert of mergify concerning the conflict. I will make a followup PR. |
It seems I fixed that last week; time changes are hard. :(
This moves Manager.Start and Manager.Stop into the Beat's run method, which ensures that the system is configured and ready to receive events. Having the Manager started and stopped at the libbeat level was causing inconsistencies when the Elastic Agent configured the Beats. The problem would lead to the following behavior:

- Zombie Beats with only outputs configured
- Beats without any inputs configured
- Beats with only some of the inputs configured

The problem was often caused by restarting the agent while the machine was under significant load.

See: elastic/beats#30694 for details
* Update to elastic/beats@c52699616a8a
* Move Manager.Start() and Manager.Stop() in the beat execution. See: elastic/beats#30694 for details
* Update mock Manager implementation

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
Co-authored-by: Andrew Wilkins <axw@elastic.co>
* Ensure that libbeat manager is instantiated after the hooks. (cherry picked from commit 4c14f03)

Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
* Update to elastic/beats@49a7ebdde9ef
* Move Manager.Start() and Manager.Stop() in the beat execution. See: elastic/beats#30694 for details
* Update mock Manager implementation

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
Co-authored-by: Andrew Wilkins <axw@elastic.co>
(cherry picked from commit 4c14f03)

Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…ayer (#30805)
* Fix/change the initialization of management layer (#30694) (cherry picked from commit 4c14f03)

Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
* Update to elastic/beats@6e046b747c6b
* Move Manager.Start() and Manager.Stop() in the beat execution. See: elastic/beats#30694 for details
* Update mock Manager implementation

Co-authored-by: apmmachine <infra-root-apmmachine@elastic.co>
Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
Co-authored-by: Andrew Wilkins <axw@elastic.co>
…astic#30806) (cherry picked from commit 48da76f)

Co-authored-by: Pier-Hugues Pellerin <phpellerin@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
)" This reverts commit 48da76f.
This fixes an issue in Filebeat that makes the start sequence of Filebeat
non-deterministic. It was possible that not all the hooks were
configured correctly before the manager received a configuration
from the Elastic Agent.

This causes an inconsistency between the expected configuration state coming from the Agent
and the actual running state. This situation can have one or more of the following symptoms:

- Filebeat running but not sending any data to Elasticsearch.
- Filebeat partially configured, with only some inputs sending data.
- Missing logs from the Filebeat collector.
- Only Metricbeat running and sending logs.

This solves the issue by moving the `Start` and `Stop` of the manager into the
Beats initialization process; each Beat needs to be adjusted to support this new sequence.

This is indeed a breaking change for Beats authors,
but the bootstrap process of Beats and libbeat cannot easily be
extended to make the change in a single place, because
every Beat has a different code path.
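To make the ordering problem concrete, here is a minimal Go sketch. It does not use the libbeat API: `Manager`, `ReloadHook`, and `runBeat` are hypothetical names invented for illustration. The real issue is a race that shows up under load; the sketch makes it deterministic by starting the manager either before or after the inputs hook is registered, assuming the manager only delivers the initial configuration to hooks that already exist.

```go
package main

import (
	"fmt"
	"sort"
)

// ReloadHook is a hypothetical callback that applies one part of the
// configuration (for example "inputs" or "output").
type ReloadHook func(cfg string)

// Manager is a hypothetical stand-in for the management layer.
// It is NOT the real libbeat interface.
type Manager struct {
	hooks   map[string]ReloadHook
	started bool
}

func NewManager() *Manager {
	return &Manager{hooks: map[string]ReloadHook{}}
}

// Register adds a hook under the given name.
func (m *Manager) Register(name string, h ReloadHook) {
	m.hooks[name] = h
}

// Start simulates the Elastic Agent pushing a configuration immediately:
// only hooks that are already registered receive it (an assumption of
// this sketch).
func (m *Manager) Start() {
	m.started = true
	for name, h := range m.hooks {
		h("config for " + name)
	}
}

// Stop marks the manager as stopped.
func (m *Manager) Stop() {
	m.started = false
}

// runBeat returns the sorted names of the parts that got configured.
// startBeforeInputs=true mimics the old sequence (manager started before
// the Beat registered its inputs hook); false mimics the new sequence
// (manager started from the Beat's run method, after all hooks exist).
func runBeat(startBeforeInputs bool) []string {
	configured := map[string]bool{}
	m := NewManager()
	defer m.Stop()

	m.Register("output", func(_ string) { configured["output"] = true })
	if startBeforeInputs {
		m.Start() // old sequence: the inputs hook is not registered yet
	}
	m.Register("inputs", func(_ string) { configured["inputs"] = true })
	if !startBeforeInputs {
		m.Start() // new sequence: every hook is in place
	}

	parts := make([]string, 0, len(configured))
	for p := range configured {
		parts = append(parts, p)
	}
	sort.Strings(parts)
	return parts
}

func main() {
	fmt.Println("old sequence:", runBeat(true))
	fmt.Println("new sequence:", runBeat(false))
}
```

In the old sequence only the output ends up configured, which corresponds to the symptom where a Beat runs but no input sends data. In the new sequence both parts receive their configuration.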
How it was detected
This was detected in a log where log events were actually missing.
Working endpoint.
Problematic endpoint
The latter log extract only contains information about the outputs (`Applying settings...`), nothing about the inputs.
What does this PR do?
Why is it important?
Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding change to the default configuration files
- [ ] I have added tests that prove my fix is effective or that my feature works
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
How to test this PR locally
Since the problem is non-deterministic, reproducing this issue is really hard. I was able to reproduce it a few times by simulating load on the agent virtual machine.
Related issues