Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail gracefully if X-Pack monitoring is enabled #6627

Merged
merged 1 commit into from
Apr 10, 2018

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Mar 22, 2018

X-Pack Monitoring feature keeps retrying to connect to Elasticsearch until either Elasticsearch is available and the Monitoring is enabled. Before each time a connection attempted failed an errors was logged. This PR changes the implementation that only an info is printed about not being able to connect on the first attempt. After that reporting will keep retrying to connect to Elasticsearch until it is successful, but no errors are logged. On success an message is logged that the connection was successful.

If for whatever reason X-Pack was available but at a later point isn't anymore, Monitoring will log errors as this is not an expected case.

Note: In Elasticsearch 6.2 Monitoring as an interval parameter for the collection. If it is set to -1 collection is disabled but events will still be written to Elasticsearch. The issue with this in 6.2 is that no cleanup of the sent events happen. In 6.3 there is a collection config option to turn collection off even if monitoring is enabled. If collection is turned off the bulk requests are still accepted and return a 200, but that will not be persisted. The Beat on startup only checks if Monitoring is installed. If collection is enabled or not is a setting internal to monitoring and monitoring is expected to do the right thing with the events here.

This PR has been manually tested against 6.2 but no automated tests were implemented yet.

}{}
err = json.Unmarshal(body, &defaults)
if err != nil {
logp.Debug("monitoring", "Error decoding xpack monitoring defaults from Elasticsearch: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%s/xpack monitoring/X-Pack Monitoring/

func (b *Beat) checkXPackMonitoring() bool {

if b.Config.Output.Name() != "elasticsearch" {
logp.Info("Disabling X-Pack monitoring as output is not Elasticsearch.")
Copy link
Contributor

@kvch kvch Mar 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%s/monitoring/Monitoring

esConfig := b.Config.Output.Config()
esClient, err := elasticsearch.NewConnectedClient(esConfig)
if err != nil {
logp.Info("Disabling X-Pack monitoring as connection to Elasticsearch could not be established.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%s/monitoring/Monitoring/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the user will want to know what the error was.

@ruflin ruflin added the in progress Pull request is currently in progress. label Mar 22, 2018
@ruflin
Copy link
Member Author

ruflin commented Mar 22, 2018

I set to in progress again as I haven't managed to test it yet with X-Pack installed and ingestion disabled.

esConfig := b.Config.Output.Config()
esClient, err := elasticsearch.NewConnectedClient(esConfig)
if err != nil {
logp.Info("Disabling X-Pack monitoring as connection to Elasticsearch could not be established.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the user will want to know what the error was.

@@ -199,6 +199,7 @@ https://github.com/elastic/beats/compare/v6.0.0-beta2...master[Check the HEAD di
- Add appender support to autodiscover {pull}6469[6469]
- Add add_host_metadata processor {pull}5968[5968]
- Retry configuration to load dashboards if Kibana is not reachable when the beat starts. {pull}6560[6560]
- Enabled X-Pack monitoring by default but fail gracefully.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... if X-Pack Monitoring is not enabled.

@ruflin ruflin added monitoring and removed in progress Pull request is currently in progress. labels Mar 26, 2018
@ruflin
Copy link
Member Author

ruflin commented Mar 26, 2018

@kvch @andrewkroh Review comments applied. But please have a second look. The implementation was simplified a bit was decided not to check the collection part (see updated PR message).

Ready for an other round of reviews.

}{}
err = json.Unmarshal(body, &features)
if err != nil {
logp.Debug("monitoring", "Error fetching monitoring enabled feature from Elasticsearch: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouln't this be logged at error level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here was that in all cases where something strange comes back from ES because of Monitoring, not errors are logged. When reading this review I realised there should be an additional error check on line 339 for the error there.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks fine. I'm just thinking it may belong in a different location.

reporter, err := report.New(b.Info, b.Config.Monitoring, b.Config.Output)
if err != nil {
return err
reportingActive := b.checkXPackMonitoring()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this logic is integral to the Elasticsearch monitoring output. So why not have all of the related code to check if x-pack monitoring is enabled part of the same package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this I realised we already have an almost identical code here: https://github.com/elastic/beats/blob/master/libbeat/monitoring/report/elasticsearch/client.go#L41 I need to revisit the implementation to see how the two can be combined.

@ruflin ruflin added the in progress Pull request is currently in progress. label Mar 27, 2018
X-Pack Monitoring feature keeps retrying to connect to Elasticsearch until either Elasticsearch is available and the Monitoring is enabled. Before each time a connection attempted failed an errors was logged. This PR changes the implementation that only an info is printed about not being able to connect on the first attempt. After that reporting will keep retrying to connect to Elasticsearch until it is successful, but no errors are logged. On success an message is logged that the connection was successful.

If for whatever reason X-Pack was available but at a later point isn't anymore, Monitoring will log errors as this is not an expected case.

Note: In Elasticsearch 6.2 Monitoring as an interval parameter for the collection. If it is set to `-1` collection is disabled but events will still be written to Elasticsearch. The issue with this in 6.2 is that no cleanup of the sent events happen. In 6.3 there is a collection config option to turn collection off even if monitoring is enabled. If collection is turned off the bulk requests are still accepted and return a 200, but that will not be persisted. The Beat on startup only checks if Monitoring is installed. If collection is enabled or not is a setting internal to monitoring and monitoring is expected to do the right thing with the events here.

This PR has been manually tested against 6.2 but no automated tests were implemented yet.
@ruflin
Copy link
Member Author

ruflin commented Apr 9, 2018

@kvch @andrewkroh The PR is completely rewritten. Please have an other look and have a look again through the PR description if that is also the behaviour you expect.

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ruflin ruflin removed the in progress Pull request is currently in progress. label Apr 10, 2018
@ruflin
Copy link
Member Author

ruflin commented Apr 10, 2018

@andrewkroh in progress label removed. Can be merged.

@kvch kvch merged commit 80e6f72 into elastic:master Apr 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants