[ResponseOps][Task Manager] when changing plugin status, indicate the reason in a logged message #152289

Closed
pmuellr opened this issue Feb 27, 2023 · 2 comments · Fixed by #154045
Assignees
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Member

pmuellr commented Feb 27, 2023

We've gotten feedback on the task manager health report, and on the messages task manager logs, that it's often unclear why task manager is "complaining" about something and what the actual cause is.

As an example, when task manager status goes from green to yellow, there is often no reason given for the change. Looking through the task manager diagnostics also doesn't yield obvious clues.

At a minimum, whenever TM realizes it's going into a warning state, it should log a message indicating why. Today, I believe we already log a message at debug level in many of these cases; perhaps we could just promote these to warnings.

For example:

function getHealthStatus(logger: Logger, params: GetHealthStatusParams): HealthStatus {
  const {
    assumedRequiredThroughputPerMinutePerKibana,
    assumedAverageRecurringRequiredThroughputPerMinutePerKibana,
    capacityPerMinutePerKibana,
  } = params;
  if (assumedRequiredThroughputPerMinutePerKibana < capacityPerMinutePerKibana) {
    return HealthStatus.OK;
  }
  if (assumedAverageRecurringRequiredThroughputPerMinutePerKibana < capacityPerMinutePerKibana) {
    logger.debug(
      `setting HealthStatus.Warning because assumedAverageRecurringRequiredThroughputPerMinutePerKibana (${assumedAverageRecurringRequiredThroughputPerMinutePerKibana}) < capacityPerMinutePerKibana (${capacityPerMinutePerKibana})`
    );
    return HealthStatus.Warning;
  }
  logger.debug(
    `setting HealthStatus.Error because assumedRequiredThroughputPerMinutePerKibana (${assumedRequiredThroughputPerMinutePerKibana}) >= capacityPerMinutePerKibana (${capacityPerMinutePerKibana}) AND assumedAverageRecurringRequiredThroughputPerMinutePerKibana (${assumedAverageRecurringRequiredThroughputPerMinutePerKibana}) >= capacityPerMinutePerKibana (${capacityPerMinutePerKibana})`
  );
  return HealthStatus.Error;
}
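
A minimal sketch of what promoting those messages could look like, with `logger.debug` swapped for `logger.warn` in the two non-OK branches. The `Logger`, `HealthStatus`, and `GetHealthStatusParams` definitions here are simplified stand-ins so the snippet compiles on its own (the real Kibana types may differ), and `createRecordingLogger` is a hypothetical helper used only to show the effect.

```typescript
// Stand-in types so the sketch compiles on its own; the real Kibana
// Logger and HealthStatus definitions may differ (assumption).
interface Logger {
  debug(message: string): void;
  warn(message: string): void;
}

enum HealthStatus {
  OK = 'OK',
  Warning = 'warn',
  Error = 'error',
}

interface GetHealthStatusParams {
  assumedRequiredThroughputPerMinutePerKibana: number;
  assumedAverageRecurringRequiredThroughputPerMinutePerKibana: number;
  capacityPerMinutePerKibana: number;
}

function getHealthStatus(logger: Logger, params: GetHealthStatusParams): HealthStatus {
  const {
    assumedRequiredThroughputPerMinutePerKibana,
    assumedAverageRecurringRequiredThroughputPerMinutePerKibana,
    capacityPerMinutePerKibana,
  } = params;
  if (assumedRequiredThroughputPerMinutePerKibana < capacityPerMinutePerKibana) {
    return HealthStatus.OK;
  }
  if (assumedAverageRecurringRequiredThroughputPerMinutePerKibana < capacityPerMinutePerKibana) {
    // Promoted from debug to warn so the cause shows up at default log levels.
    logger.warn(
      `setting HealthStatus.Warning because assumedAverageRecurringRequiredThroughputPerMinutePerKibana (${assumedAverageRecurringRequiredThroughputPerMinutePerKibana}) < capacityPerMinutePerKibana (${capacityPerMinutePerKibana})`
    );
    return HealthStatus.Warning;
  }
  // Promoted from debug to warn, same reasoning as above.
  logger.warn(
    `setting HealthStatus.Error because assumedRequiredThroughputPerMinutePerKibana (${assumedRequiredThroughputPerMinutePerKibana}) >= capacityPerMinutePerKibana (${capacityPerMinutePerKibana}) AND assumedAverageRecurringRequiredThroughputPerMinutePerKibana (${assumedAverageRecurringRequiredThroughputPerMinutePerKibana}) >= capacityPerMinutePerKibana (${capacityPerMinutePerKibana})`
  );
  return HealthStatus.Error;
}

// Hypothetical helper: a logger that records warnings in memory.
function createRecordingLogger(): { logger: Logger; warnings: string[] } {
  const warnings: string[] = [];
  return {
    logger: {
      debug: () => {},
      warn: (message) => {
        warnings.push(message);
      },
    },
    warnings,
  };
}
```

With this change the OK path stays silent, and each transition into a warning or error state produces exactly one message naming the values that triggered it.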

@pmuellr pmuellr added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Feb 27, 2023
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@pmuellr
Member Author

pmuellr commented Mar 6, 2023

It was noted in triage that we think the Kibana health APIs also allow setting a "reason" or similar, so we should do that too, if possible.
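
One way this might be wired up, as a rough sketch only: the status calculation returns the reason string alongside the status, so the same text can be both logged and passed along to whatever builds the status summary. Everything named here (`HealthStatusWithReason`, `getHealthStatusWithReason`, `buildStatusSummary`, the simplified `ThroughputParams`) is hypothetical and is not the Kibana health API; the actual mechanism for attaching a reason would need to be checked against the real status service.

```typescript
// Stand-in definitions; the real Kibana types may differ (assumption).
interface WarnLogger {
  warn(message: string): void;
}

enum HealthStatus {
  OK = 'OK',
  Warning = 'warn',
  Error = 'error',
}

// Hypothetical, shortened parameter names for readability.
interface ThroughputParams {
  required: number; // assumed required throughput per minute per Kibana
  recurring: number; // assumed average recurring required throughput
  capacity: number; // capacity per minute per Kibana
}

// Hypothetical result shape: the status plus an optional human-readable reason.
interface HealthStatusWithReason {
  status: HealthStatus;
  reason?: string;
}

function getHealthStatusWithReason(
  logger: WarnLogger,
  { required, recurring, capacity }: ThroughputParams
): HealthStatusWithReason {
  if (required < capacity) {
    return { status: HealthStatus.OK };
  }
  if (recurring < capacity) {
    const reason = `setting HealthStatus.Warning because recurring throughput (${recurring}) < capacity (${capacity})`;
    logger.warn(reason);
    return { status: HealthStatus.Warning, reason };
  }
  const reason = `setting HealthStatus.Error because required throughput (${required}) >= capacity (${capacity}) AND recurring throughput (${recurring}) >= capacity (${capacity})`;
  logger.warn(reason);
  return { status: HealthStatus.Error, reason };
}

// Hypothetical summary builder: appends the reason when one is present.
function buildStatusSummary(result: HealthStatusWithReason): string {
  const base = `Task Manager status: ${result.status}`;
  return result.reason ? `${base} - Reason: ${result.reason}` : base;
}
```

Returning the reason from the same function that logs it keeps the log line and the reported reason from drifting apart.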

@ersin-erdal ersin-erdal self-assigned this Mar 23, 2023
ersin-erdal added a commit that referenced this issue Apr 18, 2023
…54045)

Resolves: #152289

With this PR we promote some of the `debug` logs to `warn` and return the
message to the health API as the reason, to be added to the status summary
message.