-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make TaskManager health status error/warning reasons more visible #154045
Conversation
@@ -41,16 +41,30 @@ describe('logHealthMetrics', () => { | |||
const { calculateHealthStatus } = jest.requireMock('./calculate_health_status'); | |||
|
|||
// We must change from OK to Warning | |||
(calculateHealthStatus as jest.Mock<HealthStatus>).mockImplementation(() => HealthStatus.OK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These mocks were no-op, never stubbed the real function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we changed anything though - except to change the return type to an object and not just the status. The parenthesized function invocations to mockImplementation()
definitely look a little weird, but was there some kind of problem with them? They look like they could work.
But they do look confusing! :-). I wonder if there's a cleanup we can do here - in another issue/PR ...
Pinging @elastic/response-ops (Team:ResponseOps) |
\o/Hacked a "bad reason" message into the code ( {
"status": {
"...": "...",
"plugins": {
"...": "...",
"taskManager": {
"level": "degraded",
"summary": "Task Manager is unhealthy - Reason: because I said so!"
}
}
}
}
|
Sorry, I forgot to ping you Patrick. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
x-pack/plugins/task_manager/server/lib/calculate_health_status.ts
Outdated
Show resolved
Hide resolved
`Health Status error threshold has been exceeded, resultFrequencySummary.Failed (${resultFrequencySummary.Failed}) is greater than error_threshold (${executionErrorThreshold.error_threshold})` | ||
); | ||
} else { | ||
logger.debug( | ||
logger.warn( | ||
`Health Status warn threshold has been exceeded, resultFrequencySummary.Failed (${resultFrequencySummary.Failed}) is greater than warn_threshold (${executionErrorThreshold.warn_threshold})` | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that we don't return a negative HealthStatus here. But feels like it makes sense, since the task errors could very likely be caused by the user somehow, and not systemic (invalid parameters used in a rule / connector) - we shouldn't set the TM plugin status based on that. But I bet we used to :-)
@@ -41,16 +41,30 @@ describe('logHealthMetrics', () => { | |||
const { calculateHealthStatus } = jest.requireMock('./calculate_health_status'); | |||
|
|||
// We must change from OK to Warning | |||
(calculateHealthStatus as jest.Mock<HealthStatus>).mockImplementation(() => HealthStatus.OK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we changed anything though - except to change the return type to an object and not just the status. The parenthesized function invocations to mockImplementation()
definitely look a little weird, but was there some kind of problem with them? They look like they could work.
But they do look confusing! :-). I wonder if there's a cleanup we can do here - in another issue/PR ...
@@ -375,24 +375,24 @@ describe('Task Run Statistics', () => { | |||
{ Success: 40, RetryScheduled: 40, Failed: 20, status: 'OK' }, | |||
]); | |||
|
|||
expect(logger.debug).toHaveBeenCalledTimes(5); | |||
expect(logger.debug).toHaveBeenNthCalledWith( | |||
expect(logger.warn).toHaveBeenCalledTimes(5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm remembering now that these MIGHT have been warns before, but they were SDH generators so we changed them to debug. But even if so, seems fair to me!
💚 Build Succeeded
Metrics [docs]Unknown metric groupsESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @ersin-erdal |
…astic#154045) Resolves: elastic#152289 With this PR we make some of the `debug` logs `warn` and return the message to the health API as reason to add in the status summary message.
…astic#154045) Resolves: elastic#152289 With this PR we make some of the `debug` logs `warn` and return the message to the health API as reason to add in the status summary message.
…astic#154045) Resolves: elastic#152289 With this PR we make some of the `debug` logs `warn` and return the message to the health API as reason to add in the status summary message.
resolves elastic#156112 Change task manager logging on status errors from `warn` to `debug. Making this change as we recently changed from `debug` to `warn` in elastic#154045 . But this ended up too noisy, especially at Kibana startup.
…lastic#157762) resolves elastic#156112 Change task manager logging on status errors from `warn` to `debug. Making this change as we recently changed from `debug` to `warn` in elastic#154045 . But this ended up too noisy, especially at Kibana startup. (cherry picked from commit b542862)
…anges (#157762) (#157800) # Backport This will backport the following commits from `main` to `8.8`: - [[ResponseOps][Task Manager] stop spamming the logs on status changes (#157762)](#157762) <!--- Backport version: 8.9.7 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Patrick Mueller","email":"patrick.mueller@elastic.co"},"sourceCommit":{"committedDate":"2023-05-15T20:21:16Z","message":"[ResponseOps][Task Manager] stop spamming the logs on status changes (#157762)\n\nresolves https://github.com/elastic/kibana/issues/156112\r\n\r\nChange task manager logging on status errors from `warn` to `debug.\r\nMaking this change as we recently changed from `debug` to `warn` in\r\nhttps://github.com//pull/154045 . But this ended up too\r\nnoisy, especially at Kibana startup.","sha":"b542862904073982a00ecd7418cc77dbe567b2d0","branchLabelMapping":{"^v8.9.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Feature:Task Manager","Team:ResponseOps","v8.8.0","v8.9.0"],"number":157762,"url":"https://github.com/elastic/kibana/pull/157762","mergeCommit":{"message":"[ResponseOps][Task Manager] stop spamming the logs on status changes (#157762)\n\nresolves https://github.com/elastic/kibana/issues/156112\r\n\r\nChange task manager logging on status errors from `warn` to `debug.\r\nMaking this change as we recently changed from `debug` to `warn` in\r\nhttps://github.com//pull/154045 . But this ended up too\r\nnoisy, especially at Kibana startup.","sha":"b542862904073982a00ecd7418cc77dbe567b2d0"}},"sourceBranch":"main","suggestedTargetBranches":["8.8"],"targetPullRequestStates":[{"branch":"8.8","label":"v8.8.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.9.0","labelRegex":"^v8.9.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/157762","number":157762,"mergeCommit":{"message":"[ResponseOps][Task Manager] stop spamming the logs on status changes (#157762)\n\nresolves https://github.com/elastic/kibana/issues/156112\r\n\r\nChange task manager logging on status errors from `warn` to `debug.\r\nMaking this change as we recently changed from `debug` to `warn` in\r\nhttps://github.com//pull/154045 . But this ended up too\r\nnoisy, especially at Kibana startup.","sha":"b542862904073982a00ecd7418cc77dbe567b2d0"}}]}] BACKPORT--> Co-authored-by: Patrick Mueller <patrick.mueller@elastic.co>
Resolves: #152289
With this PR we make some of the
debug
logswarn
and return the message to the health API as reason to add in the status summary message.