Hook up discovery service to Task Manager health #194113
Conversation
Pinging @elastic/response-ops (Team:ResponseOps)

@elasticmachine merge upstream

… of github.com:mikecote/kibana into task-manager/hook-up-discovery-service-to-health-api-2

PR that will deploy to Cloud: #194289
LGTM. Verified it works as expected using a cloud deployment.
```diff
@@ -237,7 +237,6 @@ export default function ({ getService }: FtrProviderContext) {
       expect(typeof workload.overdue).to.eql('number');
       expect(typeof workload.non_recurring).to.eql('number');
-      expect(typeof workload.owner_ids).to.eql('number');
```
this should still be a valid assertion right? we're not removing it from the health report?
The `owner_ids` value got removed from the health report when I removed the aggregation in `x-pack/plugins/task_manager/server/monitoring/workload_statistics.ts`. I figured it was no longer worth it given it always returns 0 as a value. I think that's ok?
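To make the "always returns 0" point concrete, here is a hedged sketch of the behavior being discussed (not the actual Kibana code): the old health report effectively counted distinct `ownerId` values on claimed task documents, so with no running (claimed) tasks there was nothing to count. `TaskDoc` and `distinctOwnerIds` are illustrative names, not Kibana APIs.

```typescript
// Illustrative sketch: why an ownerId-based node count can read 0.
// A task document carries an ownerId only while some Kibana node has
// claimed it; idle clusters therefore have no ownerIds to count.
interface TaskDoc {
  ownerId: string | null; // set while a node has the task claimed
}

function distinctOwnerIds(tasks: TaskDoc[]): number {
  // Count unique non-null ownerId values, as a cardinality
  // aggregation on the task index would.
  const ids = tasks
    .map((t) => t.ownerId)
    .filter((id): id is string => id !== null);
  return new Set(ids).size;
}

// No claimed tasks -> estimate collapses to 0 even with nodes running.
console.log(distinctOwnerIds([])); // 0
// Two tasks claimed by the same node, one unclaimed -> 1.
console.log(
  distinctOwnerIds([{ ownerId: 'kibana-a' }, { ownerId: 'kibana-a' }, { ownerId: null }])
); // 1
```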
Ah gotcha 👍
@elasticmachine merge upstream

@elasticmachine merge upstream
💚 Build Succeeded
To update your PR or re-run it, just comment with: cc @mikecote

Starting backport for target branches: 8.x https://github.com/elastic/kibana/actions/runs/11142828625
Resolves elastic#192568

In this PR, I'm solving the issue where the Task Manager health API is unable to determine how many Kibana nodes are running. I'm doing so by leveraging the Kibana discovery service to get a count, instead of calculating it from an aggregation on the `.kibana_task_manager` index that counts the unique number of `ownerId` values, which requires tasks to be running and sufficiently distributed across the Kibana nodes to determine the number properly.

Note: This only works when `mget` is the task claim strategy.

## To verify

1. Set `xpack.task_manager.claim_strategy: mget` in kibana.yml.
2. Start up the PR locally with Elasticsearch and Kibana running.
3. Navigate to the `/api/task_manager/_health` route and confirm `observed_kibana_instances` is `1`.
4. Apply the following code and restart Kibana:
   ```diff
   diff --git a/x-pack/plugins/task_manager/server/kibana_discovery_service/kibana_discovery_service.ts b/x-pack/plugins/task_manager/server/kibana_discovery_service/kibana_discovery_service.ts
   index 090847032bf..69dfb6d1b36 100644
   --- a/x-pack/plugins/task_manager/server/kibana_discovery_service/kibana_discovery_service.ts
   +++ b/x-pack/plugins/task_manager/server/kibana_discovery_service/kibana_discovery_service.ts
   @@ -59,6 +59,7 @@ export class KibanaDiscoveryService {
        const lastSeen = lastSeenDate.toISOString();
        try {
          await this.upsertCurrentNode({ id: this.currentNode, lastSeen });
   +      await this.upsertCurrentNode({ id: `${this.currentNode}-2`, lastSeen });
          if (!this.started) {
            this.logger.info('Kibana Discovery Service has been started');
            this.started = true;
   ```
5. Navigate to the `/api/task_manager/_health` route and confirm `observed_kibana_instances` is `2`.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

(cherry picked from commit d0d2032)
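The discovery-based count described above can be sketched as follows. This is a minimal illustration, not the actual Kibana implementation: `DiscoveredNode`, `countObservedKibanaInstances`, and the 30-second activity threshold are all hypothetical names and values, assumed only for this example. The idea is that each node periodically upserts a heartbeat document with its `id` and `lastSeen` timestamp, and the health report counts the nodes whose heartbeat is recent.

```typescript
// Illustrative sketch, assuming a heartbeat-document model like the
// discovery service's upsertCurrentNode({ id, lastSeen }) call above.
interface DiscoveredNode {
  id: string;
  lastSeen: string; // ISO timestamp written by each node's heartbeat
}

// Hypothetical threshold: treat a node as active if it has
// heartbeated within the last 30 seconds.
const ACTIVE_NODE_THRESHOLD_MS = 30_000;

function countObservedKibanaInstances(
  nodes: DiscoveredNode[],
  now: Date = new Date()
): number {
  // A node counts as "observed" only if its heartbeat is fresh,
  // so stale documents from stopped nodes are ignored.
  return nodes.filter(
    (node) => now.getTime() - Date.parse(node.lastSeen) <= ACTIVE_NODE_THRESHOLD_MS
  ).length;
}

// Two fresh heartbeats (as in the verification diff, which registers a
// second "<node>-2" id) and one stale node -> 2 observed instances.
const nodes: DiscoveredNode[] = [
  { id: 'kibana-1', lastSeen: new Date().toISOString() },
  { id: 'kibana-1-2', lastSeen: new Date().toISOString() },
  { id: 'kibana-old', lastSeen: new Date(Date.now() - 120_000).toISOString() },
];
console.log(countObservedKibanaInstances(nodes)); // 2
```

Counting fresh heartbeats sidesteps the old approach's dependency on tasks actually being claimed, which is why the health report can now observe idle nodes too.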
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation
…4685)

# Backport

This will backport the following commits from `main` to `8.x`:

- [Hook up discovery service to Task Manager health (#194113)](#194113)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Mike Côté <mikecote@users.noreply.github.com>