feat(slo): global and slo specific diagnosis #153118

Merged
merged 2 commits into from
Mar 14, 2023


@kdelemme kdelemme commented Mar 10, 2023

Summary

Resolves #152503

This PR introduces two new internal routes:

  • GET /internal/observability/slos/_diagnosis
  • GET /internal/observability/slos/{id}/_diagnosis

The first route performs a global diagnosis; the second diagnoses the single SLO identified by {id}.
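To illustrate how the two routes relate, here is a minimal sketch that builds either path from an optional SLO id. The helper name `diagnosis_path` and the constant `SLO_BASE_PATH` are hypothetical and exist only for this example; they are not part of the PR.

```python
from typing import Optional
from urllib.parse import quote

# Base path shared by both routes, as listed in the summary above.
SLO_BASE_PATH = "/internal/observability/slos"


def diagnosis_path(slo_id: Optional[str] = None) -> str:
    """Return the global route, or the SLO-specific route when slo_id is given.

    Hypothetical helper for illustration only.
    """
    if slo_id is None:
        return f"{SLO_BASE_PATH}/_diagnosis"
    # Percent-encode the id so it always stays a single path segment.
    return f"{SLO_BASE_PATH}/{quote(slo_id, safe='')}/_diagnosis"
```

Calling `diagnosis_path()` yields the global route, and passing an id yields the per-SLO route.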

Testing

Different scenarios:

  1. With no SLO created, the global diagnosis returns "NOT_OK" for the SLO resources
  2. With at least one SLO created, the global diagnosis returns "OK" for the SLO resources
  3. For an existing SLO, the SLO diagnosis returns the transform stats and a sample of documents from the source index

Global diagnosis:

curl --request GET \
  --url http://localhost:5601/kibana/internal/observability/slos/_diagnosis \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'kbn-xsrf: oui'

SLO-specific diagnosis (replace {SLO_ID} with the id of an existing SLO):

curl --request GET \
  --url http://localhost:5601/kibana/internal/observability/slos/{SLO_ID}/_diagnosis \
  --header 'Authorization: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==' \
  --header 'kbn-xsrf: oui'

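
For anyone who prefers scripting these checks, the curl commands above can be approximated with the Python standard library. This is a sketch under assumptions: the function name `build_diagnosis_request` is hypothetical, and the credentials are the ones encoded in the Basic token above (it decodes to `elastic:changeme`), which only makes sense on a local development setup. The request is built but not sent, so the snippet runs without a Kibana instance.

```python
import base64
import urllib.request
from typing import Optional


def build_diagnosis_request(
    base_url: str, username: str, password: str, slo_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a GET request for a diagnosis route.

    Hypothetical helper mirroring the curl commands in this PR description.
    """
    path = "/internal/observability/slos/"
    path += f"{slo_id}/_diagnosis" if slo_id else "_diagnosis"
    # Same encoding curl users write by hand: base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        method="GET",
        headers={"Authorization": f"Basic {token}", "kbn-xsrf": "oui"},
    )


request = build_diagnosis_request("http://localhost:5601/kibana", "elastic", "changeme")

# Sending it needs a running Kibana, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```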
Example response from the SLO-specific diagnosis:
{
	"sloResources": {
		"slo-observability.sli": "OK",
		"slo-observability.sli-mappings": "OK",
		"slo-observability.sli-settings": "OK",
		"slo-observability.sli.monthly": "OK"
	},
	"sloSavedObject": {
		"id": "8bc13800-bf53-11ed-87f2-1f84267e6d31",
		"type": "slo",
		"namespaces": [
			"default"
		],
		"updated_at": "2023-03-10T14:55:06.626Z",
		"created_at": "2023-03-10T14:55:06.626Z",
		"version": "WzE5MSwxXQ==",
		"attributes": {
			"name": "latency critical",
			"description": "",
			"indicator": {
				"type": "sli.kql.custom",
				"params": {
					"index": "service-logs-latency",
					"filter": "dataset :\"healthy_then_failing\" and host : \"6b6cae72-768f-446f-a1d2-18d54f067e18\"",
					"good": "latency <= 100",
					"total": ""
				}
			},
			"timeWindow": {
				"duration": "30d",
				"isRolling": true
			},
			"budgetingMethod": "occurrences",
			"objective": {
				"target": 0.98
			},
			"id": "8bc13800-bf53-11ed-87f2-1f84267e6d31",
			"settings": {
				"timestampField": "@timestamp",
				"syncDelay": "1m",
				"frequency": "1m"
			},
			"revision": 1,
			"enabled": true,
			"createdAt": "2023-03-10T14:55:06.624Z",
			"updatedAt": "2023-03-10T14:55:06.624Z"
		},
		"references": [],
		"coreMigrationVersion": "8.0.0"
	},
	"sloTransformStats": {
		"count": 1,
		"transforms": [
			{
				"id": "slo-8bc13800-bf53-11ed-87f2-1f84267e6d31-1",
				"state": "started",
				"node": {
					"id": "MZa-zyzzQEGp7jPFm7czTw",
					"name": "Kevins-MacBook-Pro.local",
					"ephemeral_id": "AKX0b7A2TP67x0Yik8P07w",
					"transport_address": "127.0.0.1:9300",
					"attributes": {}
				},
				"stats": {
					"pages_processed": 664,
					"documents_processed": 520125,
					"documents_indexed": 43345,
					"documents_deleted": 0,
					"trigger_count": 145,
					"index_time_in_ms": 3397,
					"index_total": 231,
					"index_failures": 0,
					"search_time_in_ms": 7033,
					"search_total": 664,
					"search_failures": 0,
					"processing_time_in_ms": 88,
					"processing_total": 664,
					"delete_time_in_ms": 0,
					"exponential_avg_checkpoint_duration_ms": 63.995819400309756,
					"exponential_avg_documents_indexed": 1.0000000121859232,
					"exponential_avg_documents_processed": 11.93250807489107
				},
				"checkpointing": {
					"last": {
						"checkpoint": 145,
						"timestamp_millis": 1678468780920,
						"time_upper_bound_millis": 1678468680000
					},
					"operations_behind": 12,
					"changes_last_detected_at": 1678468780745,
					"last_search_time": 1678468780745
				},
				"health": {
					"status": "green"
				}
			}
		]
	},
	"dataSample": {
		"took": 0,
		"timed_out": false,
		"_shards": {
			"total": 1,
			"successful": 1,
			"skipped": 0,
			"failed": 0
		},
		"hits": {
			"total": {
				"value": 10000,
				"relation": "gte"
			},
			"max_score": null,
			"hits": [
				{
					"_index": "service-logs-latency",
					"_id": "E7qJzIYB-ECiHgMAUMrq",
					"_score": null,
					"_source": {
						"@timestamp": "2023-03-10T17:19:56.000Z",
						"dataset": "95percent_good",
						"host": "52e34a82-4e33-4154-bca8-fa6e76248078",
						"latency": 74
					},
					"sort": [
						1678468796000
					]
				},
				{
					"_index": "service-logs-latency",
					"_id": "FLqJzIYB-ECiHgMAUMr7",
					"_score": null,
					"_source": {
						"@timestamp": "2023-03-10T17:19:56.000Z",
						"dataset": "healthy_then_failing",
						"host": "6b6cae72-768f-446f-a1d2-18d54f067e18",
						"latency": 549
					},
					"sort": [
						1678468796000
					]
				},
				{
					"_index": "service-logs-latency",
					"_id": "FbqJzIYB-ECiHgMAUcoA",
					"_score": null,
					"_source": {
						"@timestamp": "2023-03-10T17:19:56.000Z",
						"dataset": "full_outage_every_day",
						"host": "cb247dc4-e4a0-4ac7-8838-240ed6bcb801",
						"latency": 4
					},
					"sort": [
						1678468796000
					]
				},
				{
					"_index": "service-logs-latency",
					"_id": "ELqJzIYB-ECiHgMAPcpQ",
					"_score": null,
					"_source": {
						"@timestamp": "2023-03-10T17:19:51.000Z",
						"dataset": "95percent_good",
						"host": "52e34a82-4e33-4154-bca8-fa6e76248078",
						"latency": 42
					},
					"sort": [
						1678468791000
					]
				},
				{
					"_index": "service-logs-latency",
					"_id": "EbqJzIYB-ECiHgMAPcpZ",
					"_score": null,
					"_source": {
						"@timestamp": "2023-03-10T17:19:51.000Z",
						"dataset": "healthy_then_failing",
						"host": "6b6cae72-768f-446f-a1d2-18d54f067e18",
						"latency": 579
					},
					"sort": [
						1678468791000
					]
				}
			]
		}
	}
}
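
A payload of this size is easier to act on once condensed. The sketch below is not part of the PR; it assumes the response shape shown above and uses hypothetical helper names (`millis_to_iso`, `summarize_diagnosis`, `count_good`). It lists resources that are not "OK", reports each transform's state and health, and counts how many sampled documents in the SLO's scope satisfy the `good` condition. The KQL `filter` and `good` expressions are hand-translated into Python predicates here, since no KQL parser is involved. The `sample` dictionary is a trimmed copy of the example response.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Tuple


def millis_to_iso(millis: int) -> str:
    """Convert epoch milliseconds to an ISO-8601 UTC timestamp."""
    moment = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    moment += timedelta(milliseconds=millis % 1000)
    return moment.isoformat(timespec="milliseconds")


def summarize_diagnosis(diagnosis: Dict[str, Any]) -> Dict[str, Any]:
    """Condense resource statuses and transform stats into a short summary."""
    resources = diagnosis.get("sloResources", {})
    transforms = diagnosis.get("sloTransformStats", {}).get("transforms", [])
    return {
        "failing_resources": sorted(n for n, s in resources.items() if s != "OK"),
        "transforms": [
            {
                "id": t["id"],
                "state": t.get("state"),
                "health": t.get("health", {}).get("status"),
                "failures": t.get("stats", {}).get("index_failures", 0)
                + t.get("stats", {}).get("search_failures", 0),
                "last_checkpoint": millis_to_iso(
                    t["checkpointing"]["last"]["timestamp_millis"]
                ),
            }
            for t in transforms
        ],
    }


def count_good(
    diagnosis: Dict[str, Any],
    in_scope: Callable[[Dict[str, Any]], bool],
    is_good: Callable[[Dict[str, Any]], bool],
) -> Tuple[int, int]:
    """Return (good, total) over sampled documents that match the SLO filter."""
    sources = [hit["_source"] for hit in diagnosis["dataSample"]["hits"]["hits"]]
    scoped = [s for s in sources if in_scope(s)]
    return sum(1 for s in scoped if is_good(s)), len(scoped)


# Trimmed copy of the example response above.
HOST = "6b6cae72-768f-446f-a1d2-18d54f067e18"
sample = {
    "sloResources": {"slo-observability.sli": "OK", "slo-observability.sli.monthly": "OK"},
    "sloTransformStats": {"transforms": [{
        "id": "slo-8bc13800-bf53-11ed-87f2-1f84267e6d31-1",
        "state": "started",
        "stats": {"index_failures": 0, "search_failures": 0},
        "checkpointing": {"last": {"checkpoint": 145, "timestamp_millis": 1678468780920}},
        "health": {"status": "green"},
    }]},
    "dataSample": {"hits": {"hits": [
        {"_source": {"dataset": "95percent_good", "host": "52e34a82-4e33-4154-bca8-fa6e76248078", "latency": 74}},
        {"_source": {"dataset": "healthy_then_failing", "host": HOST, "latency": 549}},
        {"_source": {"dataset": "healthy_then_failing", "host": HOST, "latency": 579}},
    ]}},
}

summary = summarize_diagnosis(sample)
good, total = count_good(
    sample,
    # Hand-written equivalent of the SLO's KQL filter.
    in_scope=lambda s: s["dataset"] == "healthy_then_failing" and s["host"] == HOST,
    # Hand-written equivalent of the "good" query: latency <= 100.
    is_good=lambda s: s["latency"] <= 100,
)
```

With the sampled documents above, both in-scope documents exceed the 100 ms threshold, so none of them count as good; the sample is far too small to say anything about the actual SLI.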

@kdelemme kdelemme added the release_note:skip, Team: Actionable Observability - DEPRECATED, and v8.8.0 labels Mar 10, 2023
@kdelemme kdelemme self-assigned this Mar 10, 2023
@kdelemme kdelemme force-pushed the feat/slo-diagnosis branch 2 times, most recently from 51da694 to 332d509 Compare March 10, 2023 17:21
@kdelemme kdelemme changed the title feat(slo): diagnosis feat(slo): global and slo specific diagnosis Mar 10, 2023
@kdelemme kdelemme marked this pull request as ready for review March 10, 2023 17:29
@kdelemme kdelemme requested a review from a team as a code owner March 10, 2023 17:29
@elasticmachine

Pinging @elastic/actionable-observability (Team: Actionable Observability)

@kibana-ci

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id               before  after  diff
@kbn/slo-schema  75      76     +1

Unknown metric groups

API count

id               before  after  diff
@kbn/slo-schema  75      76     +1

ESLint disabled line counts

id                before  after  diff
securitySolution  433     436    +3

Total ESLint disabled count

id                before  after  diff
securitySolution  513     516    +3

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @kdelemme


@CoenWarmer CoenWarmer left a comment


LGTM!

@kdelemme kdelemme merged commit c226c07 into elastic:main Mar 14, 2023
@kibanamachine kibanamachine added the backport:skip label Mar 14, 2023
@kdelemme kdelemme deleted the feat/slo-diagnosis branch March 14, 2023 19:20
Development

Successfully merging this pull request may close these issues.

[SLO] Create diagnosis endpoints
5 participants