Added CortexKVStoreFailure alert #406

Merged
merged 3 commits into from
Oct 14, 2021

Conversation

pracucci
Collaborator

@pracucci pracucci commented Oct 13, 2021

What this PR does:
In this PR I propose adding a CortexKVStoreFailure alert. The idea is to get an alert when a Cortex instance is failing to talk to Consul (e.g. Consul is down or there's a network partition).

It's a warning alert for now, until we gain more confidence in it. The idea is to promote it to a critical alert if it turns out to work well.

Which issue(s) this PR fixes:
N/A

Checklist

  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci requested a review from a team as a code owner October 13, 2021 09:23
Member

@pstibrany pstibrany left a comment

LGTM. Should we add similar ones for Etcd? Or, preferably, a similar alert for any KV backend.

@pracucci
Collaborator Author

> LGTM. Should we add similar ones for Etcd? Or, preferably, a similar alert for any KV backend.

@pstibrany Definitely. I've updated the alert to use the generic metric we have for KV store operations. WDYT?
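
For readers unfamiliar with this kind of alert, here is a minimal Python sketch of the general idea: flag an instance only when its KV store requests fail persistently, whatever the backend. It is not the rule added by this PR. The `KVSample` type, the "every request failed" condition, and the five-interval window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class KVSample:
    """Hypothetical per-interval KV store request counts for one instance."""
    total: int   # all KV store requests observed in the interval
    failed: int  # requests in the interval that did not succeed


def failure_ratio(sample: KVSample) -> float:
    """Fraction of failed requests; 0.0 when no requests were made."""
    if sample.total == 0:
        return 0.0
    return sample.failed / sample.total


def should_alert(samples: Sequence[KVSample], min_consecutive: int = 5) -> bool:
    """Return True only if every request failed in each of the last
    `min_consecutive` intervals (an assumed threshold and window)."""
    if len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(failure_ratio(s) == 1.0 for s in recent)
```

Requiring a sustained, complete failure rather than a single failed request is one way to avoid alerting on transient errors; a real rule may well use a different threshold or window.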

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Member

@pstibrany pstibrany left a comment

Thank you!

@pracucci pracucci merged commit 567320d into main Oct 14, 2021
@pracucci pracucci deleted the alert-on-consul-failures branch October 14, 2021 08:34
simonswine pushed a commit to grafana/mimir that referenced this pull request Oct 18, 2021
…onsul-failures

Added CortexFailingToTalkToConsul alert
@pracucci pracucci changed the title Added CortexFailingToTalkToConsul alert Added CortexKVStoreFailure alert Nov 15, 2021