[Metrics UI] Metric threshold rule type: fix group by + 0 check #111772

Closed
jasonrhodes opened this issue Sep 9, 2021 · 1 comment · Fixed by #111465
Labels
bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.15.1 v7.16.0

Comments

@jasonrhodes
Member

Related: #76511

Summary

When a Metric Threshold rule combines a "less than" or "less than or equal to" comparison with the "Alert per" (group by) setting, it creates a broken rule that will miss real alert scenarios. When the value for a group is actually 0 because there are no documents for that group, the group no longer appears in the data at all, so we can't schedule actions for that alert.
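
To make the failure mode concrete, here is a minimal sketch, not the actual rule executor. It assumes a document count grouped by a hypothetical `host` field; `countByGroup` and `groupsBelowThreshold` are illustrative names, not Kibana APIs.

```typescript
// Hypothetical per-group result: group name -> document count.
type GroupCounts = Record<string, number>;

// Mimics a grouped count: a group only shows up if it has at least one document.
function countByGroup(docs: { host: string }[]): GroupCounts {
  const counts: GroupCounts = {};
  for (const doc of docs) {
    counts[doc.host] = (counts[doc.host] ?? 0) + 1;
  }
  return counts;
}

// Returns the groups whose count is below the threshold.
function groupsBelowThreshold(counts: GroupCounts, threshold: number): string[] {
  return Object.keys(counts).filter((group) => (counts[group] ?? 0) < threshold);
}

// host-b reported earlier but has no documents in this evaluation window.
const currentDocs = [{ host: "host-a" }, { host: "host-a" }, { host: "host-a" }];
const firing = groupsBelowThreshold(countByGroup(currentDocs), 1);

// host-b is absent from the counts, so it can never be selected,
// even though its real count (0) is below the threshold.
console.log(firing);
```

In this sketch the only groups that can fire are the ones that still have documents, which is exactly the case a "less than" rule cares about least.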

AC: We either need to disable the ability to create this kind of alert and document why it's not allowed, or determine a way we can support this functionality in a reliable and performant way.

Notes

One option considered is to store the groups that the rule sees in the persisted rule state and use that state when considering which groups should have alerts triggered. If a previously seen group no longer appears in the data, we can trigger either a 0 doc alert or a "no data" alert, depending on how we choose to handle it. Complications with this approach include:

  • When a group is intentionally removed, the user will need a way to silence further alerts based on that value.
  • How does "no data" relate to a 0 document count query? Should the two be treated as equivalent for document count, even though they aren't for other aggregations? (No data is usually not the same as an average of 0, for example.)
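
The state-based option above could look roughly like the following sketch. Everything here is an assumption for illustration: `RuleState`, `evaluateWithState`, and the `ignoredGroups` parameter are hypothetical names, and the real persisted rule state may have a different shape.

```typescript
// Hypothetical persisted state: the groups the rule has seen so far.
interface RuleState {
  seenGroups: string[];
}

interface Evaluation {
  firing: string[]; // groups present in the data and below the threshold
  missing: string[]; // previously seen groups that are absent from the data
  nextState: RuleState;
}

function evaluateWithState(
  counts: Record<string, number>,
  threshold: number,
  state: RuleState,
  ignoredGroups: string[] = [] // groups the user has intentionally removed
): Evaluation {
  const present = Object.keys(counts);
  const isIgnored = (group: string) => ignoredGroups.indexOf(group) !== -1;

  const firing = present.filter((group) => (counts[group] ?? 0) < threshold);

  // A previously seen group that no longer appears is reported separately,
  // so the caller can decide whether it is a 0 doc alert or a "no data" alert.
  const missing = state.seenGroups.filter(
    (group) => present.indexOf(group) === -1 && !isIgnored(group)
  );

  // Remember every group seen so far, except the ones the user silenced.
  const newlySeen = present.filter((group) => state.seenGroups.indexOf(group) === -1);
  const seenGroups = state.seenGroups.concat(newlySeen).filter((group) => !isIgnored(group));

  return { firing, missing, nextState: { seenGroups } };
}

const previous: RuleState = { seenGroups: ["host-a", "host-b"] };
const result = evaluateWithState({ "host-a": 3 }, 1, previous);
console.log(result.missing); // host-b is reported as missing
```

The sketch deliberately leaves both open questions to the caller: `missing` is not labeled as either alert type, and silencing is modeled only as a list of ignored group names.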

POC PR that assumes these are "no data" alerts:

@jasonrhodes jasonrhodes added bug Fixes for quality problems that affect the customer experience Feature:Alerting Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v7.16.0 v7.15.1 labels Sep 9, 2021
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)
