[Security Solution] Display Total Hits and Alerts Created as columns in the Rule Monitoring tab, to make hitting the max_signals circuit breaker evident #120668

Open
andrew-goldstein opened this issue Dec 7, 2021 · 7 comments
Labels
enhancement New value added to drive a business result Feature:Rule Monitoring Security Solution Detection Rule Monitoring sdh-linked Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

Comments

@andrew-goldstein
Contributor

Summary

  • Users are seeking support when detection rules hit the max_signals circuit breaker
  • Users cannot investigate within Kibana why fewer alerts were generated than expected in an interval, because Total Hits and Alerts Created are not displayed in Kibana
  • To investigate, users must edit kibana.yml to enable DEBUG logging in kibana.log
    • Security Analysts and some cloud deployments may not have access to kibana.log (and the option to enable debug logging for this scenario is not publicly documented)
    • As detailed below, correlating multiple log messages to debug the issue is challenging

To address the above, consider displaying Total Hits and Alerts Created as columns in the Rule Monitoring tab shown in the screenshot below:

[Screenshot: rule_monitoring]

Above: Total Hits and Alerts Created are NOT shown as columns in the Rule Monitoring tab

Details

A user recently asked support to explain why a detection rule was consistently generating fewer alerts than expected: in a given interval, the rule criteria matched more documents than the number of alerts created.

The user's detection rule is likely triggering the max_signals circuit breaker, which defaults to 100 alerts.

As of 7.16, users cannot determine within Kibana when and why they hit the circuit breaker.

Specifically, since Total Hits and Alerts Created for a given interval are not displayed in the UI, users must:

  1. Enable debug logging in kibana.yml (the configuration differs between 7.x and 8.x versions).

     Security Analysts may not have access to kibana.log. Even if they do, they would have to intuit that there's a difference between total hits and alerts created, and then seek support on how to investigate the discrepancy.

  2. In kibana.log, find and correlate multiple log messages for a given interval / rule execution to compare Total Hits vs Alerts Created, per the examples below:
[2021-12-07T10:42:16.318-07:00][DEBUG][plugins.securitySolution] totalHits: 247 name: "matches everything" id: "54025240-52d6-11ec-8116-cf5ec75ee311" rule id: "32a4aefa-80fb-4716-bc0f-3f7bb1f14929" space ID: "default"

and

[2021-12-07T10:42:16.505-07:00][DEBUG][plugins.securitySolution] [+] Finished indexing 100 signals into .alerts-security.alerts name: "matches everything" id: "54025240-52d6-11ec-8116-cf5ec75ee311" rule id: "32a4aefa-80fb-4716-bc0f-3f7bb1f14929" space ID: "default" 

to compare (in this example):

  • totalHits: 247
    vs
  • Finished indexing 100 signals into .alerts-security.alerts
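The manual correlation described above can be sketched as a small script. This is only an illustration, not part of Kibana: the regular expressions are modelled on the two DEBUG messages quoted in this issue (7.16 wording), and the `correlate` helper and its output field names are hypothetical. A gap between the two numbers is only a hint that the max_signals circuit breaker was hit, not proof.

```python
import re

# Patterns modelled on the two DEBUG messages quoted above (7.16 wording);
# other Kibana versions may phrase these messages differently.
TOTAL_HITS = re.compile(
    r'totalHits: (?P<hits>\d+) name: "(?P<name>[^"]*)" id: "[^"]*" '
    r'rule id: "(?P<rule_id>[^"]*)"'
)
INDEXED = re.compile(
    r'Finished indexing (?P<created>\d+) signals into \S+ name: "[^"]*" '
    r'id: "[^"]*" rule id: "(?P<rule_id>[^"]*)"'
)


def correlate(lines):
    """Pair each totalHits message with the next 'Finished indexing'
    message that carries the same rule id (hypothetical helper)."""
    pending = {}  # rule id -> (rule name, total hits) awaiting its indexing message
    results = []
    for line in lines:
        hit = TOTAL_HITS.search(line)
        if hit:
            pending[hit["rule_id"]] = (hit["name"], int(hit["hits"]))
            continue
        done = INDEXED.search(line)
        if done and done["rule_id"] in pending:
            name, total_hits = pending.pop(done["rule_id"])
            created = int(done["created"])
            results.append({
                "rule_id": done["rule_id"],
                "name": name,
                "total_hits": total_hits,
                "alerts_created": created,
                # More hits than alerts suggests (but does not prove) truncation.
                "possibly_truncated": total_hits > created,
            })
    return results


# The two log lines quoted in this issue.
SAMPLE_LINES = [
    '[2021-12-07T10:42:16.318-07:00][DEBUG][plugins.securitySolution] totalHits: 247 '
    'name: "matches everything" id: "54025240-52d6-11ec-8116-cf5ec75ee311" '
    'rule id: "32a4aefa-80fb-4716-bc0f-3f7bb1f14929" space ID: "default"',
    '[2021-12-07T10:42:16.505-07:00][DEBUG][plugins.securitySolution] [+] Finished indexing '
    '100 signals into .alerts-security.alerts name: "matches everything" '
    'id: "54025240-52d6-11ec-8116-cf5ec75ee311" '
    'rule id: "32a4aefa-80fb-4716-bc0f-3f7bb1f14929" space ID: "default"',
]

for row in correlate(SAMPLE_LINES):
    print(row)
```

Even with a script like this, the user still needs access to kibana.log and DEBUG logging enabled, which is the gap this enhancement is meant to close.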
@andrew-goldstein andrew-goldstein added triage_needed enhancement New value added to drive a business result Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Alerts Security Detection Alerts Area Team labels Dec 7, 2021
@elasticmachine
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@andrew-goldstein
Contributor Author

Also consider exposing Total Hits and Alerts Created in the rule execution log, where the last 5 failures are displayed.

@andrew-goldstein
Contributor Author

Given the following:

  • As of 7.16, users must manually page through all activated rules in the Rule Monitoring tab to proactively look for anomalies in the Indexing Time and Query Time columns

  • Adding the Total Hits and Alerts Created columns proposed in this enhancement would implicitly indicate when the max_signals circuit breaker is triggered, but:

    • like the (existing) Indexing Time and Query Time, users must proactively and manually look for anomalies
  • Users taking a proactive approach to looking for anomalies in the existing and proposed columns will only see those anomalies in the Rule Monitoring tab if they occurred in the last run

it may also be helpful to:

  • Display a "rolled up" count of activated rules where Total Hits and Alerts Created were not equal
    • This would eliminate the need to manually examine every activated rule for anomalies
  • Display trends for the metrics shown on the Rule Monitoring page
    • This would reduce the chances of missing anomalies when they didn't happen on the last run
    • Observing trends in, for example, Query Time at the rule level or overall would help users understand if and WHEN the system started slowing down
    • Observing trends in rules that trigger the max_signals circuit breaker would help users understand which rules need tuning to reduce false positives
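As a rough sketch of the "rolled up" count and per-rule trend suggested above, assuming per-execution records with Total Hits and Alerts Created were available: the `Execution` record and the `rollup` function are hypothetical names used for illustration only and do not correspond to any existing Kibana API.

```python
from collections import Counter
from typing import NamedTuple


class Execution(NamedTuple):
    """One rule execution (hypothetical record shape)."""
    rule_id: str
    total_hits: int
    alerts_created: int


def rollup(executions):
    """Count executions per rule where Total Hits and Alerts Created differ."""
    mismatches = Counter(
        e.rule_id for e in executions if e.total_hits != e.alerts_created
    )
    return {
        # How many distinct rules showed a mismatch at least once.
        "rules_with_mismatch": len(mismatches),
        # How often each of those rules mismatched: a crude trend signal.
        "mismatches_per_rule": dict(mismatches),
    }


# Made-up history for illustration only.
history = [
    Execution("rule-a", 247, 100),
    Execution("rule-a", 180, 100),
    Execution("rule-b", 12, 12),
    Execution("rule-c", 300, 100),
    Execution("rule-c", 40, 40),
]
print(rollup(history))
```

Because the count is taken over a history of executions rather than the last run only, a rule that hit the circuit breaker earlier would still show up.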

@spong
Member

spong commented Dec 8, 2021

Discussed with the @elastic/security-detections-response-rules folks. In addition to writing this info to the rule execution log (which would allow support for these columns), it was proposed that we mark the final status of the execution as a warning for additional visibility.
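One possible reading of the warning proposal, sketched under assumptions: the `execution_status` function and the status strings "warning" and "succeeded" are hypothetical, and the default of 100 is the max_signals default mentioned earlier in this issue. The actual rule executor may decide this differently.

```python
def execution_status(total_hits: int, alerts_created: int, max_signals: int = 100) -> str:
    """Return a hypothetical final status for one rule execution."""
    # Flag the run when the alert cap was reached and documents were left over.
    if alerts_created >= max_signals and total_hits > alerts_created:
        return "warning"
    return "succeeded"


print(execution_status(total_hits=247, alerts_created=100))
print(execution_status(total_hits=42, alerts_created=42))
```

With the numbers from the log excerpt in this issue (247 hits, 100 alerts), the first call yields the warning status.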

@banderror banderror added Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team and removed triage_needed Team:Detection Alerts Security Detection Alerts Area Team labels Dec 13, 2021
@banderror banderror added the Feature:Rule Monitoring Security Solution Detection Rule Monitoring label Dec 13, 2021
@banderror
Contributor

banderror commented Dec 13, 2021

@jethr0null please check this ticket when you have some time. We triaged it today at our area sync and are looking for approval from the product standpoint: does it make sense, and if so, are there any details you’d like to change in Andrew’s proposal? From our perspective as engineers, it makes a lot of sense to measure these metrics and make them available for troubleshooting (both for us as engineers who deal with SDHs and for our users). However, this data could be communicated to the user in various forms: as columns in the Rule Monitoring table or as something else.

@banderror
Contributor

TODO @banderror: review the implementation of this circuit breaker to better understand the problem and to be able to reason about possible solutions. We'll groom this ticket later.

@spong
Member

spong commented Feb 1, 2022

Discussed #124198 with team, holding for 8.2 to further iterate on implementation and UX.

@banderror banderror added the 8.2 candidate considered, but not committed, for 8.2 release label Feb 15, 2022
@banderror banderror added the SecuritySolution:QAAssist Part of QA testing process for release label Mar 10, 2022
@spong spong added 8.3 candidate v8.3.0 and removed v8.2.0 8.2 candidate considered, but not committed, for 8.2 release labels Mar 23, 2022
@banderror banderror removed the SecuritySolution:QAAssist Part of QA testing process for release label Aug 16, 2022