Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Inconsistent handling of gap detection and max signals #100181

Open
Tracked by #165878
marshallmain opened this issue May 14, 2021 · 4 comments · Fixed by #102104
Open
Tracked by #165878
Assignees
Labels
8.16 candidate bug Fixes for quality problems that affect the customer experience Feature:Detection Alerts Security Solution Detection Alerts Feature Feature:Gap Remediation impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Team: CTI Team:Detection Engine Security Solution Detection Engine Area Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.

Comments

@marshallmain
Copy link
Contributor

marshallmain commented May 14, 2021

The general idea behind gap detection is to run a rule over multiple time ranges in a single alert execution task if too much time has passed since the last rule execution. Each time range the rule searches is generally treated as an independent and complete execution semantically, so each time range can create up to maxSignals signals. Since we limit gap detection to 4 extra rule intervals, this means a rule that hasn't run in a long time could generate up to 5*maxSignals signals. However, we limit each time range to 1*maxSignals so that a single time range can't create all the signals and crowd out signals from other time ranges.

Threat match rules handle this slightly differently. Each slicedChunk of the threat list can contribute up to maxSignals number of signals, and since the chunks are evaluated in parallel it's possible to exceed maxSignals. In addition, since searchAfterAndBulkCreate searches all provided time ranges and sums the created signals into a single counter which the threat match logic then compares to maxSignals (https://github.com/elastic/kibana/blob/master/x-pack/plugins/security_solution/server/lib/detection_engine/signals/threat_mapping/create_threat_signals.ts#L140), it's possible for one time range to hit maxSignals and crowd out signals from the other time ranges.

To address this, we should fully move the gap remediation logic out of searchAfterAndBulkCreate and have the executor functions handle a single time range. The top-level executor code in signal_rule_alert_type.ts would then be responsible for calling the appropriate executor multiple times, once for each time range computed by the gap detection logic. This will likely incur a slight additional performance hit (exact magnitude TBD) for threat match rules when a gap is detected, since the full threat list will now be fetched once per time range instead of once overall in the executor. The benefit is that it prevents signals from one time range from crowding out signals in other time ranges. Extracting this logic also makes it possible to share with other rule types.

To handle the possibility of maxSignals being exceeded, we would have to recombine the parallel searches before building and indexing the signals. As we pull logic out of searchAfterAndBulkCreate we may find that it is simpler to not share the function as a whole between threat match and KQL rules and only share components like bulkCreate and singleSearchAfter.

@marshallmain marshallmain added bug Fixes for quality problems that affect the customer experience Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. labels May 14, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@ecezalp ecezalp self-assigned this May 19, 2021
@MadameSheema MadameSheema added impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. v7.14.0 labels May 26, 2021
@madirey madirey self-assigned this Jun 9, 2021
@ecezalp
Copy link
Contributor

ecezalp commented Jul 12, 2021

Repoening as discussed during the RAC sync today - we have established that #102104 didn't address the issue documented in this ticket, and we need to determine whether we want to address the change described in here.

@ecezalp ecezalp reopened this Jul 12, 2021
@ecezalp ecezalp assigned ecezalp and unassigned madirey Jul 12, 2021
@peluja1012 peluja1012 added Team: CTI Team:Detection Alerts Security Detection Alerts Area Team labels Sep 15, 2021
@ecezalp ecezalp added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. and removed impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Sep 28, 2021
@peluja1012 peluja1012 added the Feature:Detection Alerts Security Solution Detection Alerts Feature label Mar 21, 2022
@yctercero yctercero added Team:Detection Engine Security Solution Detection Engine Area and removed Team:Detection Alerts Security Detection Alerts Area Team labels May 13, 2023
@yctercero
Copy link
Contributor

@nkhristinin are we addressing this as part of our manual rule runs work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.16 candidate bug Fixes for quality problems that affect the customer experience Feature:Detection Alerts Security Solution Detection Alerts Feature Feature:Gap Remediation impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Team: CTI Team:Detection Engine Security Solution Detection Engine Area Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants