Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Rule execution logging overhaul follow-up #124194

Merged

Conversation

banderror
Copy link
Contributor

@banderror banderror commented Feb 1, 2022

Related to: #121644
Addresses: #86202

Summary

Done in this PR:

  • Removed the deprecated warning rule execution status (comment).
  • Added a new running status (ticket).
  • Simplified the internal implementation of the rule_execution_log folder. Hopefully naming of folders, files and interfaces is clearer now as well. (comment, comment)
  • Added APM measurements with withSecuritySpan.
  • Added rule id to the react-query key used for loading last rule failures (comment)
  • Addressed most of the // TODO: https://github.com/elastic/kibana/pull/121644 comments

In the next PR that could be merged after the FF I'd address the rest of the stuff:

  • Add comments to all the interfaces and methods in the rule_execution_log folder. Write a readme for it.
  • Address the remaining of the // TODO: https://github.com/elastic/kibana/pull/121644 comments. All of them are related to tests.
  • Fix for the gap column (comment)

@banderror banderror added refactoring technical debt Improvement of the software architecture and operational architecture release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.1.0 Feature:Rule Monitoring Security Solution Detection Rule Monitoring Team:Detection Rule Management Security Detection Rule Management Team labels Feb 1, 2022
@banderror banderror self-assigned this Feb 1, 2022
@banderror banderror force-pushed the rule-execution-logging-overhaul-follow-up branch 2 times, most recently from 7996a79 to d2f73fc Compare February 1, 2022 16:01
@banderror banderror force-pushed the rule-execution-logging-overhaul-follow-up branch from d2f73fc to 773ee38 Compare February 1, 2022 20:07
@banderror banderror requested a review from a team February 1, 2022 20:20
@banderror banderror marked this pull request as ready for review February 1, 2022 20:20
@banderror banderror requested a review from a team as a code owner February 1, 2022 20:20
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@spong
Copy link
Member

spong commented Feb 1, 2022

Came across an app crash in initial testing when visiting a Rule Details page for an id that has never existed.

And non-minified stacktrace -- looks like we're not short circuiting soon enough and trying to fill in the severityMappings with an empty mapping. Must be an underlying type issue lingering in there?

Just verified this exists on the kibana.siem test cluster, so not an issue introduced here.

Comment on lines +13 to +21
export const getStatusText = (value: RuleExecutionStatus | null | undefined): string | null => {
if (value == null) {
return null;
}
if (value === RuleExecutionStatus['partial failure']) {
return 'warning';
}
return value;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearer name, and easier to immediately grok without the nested ternary (same with getStatusColor), thanks you! 🎉

@@ -109,3 +109,10 @@ export const RELOAD_MISSING_PREPACKAGED_RULES_AND_TIMELINES = (
'Install {missingRules} Elastic prebuilt {missingRules, plural, =1 {rule} other {rules}} and {missingTimelines} Elastic prebuilt {missingTimelines, plural, =1 {timeline} other {timelines}} ',
}
);

export const RULE_EXECUTION_EVENTS_FETCH_FAILURE = i18n.translate(
'xpack.securitySolution.containers.detectionEngine.ruleExecutionEventsFetchFailDescription',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly different keys than in my branch -- I'll be sure to run scripts/i18n_check.js --fix once we merge so there aren't any leftover unused keys. 👍

export const useRuleExecutionEvents = (ruleId: string) => {
const { addError } = useAppToasts();

return useQuery(
'ruleExecutionEvents',
['ruleExecutionEvents', ruleId],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@spong
Copy link
Member

spong commented Feb 1, 2022

In testing saw an issue where the alerts table loader would stay present on Rule Details. This happened after editing a rule schedule and immediately after being returned to Rule Details. Have yet to reproduce, and doesn't appear to be related to this changeset.

@@ -118,7 +118,7 @@ export const createSecurityRuleTypeWrapper: CreateSecurityRuleTypeWrapper =
let wroteWarningStatus = false;

await ruleExecutionLogger.logStatusChange({
newStatus: RuleExecutionStatus['going to run'],
newStatus: RuleExecutionStatus.running,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it's even necessary to write a running status change event from security (to event-log at least). By and large the execute-start event is going to represent the same and even contain additional data (schedule datetime, and so schedule delay as well). We would still need to write to the executionSO for latest status on the Rules/Monitoring tables.

Is there some case where going to run captures a unique series of events that execute-start can't be used for?

@spong
Copy link
Member

spong commented Feb 2, 2022

Build is hung up on one failing cypress test (sorting.spec.ts). Was able to repro reliable locally, but appears to be flake as cypress doesn't wait for page 2 to load before fetching the value to compare against, so it ends up being the same value (and fails).

security_solution

Resolved in 28e0892 w/ an additional wait to ensure the table loader is complete before fetching the page 2 value. I added the wait inline to the test, but it looked like it may be a better fit at the end of goToPage:

export const goToPage = (pageNumber: number) => {
cy.get(RULES_TABLE_REFRESH_INDICATOR).should('not.exist');
cy.get(pageSelector(pageNumber)).last().click({ force: true });
};

and return a Promise? @MadameSheema

Copy link
Member

@spong spong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked out locally, desk tested, and code reviewed -- LGTM! 👍 Found a couple potential bugs (commented), but doesn't look to be caused by this changeset.

Thanks for looping back around and addressing all the cleanup here @banderror!

@spong spong enabled auto-merge (squash) February 2, 2022 01:14
@spong spong merged commit 20e8381 into elastic:main Feb 2, 2022
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 4.7MB 4.7MB +356.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Rule Monitoring Security Solution Detection Rule Monitoring refactoring release_note:skip Skip the PR/issue when compiling release notes Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. technical debt Improvement of the software architecture and operational architecture v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SecuritySolution][Detections] Add 'running' Rule status
4 participants