Skip to content

Commit

Permalink
[8.7] [ResponseOps] Update flapping logic order to determine whether …
Browse files Browse the repository at this point in the history
…an alert is flapping after it's returned for notification (#151148) (#152507)

# Backport

This will backport the following commits from `main` to `8.7`:
- [[ResponseOps] Update flapping logic order to determine whether an
alert is flapping after it's returned for notification
(#151148)](#151148)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Alexi
Doak","email":"109488926+doakalexi@users.noreply.github.com"},"sourceCommit":{"committedDate":"2023-03-01T20:19:38Z","message":"[ResponseOps]
Update flapping logic order to determine whether an alert is flapping
after it's returned for notification (#151148)\n\nResolves
https://github.com/elastic/kibana/issues/151135\r\n\r\n##
Summary\r\n\r\nMoved the logic around to check flapping after an alert
is already\r\nreturned for notification. I also updated the event log
test cases to\r\nmatch examples in the rfc.\r\n\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n\r\n### To
Verify\r\nThe new test cases capture this example but if you want to
test it\r\nlocally:\r\n- Create a connector and make sure to save the
connector id\r\n```\r\ncurl -X POST -u {username}:{password}
\"http://localhost:5601/api/actions/connector\" -H 'kbn-xsrf: true' -H
'Content-Type: application/json' -d'\r\n{\r\n \"connector_type_id\":
\".server-log\",\r\n \"name\": \"server log\",\r\n \"config\": {},\r\n
\"secrets\": {}\r\n}'\r\n```\r\n- Create the rule and let it
run\r\n```\r\ncurl -X POST -u {username}:{password}
\"http://localhost:5601/api/alerting/rule/\" -H 'kbn-xsrf: true' -H
'Content-Type: application/json' -d'\r\n{\r\n \"rule_type_id\":
\"example.pattern\",\r\n \"name\": \"pattern\",\r\n \"schedule\": {\r\n
\"interval\": \"5s\"\r\n },\r\n \"actions\": [\r\n { \"group\":
\"default\", \"id\": {connector id}, \"params\": { \"message\":
\"{{alert.id}} active on run {{context.runs}} step
{{context.patternIndex}} flapping {{alert.flapping}}\"}},\r\n {
\"group\": \"recovered\", \"id\": {connector id}, \"params\": {
\"message\": \"{{alert.id}} recovered on run flapping
{{alert.flapping}}\"}}\r\n ],\r\n \"consumer\": \"alerts\",\r\n
\"tags\": [],\r\n \"notify_when\": \"onActionGroupChange\",\r\n
\"params\": {\r\n \"patterns\": {\r\n \"instA\": \" a - - a - a - a - a
- - - - - - - - \"\r\n }\r\n }\r\n}'\r\n\r\n```\r\n- Verify that you see
the following, we want to make sure that aren't\r\nmissing any
notifications\r\n\r\n![image](https://user-images.githubusercontent.com/109488926/221615405-48061d10-4b80-4b98-812a-6951da4178da.png)","sha":"5187a6f9aa706a94dbf759b12cc11940698a81b5","branchLabelMapping":{"^v8.8.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:ResponseOps","v8.7.0","v8.8.0"],"number":151148,"url":"https://github.com/elastic/kibana/pull/151148","mergeCommit":{"message":"[ResponseOps]
Update flapping logic order to determine whether an alert is flapping
after it's returned for notification (#151148)\n\nResolves
https://github.com/elastic/kibana/issues/151135\r\n\r\n##
Summary\r\n\r\nMoved the logic around to check flapping after an alert
is already\r\nreturned for notification. I also updated the event log
test cases to\r\nmatch examples in the rfc.\r\n\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n\r\n### To
Verify\r\nThe new test cases capture this example but if you want to
test it\r\nlocally:\r\n- Create a connector and make sure to save the
connector id\r\n```\r\ncurl -X POST -u {username}:{password}
\"http://localhost:5601/api/actions/connector\" -H 'kbn-xsrf: true' -H
'Content-Type: application/json' -d'\r\n{\r\n \"connector_type_id\":
\".server-log\",\r\n \"name\": \"server log\",\r\n \"config\": {},\r\n
\"secrets\": {}\r\n}'\r\n```\r\n- Create the rule and let it
run\r\n```\r\ncurl -X POST -u {username}:{password}
\"http://localhost:5601/api/alerting/rule/\" -H 'kbn-xsrf: true' -H
'Content-Type: application/json' -d'\r\n{\r\n \"rule_type_id\":
\"example.pattern\",\r\n \"name\": \"pattern\",\r\n \"schedule\": {\r\n
\"interval\": \"5s\"\r\n },\r\n \"actions\": [\r\n { \"group\":
\"default\", \"id\": {connector id}, \"params\": { \"message\":
\"{{alert.id}} active on run {{context.runs}} step
{{context.patternIndex}} flapping {{alert.flapping}}\"}},\r\n {
\"group\": \"recovered\", \"id\": {connector id}, \"params\": {
\"message\": \"{{alert.id}} recovered on run flapping
{{alert.flapping}}\"}}\r\n ],\r\n \"consumer\": \"alerts\",\r\n
\"tags\": [],\r\n \"notify_when\": \"onActionGroupChange\",\r\n
\"params\": {\r\n \"patterns\": {\r\n \"instA\": \" a - - a - a - a - a
- - - - - - - - \"\r\n }\r\n }\r\n}'\r\n\r\n```\r\n- Verify that you see
the following, we want to make sure that aren't\r\nmissing any
notifications\r\n\r\n![image](https://user-images.githubusercontent.com/109488926/221615405-48061d10-4b80-4b98-812a-6951da4178da.png)","sha":"5187a6f9aa706a94dbf759b12cc11940698a81b5"}},"sourceBranch":"main","suggestedTargetBranches":["8.7"],"targetPullRequestStates":[{"branch":"8.7","label":"v8.7.0","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"},{"branch":"main","label":"v8.8.0","labelRegex":"^v8.8.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/151148","number":151148,"mergeCommit":{"message":"[ResponseOps]
Update flapping logic order to determine whether an alert is flapping
after it's returned for notification (#151148)\n\nResolves
https://github.com/elastic/kibana/issues/151135\r\n\r\n##
Summary\r\n\r\nMoved the logic around to check flapping after an alert
is already\r\nreturned for notification. I also updated the event log
test cases to\r\nmatch examples in the rfc.\r\n\r\n\r\n###
Checklist\r\n\r\n- [x] [Unit or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n\r\n### To
Verify\r\nThe new test cases capture this example but if you want to
test it\r\nlocally:\r\n- Create a connector and make sure to save the
connector id\r\n```\r\ncurl -X POST -u {username}:{password}
\"http://localhost:5601/api/actions/connector\" -H 'kbn-xsrf: true' -H
'Content-Type: application/json' -d'\r\n{\r\n \"connector_type_id\":
\".server-log\",\r\n \"name\": \"server log\",\r\n \"config\": {},\r\n
\"secrets\": {}\r\n}'\r\n```\r\n- Create the rule and let it
run\r\n```\r\ncurl -X POST -u {username}:{password}
\"http://localhost:5601/api/alerting/rule/\" -H 'kbn-xsrf: true' -H
'Content-Type: application/json' -d'\r\n{\r\n \"rule_type_id\":
\"example.pattern\",\r\n \"name\": \"pattern\",\r\n \"schedule\": {\r\n
\"interval\": \"5s\"\r\n },\r\n \"actions\": [\r\n { \"group\":
\"default\", \"id\": {connector id}, \"params\": { \"message\":
\"{{alert.id}} active on run {{context.runs}} step
{{context.patternIndex}} flapping {{alert.flapping}}\"}},\r\n {
\"group\": \"recovered\", \"id\": {connector id}, \"params\": {
\"message\": \"{{alert.id}} recovered on run flapping
{{alert.flapping}}\"}}\r\n ],\r\n \"consumer\": \"alerts\",\r\n
\"tags\": [],\r\n \"notify_when\": \"onActionGroupChange\",\r\n
\"params\": {\r\n \"patterns\": {\r\n \"instA\": \" a - - a - a - a - a
- - - - - - - - \"\r\n }\r\n }\r\n}'\r\n\r\n```\r\n- Verify that you see
the following, we want to make sure that aren't\r\nmissing any
notifications\r\n\r\n![image](https://user-images.githubusercontent.com/109488926/221615405-48061d10-4b80-4b98-812a-6951da4178da.png)","sha":"5187a6f9aa706a94dbf759b12cc11940698a81b5"}}]}]
BACKPORT-->

Co-authored-by: Alexi Doak <109488926+doakalexi@users.noreply.github.com>
  • Loading branch information
kibanamachine and doakalexi authored Mar 1, 2023
1 parent e4f24cb commit 99c6baf
Show file tree
Hide file tree
Showing 7 changed files with 276 additions and 74 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ import { createAlertFactory, getPublicAlertFactory } from '../alert/create_alert
import { Alert } from '../alert/alert';
import { alertingEventLoggerMock } from '../lib/alerting_event_logger/alerting_event_logger.mock';
import { ruleRunMetricsStoreMock } from '../lib/rule_run_metrics_store.mock';
import { getAlertsForNotification, processAlerts, setFlapping } from '../lib';
import { getAlertsForNotification, processAlerts } from '../lib';
import { logAlerts } from '../task_runner/log_alerts';
import { DEFAULT_FLAPPING_SETTINGS } from '../../common/rules_settings';

Expand Down Expand Up @@ -254,19 +254,6 @@ describe('Legacy Alerts Client', () => {
flappingSettings: DEFAULT_FLAPPING_SETTINGS,
});

expect(setFlapping).toHaveBeenCalledWith(
{
enabled: true,
lookBackWindow: 20,
statusChangeThreshold: 4,
},
{
'1': new Alert<AlertInstanceContext, AlertInstanceContext>('1', testAlert1),
'2': new Alert<AlertInstanceContext, AlertInstanceContext>('2', testAlert2),
},
{}
);

expect(getAlertsForNotification).toHaveBeenCalledWith(
{
enabled: true,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -143,12 +143,6 @@ export class LegacyAlertsClient<
flappingSettings,
});

setFlapping<State, Context, ActionGroupIds, RecoveryActionGroupId>(
flappingSettings,
processedAlertsActive,
processedAlertsRecovered
);

const { trimmedAlertsRecovered, earlyRecoveredAlerts } = trimRecoveredAlerts(
this.options.logger,
processedAlertsRecovered,
Expand Down Expand Up @@ -213,4 +207,12 @@ export class LegacyAlertsClient<
public getExecutorServices() {
return getPublicAlertFactory(this.alertFactory!);
}

public setFlapping(flappingSettings: RulesSettingsFlappingProperties) {
setFlapping<State, Context, ActionGroupIds, RecoveryActionGroupId>(
flappingSettings,
this.processedAlerts.active,
this.processedAlerts.recovered
);
}
}
2 changes: 2 additions & 0 deletions x-pack/plugins/alerting/server/task_runner/task_runner.ts
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,8 @@ export class TaskRunner<
}
});

this.legacyAlertsClient.setFlapping(flappingSettings);

let alertsToReturn: Record<string, RawAlertInstance> = {};
let recoveredAlertsToReturn: Record<string, RawAlertInstance> = {};
// Only serialize alerts into task state if we're auto-recovering, otherwise
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1015,7 +1015,7 @@ describe('createLifecycleExecutor', () => {
return { state };
});

await executor(
const serializedAlerts = await executor(
createDefaultAlertExecutorOptions({
alertId: 'TEST_ALERT_0',
params: {},
Expand Down Expand Up @@ -1061,6 +1061,43 @@ describe('createLifecycleExecutor', () => {
})
);

expect(serializedAlerts.state.trackedAlerts).toEqual({
TEST_ALERT_0: {
alertId: 'TEST_ALERT_0',
alertUuid: 'TEST_ALERT_0_UUID',
flapping: true,
flappingHistory: flapping.slice(1).concat([false]),
pendingRecoveredCount: 0,
started: '2020-01-01T12:00:00.000Z',
},
TEST_ALERT_1: {
alertId: 'TEST_ALERT_1',
alertUuid: 'TEST_ALERT_1_UUID',
flapping: false,
flappingHistory: [false, false, false],
pendingRecoveredCount: 0,
started: '2020-01-02T12:00:00.000Z',
},
TEST_ALERT_2: {
alertId: 'TEST_ALERT_2',
alertUuid: 'TEST_ALERT_2_UUID',
flapping: true,
flappingHistory: flapping.slice(1).concat([false]),
pendingRecoveredCount: 0,
started: '2020-01-01T12:00:00.000Z',
},
TEST_ALERT_3: {
alertId: 'TEST_ALERT_3',
alertUuid: 'TEST_ALERT_3_UUID',
flapping: true,
flappingHistory: [false, false, false],
pendingRecoveredCount: 0,
started: '2020-01-02T12:00:00.000Z',
},
});

expect(serializedAlerts.state.trackedAlertsRecovered).toEqual({});

expect((await ruleDataClientMock.getWriter()).bulk).toHaveBeenCalledWith(
expect.objectContaining({
body: [
Expand All @@ -1070,7 +1107,7 @@ describe('createLifecycleExecutor', () => {
[ALERT_INSTANCE_ID]: 'TEST_ALERT_0',
[ALERT_WORKFLOW_STATUS]: 'closed',
[ALERT_STATUS]: ALERT_STATUS_ACTIVE,
[ALERT_FLAPPING]: true,
[ALERT_FLAPPING]: false,
[EVENT_ACTION]: 'active',
[EVENT_KIND]: 'signal',
}),
Expand Down Expand Up @@ -1182,7 +1219,7 @@ describe('createLifecycleExecutor', () => {
return { state };
});

await executor(
const serializedAlerts = await executor(
createDefaultAlertExecutorOptions({
alertId: 'TEST_ALERT_0',
params: {},
Expand Down Expand Up @@ -1228,17 +1265,55 @@ describe('createLifecycleExecutor', () => {
})
);

expect(serializedAlerts.state.trackedAlerts).toEqual({
TEST_ALERT_2: {
alertId: 'TEST_ALERT_2',
alertUuid: 'TEST_ALERT_2_UUID',
flapping: true,
flappingHistory: [true, true, true],
pendingRecoveredCount: 1,
started: '2020-01-02T12:00:00.000Z',
},
});

expect(serializedAlerts.state.trackedAlertsRecovered).toEqual({
TEST_ALERT_0: {
alertId: 'TEST_ALERT_0',
alertUuid: 'TEST_ALERT_0_UUID',
flapping: true,
flappingHistory: [true, true, true, true, true],
pendingRecoveredCount: 0,
started: '2020-01-01T12:00:00.000Z',
},
TEST_ALERT_1: {
alertId: 'TEST_ALERT_1',
alertUuid: 'TEST_ALERT_1_UUID',
flapping: false,
flappingHistory: notFlapping.slice(0, notFlapping.length - 1).concat([true]),
pendingRecoveredCount: 0,
started: '2020-01-02T12:00:00.000Z',
},
TEST_ALERT_3: {
alertId: 'TEST_ALERT_3',
alertUuid: 'TEST_ALERT_3_UUID',
flapping: false,
flappingHistory: notFlapping.slice(0, notFlapping.length - 1).concat([true]),
pendingRecoveredCount: 0,
started: '2020-01-02T12:00:00.000Z',
},
});

expect((await ruleDataClientMock.getWriter()).bulk).toHaveBeenCalledWith(
expect.objectContaining({
body: expect.arrayContaining([
// alert document
{ index: { _id: 'TEST_ALERT_0_UUID' } },
expect.objectContaining({
[ALERT_INSTANCE_ID]: 'TEST_ALERT_0',
[ALERT_STATUS]: ALERT_STATUS_ACTIVE,
[EVENT_ACTION]: 'active',
[ALERT_STATUS]: ALERT_STATUS_RECOVERED,
[EVENT_ACTION]: 'close',
[EVENT_KIND]: 'signal',
[ALERT_FLAPPING]: true,
[ALERT_FLAPPING]: false,
}),
{ index: { _id: 'TEST_ALERT_1_UUID' } },
expect.objectContaining({
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -278,12 +278,7 @@ export const createLifecycleExecutor =
trackedAlertRecoveredIds
);

const {
alertUuid,
started,
flapping: isCurrentlyFlapping,
pendingRecoveredCount,
} = !isNew
const { alertUuid, started, flapping, pendingRecoveredCount } = !isNew
? state.trackedAlerts[alertId]
: {
alertUuid: lifecycleAlertServices.getAlertUuid(alertId),
Expand All @@ -294,8 +289,6 @@ export const createLifecycleExecutor =
pendingRecoveredCount: 0,
};

const flapping = isFlapping(flappingSettings, flappingHistory, isCurrentlyFlapping);

const event: ParsedTechnicalFields & ParsedExperimentalFields = {
...alertData?.fields,
...commonRuleFields,
Expand Down Expand Up @@ -366,10 +359,11 @@ export const createLifecycleExecutor =
const nextTrackedAlerts = Object.fromEntries(
allEventsToIndex
.filter(({ event }) => event[ALERT_STATUS] !== ALERT_STATUS_RECOVERED)
.map(({ event, flappingHistory, flapping, pendingRecoveredCount }) => {
.map(({ event, flappingHistory, flapping: isCurrentlyFlapping, pendingRecoveredCount }) => {
const alertId = event[ALERT_INSTANCE_ID]!;
const alertUuid = event[ALERT_UUID]!;
const started = new Date(event[ALERT_START]!).toISOString();
const flapping = isFlapping(flappingSettings, flappingHistory, isCurrentlyFlapping);
return [
alertId,
{ alertId, alertUuid, started, flappingHistory, flapping, pendingRecoveredCount },
Expand All @@ -387,10 +381,11 @@ export const createLifecycleExecutor =
event[ALERT_STATUS] === ALERT_STATUS_RECOVERED &&
(flapping || flappingHistory.filter((f: boolean) => f).length > 0)
)
.map(({ event, flappingHistory, flapping, pendingRecoveredCount }) => {
.map(({ event, flappingHistory, flapping: isCurrentlyFlapping, pendingRecoveredCount }) => {
const alertId = event[ALERT_INSTANCE_ID]!;
const alertUuid = event[ALERT_UUID]!;
const started = new Date(event[ALERT_START]!).toISOString();
const flapping = isFlapping(flappingSettings, flappingHistory, isCurrentlyFlapping);
return [
alertId,
{ alertId, alertUuid, started, flappingHistory, flapping, pendingRecoveredCount },
Expand Down
Loading

0 comments on commit 99c6baf

Please sign in to comment.