[Alerting] Gracefully restore failed rules from pre-7.11 #117593
Labels
Feature:Alerting/RulesFramework
Issues related to the Alerting Rules Framework
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
In this issue we have determined that for rules created prior to 7.11, the associated task manager document does not contain the `schedule` field:

Example pre-7.11 rule task doc
When rules are running normally and Kibana is upgraded to 7.11+, the task manager doc is updated with the `schedule` field after the next normal execution.

However, if a rule has already reached its `maxAttempts` value of `3` when Kibana is upgraded to 7.11+, the task manager `updateByQuery` script will mark the rule's task as `failed`, because the task has no `schedule` and its number of attempts has reached the limit. We want to make sure these rules continue running, so we propose two mitigations:

1. Find failed rule tasks where the `schedule` field is missing. Reset `attempts` to `0` and `status` to `idle`. This should ensure that task manager can start claiming these tasks again.
2. Check whether the rule's task document has a `schedule` field. If it does not, update the task document to include it. This should ensure that the alerting rule task will not reach this state again.
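To make the proposal concrete, here is a minimal sketch of the failure condition and the two mitigations as pure functions. It is not Kibana's actual implementation: the `TaskDoc` shape, the `alerting:` task type prefix check, the `{ interval }` form of `schedule`, and all function names are assumptions made for illustration only.

```typescript
// Hypothetical, simplified task manager document. Not Kibana's real type.
interface TaskDoc {
  id: string;
  taskType: string;
  status: 'idle' | 'claiming' | 'running' | 'failed';
  attempts: number;
  // Assumed shape; absent on task docs for rules created before 7.11.
  schedule?: { interval: string };
}

// The maxAttempts value cited in this issue.
const MAX_ATTEMPTS = 3;

// Assumption: rule tasks can be recognized by an "alerting:" task type prefix.
function isRuleTask(task: TaskDoc): boolean {
  return task.taskType.startsWith('alerting:');
}

// The failure mode described above: no schedule and attempts exhausted.
function wouldBeMarkedFailed(task: TaskDoc): boolean {
  return task.schedule === undefined && task.attempts >= MAX_ATTEMPTS;
}

// Mitigation 1: make failed rule tasks without a schedule claimable again.
function restoreFailedRuleTask(task: TaskDoc): TaskDoc {
  if (isRuleTask(task) && task.status === 'failed' && task.schedule === undefined) {
    return { ...task, attempts: 0, status: 'idle' };
  }
  return task; // everything else is left untouched
}

// Mitigation 2: backfill the schedule from the rule's own interval
// so the task cannot end up in this state again.
function ensureSchedule(task: TaskDoc, ruleInterval: string): TaskDoc {
  if (task.schedule === undefined) {
    return { ...task, schedule: { interval: ruleInterval } };
  }
  return task;
}
```

Both mitigations return a new document instead of mutating the input, so the same checks could in principle be applied from a migration or at rule execution time. Where they would actually run in Kibana is left open here.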