[RAC][Rule Registry] Rules that generate over 10K alerts cause an exception in the Kibana logs #122288

Closed
simianhacker opened this issue Jan 4, 2022 · 5 comments · Fixed by #122474
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:RAC label obsolete Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@simianhacker
Member

Kibana version:
main

Elasticsearch version:
main

Original install method (e.g. download page, yum, from source, etc.):
source

Describe the bug:
While working on a PR (#121904) that increases the composite size and improves performance for the Metric Threshold rule type, I came across an exception in the Kibana logs. It looks like there is a query that uses the number of alerts as its size. When a rule generates more than 10K alerts, the query fails with the following error:

[2021-12-21T15:33:15.396-07:00][ERROR][plugins.alerting] Executing Rule default:metrics.alert.threshold:06e93fa0-62a8-11ec-b34a-6fe767801337 has resulted in Error: search_phase_execution_exception: [illegal_argument_exception] Reason: Result window is too large, from + size must be less than or equal to: [10000] but was [50000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting., caused by: "Result window is too large, from + size must be less than or equal to: [10000] but was [50000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.,Result window is too large, from + size must be less than or equal to: [10000] but was [50000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."

Steps to reproduce:

  1. Follow the "Setup for PR Review" steps in #121904 ([Metrics UI] Increase composite size to 10K for Metric Threshold Rule and optimize processing), but set EVENTS_PER_CYCLE to 50000 and PAYLOAD_SIZE to 10000
  2. Set the conditions to trigger an alert
  3. Watch the Kibana logs

Expected behavior:

The request should be paginated in batches of at most 10K so that it does not throw an exception.
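
A minimal sketch of what batching could look like, assuming the failing query looks up alerts by ID and sets `size` to the number of IDs. The names `chunkIds`, `fetchInChunks`, and the `SearchFn` callback are hypothetical and are not taken from the Kibana code base; the only figure from this report is the 10,000 limit in the error message.

```typescript
// Hypothetical sketch: none of these names come from Kibana or Rule Registry.
// 10000 is the limit reported in the error (index.max_result_window).
const MAX_RESULT_WINDOW = 10000;

// Split a list into consecutive batches of at most `size` items.
function chunkIds<T>(items: T[], size: number): T[][] {
  if (size <= 0) {
    throw new Error("size must be positive");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Assumed shape of the search call: it takes the IDs to look up and the
// `size` to request, and resolves to the matching hits.
type SearchFn<Hit> = (ids: string[], size: number) => Promise<Hit[]>;

// Run one search per batch so that `size` never exceeds the result window.
async function fetchInChunks<Hit>(
  ids: string[],
  search: SearchFn<Hit>,
  chunkSize: number = MAX_RESULT_WINDOW
): Promise<Hit[]> {
  const results: Hit[] = [];
  for (const batch of chunkIds(ids, chunkSize)) {
    const hits = await search(batch, batch.length);
    for (const hit of hits) {
      results.push(hit);
    }
  }
  return results;
}
```

With 50,000 alert IDs, as in the reproduction steps, this would issue five searches of 10,000 each instead of a single search with `size: 50000`. The batches run sequentially here to keep the sketch simple; whether they could run concurrently depends on the cluster load the caller is willing to accept.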

@simianhacker simianhacker added the bug Fixes for quality problems that affect the customer experience label Jan 4, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 4, 2022
@simianhacker simianhacker added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jan 4, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 4, 2022
@gmmorris
Contributor

gmmorris commented Jan 6, 2022

Thanks Chris. I can't think of anything in the alerting plugin that would do that (alerts aren't queryable docs at that level), so I suspect Rule Registry is most likely doing this 🤔

We'll triage this and look into it 👍

@marshallmain @mikecote - any thoughts?

@kobelb - wrt the conversation we were having about circuit breakers yesterday: if we find that we have a recurring failure at a certain number by default (in this case 10K), would it make sense to set a circuit breaker at that level sooner, rather than wait on telemetry?

@mikecote
Contributor

mikecote commented Jan 6, 2022

The error might be coming from here:

@simianhacker
Member Author

@gmmorris I think @mikecote found the issue (thanks Mike!). I might put a PR together for that today.

@gmmorris
Contributor

gmmorris commented Jan 6, 2022

Thanks @simianhacker ! :elasticheart:

@simianhacker simianhacker added Feature:RAC label obsolete and removed Feature:Alerting labels Jan 6, 2022
@simianhacker simianhacker self-assigned this Jan 6, 2022
@simianhacker simianhacker changed the title from "[Alerting] Rules that generate over 10K alerts cause an exception in the Kibana logs" to "[RAC][Rule Registry] Rules that generate over 10K alerts cause an exception in the Kibana logs" Jan 6, 2022
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022