
[Infra Monitoring] Warning about large index sets #120615

Closed

miltonhultgren opened this issue Dec 7, 2021 · 5 comments
Labels
R&D (Research and development ticket, not meant to produce code but to make a decision) · Team:Infra Monitoring UI - DEPRECATED (use Team:obs-ux-infra_services)

Comments

@miltonhultgren (Contributor) commented Dec 7, 2021

WIP

One of the factors that impacts query performance is the number of indices targeted. Elasticsearch can narrow this set down based on whether an index has any documents within the specified time range, but if you have a large set of indices and they all contain documents in that range, Elasticsearch has to hit all of them, since it cannot otherwise tell whether an index holds relevant data.
However, the user might know, based for example on which Filebeat modules they are using.

It would be good if we could surface feedback to the user, saying something along the lines of "the index pattern filebeat-* includes a lot of indices, do you want to narrow it down to filebeat-{module}-*?" or similar.
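
A minimal sketch of how such a warning could be derived, assuming the @elastic/elasticsearch TypeScript client and a made-up threshold (neither appears in this issue):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical threshold; the issue doesn't define what counts as "a lot" of indices.
const INDEX_COUNT_WARNING_THRESHOLD = 50;

async function warnIfPatternIsBroad(pattern: string): Promise<string | undefined> {
  // The resolve index API expands a pattern into the concrete indices,
  // aliases, and data streams it matches.
  const resolved = await client.indices.resolveIndex({ name: pattern });
  const matched = resolved.indices.length + resolved.data_streams.length;

  if (matched > INDEX_COUNT_WARNING_THRESHOLD) {
    return `The index pattern "${pattern}" includes ${matched} indices; consider narrowing it down, e.g. to filebeat-{module}-*.`;
  }
  return undefined;
}
```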

In the Logs UI we have started adding some validation of the Data View selected for the Source Configuration. This could be a good area to build on further, since today the Rule Type executors are affected by growing sets of indices to scan.

How does this all relate to the can_match phase, which should account for "presence of fields, value ranges and constant_keywords"?

  • Roll out the validation to index-name-based configs too.
  • Always show the checks that were run (not just when they fail); this could educate the user about what to watch out for.
  • Add more checks for things like index/shard count (see the sketch below).
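
For the index/shard count idea, a rough sketch of a check built on the cat shards API; the threshold is invented for illustration:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical limit; tune to whatever counts as "too many" shards for one query.
const SHARD_COUNT_WARNING_THRESHOLD = 500;

async function checkShardCount(pattern: string): Promise<string | undefined> {
  // _cat/shards returns one row per shard copy of the matched indices.
  const shards = await client.cat.shards({ index: pattern, format: 'json' });

  if (shards.length > SHARD_COUNT_WARNING_THRESHOLD) {
    return `The index pattern "${pattern}" targets ${shards.length} shards, which may slow down queries and rule executors.`;
  }
  return undefined;
}
```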

Note: This is likely to become more relevant once we can offer "saved views", giving users the option to place narrower index targets in different views within the same space.

@elasticmachine (Contributor)

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@matschaffer (Contributor) commented Dec 8, 2021

The original intent of https://www.elastic.co/guide/en/elasticsearch/reference/master/keyword.html#constant-keyword-field-type was to allow broad searches (like logs-*) where a majority of the indices could very quickly say "nope not me" and avoid performance problems.

Granted, customer indices might not always have these keywords set.
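
For illustration, a sketch of what such a mapping could look like when created through the @elastic/elasticsearch client; the index and field names are only examples:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Each index in the logs-* family declares its own constant value for the field,
// so a filter on event.dataset lets Elasticsearch skip most indices without searching them.
await client.indices.create({
  index: 'logs-nginx.access-example',
  mappings: {
    properties: {
      'event.dataset': {
        type: 'constant_keyword',
        value: 'nginx.access',
      },
    },
  },
});
```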

This has me curious about general query timing information for the user. Something like:

This logs view is taking a long time to load. Would you like to "profile the query"(link) to identify what's slowing things down?

It's a much bigger ask than your description of course. But maybe useful food for thought.
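
As a sketch of what "profile the query" could look like behind such a link, Elasticsearch's search profiler can be enabled per request (the client setup and query below are placeholders):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// profile: true makes Elasticsearch return per-shard timing breakdowns
// alongside the hits, which a diagnostics view could surface to the user.
const response = await client.search({
  index: 'filebeat-*',
  profile: true,
  query: {
    range: { '@timestamp': { gte: 'now-15m' } },
  },
});

// response.profile?.shards holds the per-shard query and collector timings.
console.log(JSON.stringify(response.profile?.shards[0], null, 2));
```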

@miltonhultgren (Contributor, Author)

@matschaffer That sounds very much like another idea @weltenwort had, about adding some kind of app-specific profiling/diagnostics tool.

@weltenwort (Member)

For some background information, we're mostly seeing problems with the alert query execution times. We've been discussing this in #98010 for a while and implemented partial optimizations. The log stream itself has shown pretty stable performance since its queries don't include any expensive aggregations and consistently apply time range filters.

In light of that, I wonder what we intend to solve with this issue?

@miltonhultgren (Contributor, Author)

Upon some reflection, I think this will be solved by us being able to offer curated views where the user can easily narrow down their index set to only show data for that view (and use it for the alerts they create, too). If we can get Integrations to install these views, it's all the easier.
