[Infra UI Meta Issue] Improve UI Field Selection for Metricbeat #40277

Closed
simianhacker opened this issue Apr 5, 2019 · 10 comments
Labels
discuss, enhancement (New value added to drive a business result), Feature:Metrics UI (Metrics UI feature)

Comments

@simianhacker
Member

simianhacker commented Apr 5, 2019

Problem

The _field_caps API returns over 2000 fields for Metricbeat indices, since every possible field is present in the index mapping. The current approach in the UI is to present the user with a "combo box" that lets them narrow down the list by searching. This requires the user to have intimate knowledge of the Metricbeat fields. There is no Elasticsearch-native way to filter this list down to only the fields that contain actual data.

Possible Solutions

  • Create an aggregation that paginates through all the fields (100 at a time) and calculates the cardinality of each field for the time range the user is viewing. This is very costly and could take several seconds to complete.
  • Create parallel count requests (100 at a time) to check whether each field exists in the current time range. Initial attempts have proven this to be expensive as well.
  • Filter the list of fields using event.dataset or metricset.module as a required prefix. This would require an aggregation to be run on the data, but the potential cardinality of metricset.module is relatively low. We would also need to keep a whitelist of prefixes for things like host and cloud. The downside is that any field whose prefix we don't recognize as "official" would be filtered out, and this would include user-defined fields.
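
To make the trade-off in the third option concrete, here is a minimal sketch of the prefix filter. It assumes the list of active modules has already been fetched (for example with a terms aggregation on metricset.module). The function name, the constant, and the sample field names are hypothetical and not taken from the Kibana codebase.

```typescript
// Hypothetical allowlist of prefixes that are always kept (an assumption for this sketch).
const ALWAYS_ALLOWED_PREFIXES: string[] = ["host", "cloud"];

/**
 * Keep only the fields that sit under an active module or an allowlisted prefix.
 * `activeModules` is assumed to come from an aggregation on metricset.module.
 */
function filterFieldsByPrefix(
  allFields: string[],
  activeModules: string[],
  allowedPrefixes: string[] = ALWAYS_ALLOWED_PREFIXES
): string[] {
  const prefixes = activeModules.concat(allowedPrefixes);
  return allFields.filter((field) =>
    // Match the prefix itself or anything nested under it ("host" or "host.*").
    prefixes.some((prefix) => field === prefix || field.startsWith(prefix + "."))
  );
}

const mappedFields = [
  "system.cpu.idle.pct",
  "nginx.stubstatus.active",
  "host.name",
  "cloud.region",
  "my_custom.value",
];
const visibleFields = filterFieldsByPrefix(mappedFields, ["system"]);
// visibleFields: ["system.cpu.idle.pct", "host.name", "cloud.region"]
```

Note that my_custom.value is dropped even if it has data, which is exactly the downside described for user-defined fields.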

Related Issues

https://github.com/elastic/dev/issues/1223
#36843
#38020
#39613
#40120
#41090

@ruflin
Member

ruflin commented Apr 8, 2019

I filed issue #24709 in Kibana some time ago, which I think is related. It seems Kibana already has the capabilities for option 2 today, which I think would be the right approach.

@weltenwort
Member

weltenwort commented Apr 8, 2019

A few possibilities come to mind, which are mostly independent:

Grouping: Maybe we could find a good middle ground by partitioning the shown list of fields into multiple sections? There could be a "recommended" or "commonly used" section at the top and "everything else" in a second section at the end. The partitioning could go even further by grouping the fields into sections that map to the ECS fields sets (base, agent, network, geo, etc).
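
A rough sketch of that partitioning, assuming the first path segment of a field name is a good enough stand-in for its ECS field set and that fields without a dot belong to "base". The list of recommended fields and all names here are hypothetical.

```typescript
interface FieldSection {
  title: string;
  fields: string[];
}

// Hypothetical curated list; the real "commonly used" fields would need to be chosen.
const RECOMMENDED_FIELDS: string[] = ["system.cpu.user.pct", "system.memory.used.pct"];

function partitionFields(
  allFields: string[],
  recommended: string[] = RECOMMENDED_FIELDS
): FieldSection[] {
  const recommendedSection: FieldSection = { title: "Recommended", fields: [] };
  const byFieldSet = new Map<string, string[]>();
  const fieldSetOrder: string[] = []; // remembers first-seen order of field sets

  for (const field of allFields) {
    if (recommended.indexOf(field) !== -1) {
      recommendedSection.fields.push(field);
      continue;
    }
    // Assumption: the first path segment approximates the ECS field set.
    const dot = field.indexOf(".");
    const fieldSet = dot === -1 ? "base" : field.slice(0, dot);
    let bucket = byFieldSet.get(fieldSet);
    if (bucket === undefined) {
      bucket = [];
      byFieldSet.set(fieldSet, bucket);
      fieldSetOrder.push(fieldSet);
    }
    bucket.push(field);
  }

  const otherSections = fieldSetOrder.map((fieldSet): FieldSection => ({
    title: fieldSet,
    fields: byFieldSet.get(fieldSet) || [],
  }));
  return [recommendedSection].concat(otherSections);
}

const fieldSections = partitionFields([
  "@timestamp",
  "system.cpu.user.pct",
  "host.name",
  "system.load.1",
  "host.os.name",
]);
// Section titles, in order: Recommended, base, host, system
```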

Async cardinality: We could also asynchronously calculate the cardinality of the fields when the user opens the menu. That allows them to immediately select something if they know what they are looking for, or to wait for additional details to load. On the other hand, the incremental addition/re-sorting might be confusing and it still causes some load on the cluster.
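
As a sketch of what the asynchronous requests might contain, the function below only builds search request bodies, one per batch of fields, each with a cardinality aggregation per field and a range filter on the time field. Sending the requests and updating the menu as responses arrive is left out. The time field name @timestamp, the batch size, and the aggregation naming scheme are assumptions.

```typescript
interface CardinalityRequest {
  // Maps each aggregation name back to the field it measures.
  aggNameToField: { [aggName: string]: string };
  body: {
    size: number;
    query: { range: { [field: string]: { gte: string; lte: string } } };
    aggs: { [aggName: string]: { cardinality: { field: string } } };
  };
}

function buildCardinalityRequests(
  fields: string[],
  from: string,
  to: string,
  batchSize = 100
): CardinalityRequest[] {
  const requests: CardinalityRequest[] = [];
  for (let start = 0; start < fields.length; start += batchSize) {
    const batch = fields.slice(start, start + batchSize);
    const aggNameToField: { [aggName: string]: string } = {};
    const aggs: { [aggName: string]: { cardinality: { field: string } } } = {};
    batch.forEach((field, i) => {
      // Index-based names avoid relying on field names being valid aggregation names.
      const aggName = "field_" + (start + i);
      aggNameToField[aggName] = field;
      aggs[aggName] = { cardinality: { field } };
    });
    requests.push({
      aggNameToField,
      // size: 0 because only the aggregation results are needed, not the hits.
      body: { size: 0, query: { range: { "@timestamp": { gte: from, lte: to } } }, aggs },
    });
  }
  return requests;
}

const cardinalityRequests = buildCardinalityRequests(
  ["system.cpu.idle.pct", "system.memory.used.pct", "host.name"],
  "now-15m",
  "now",
  2
);
// Two request bodies: the first covers two fields, the second covers one.
```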

Batch cardinality calculation: There is a task manager available in Kibana that can be used to perform coordinated batch operations. We could just pre-calculate the cardinalities every N minutes and store the results in a saved object for constant-time retrieval. (That could also be useful for many other applications, so it might make for an interesting shared service.)
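
The shape of that idea, reduced to an in-memory sketch: something periodically refreshes a snapshot of cardinalities, and the UI only ever reads the snapshot. This deliberately does not use the task manager or saved object APIs; the class, its method names, and the synchronous compute callback are all hypothetical simplifications (a real computation would be asynchronous).

```typescript
type Cardinalities = { [field: string]: number };

class CardinalityCache {
  private snapshot: Cardinalities = {};
  private computedAt: number | null = null;
  private readonly compute: () => Cardinalities;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(
    compute: () => Cardinalities,
    maxAgeMs: number,
    now: () => number = () => Date.now()
  ) {
    this.compute = compute;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // What a scheduled background task would call every N minutes.
  refresh(): void {
    this.snapshot = this.compute();
    this.computedAt = this.now();
  }

  // True before the first refresh and once the snapshot is older than maxAgeMs.
  isStale(): boolean {
    return this.computedAt === null || this.now() - this.computedAt >= this.maxAgeMs;
  }

  // Reads only the stored snapshot; no query runs at read time.
  fieldsWithData(): string[] {
    return Object.keys(this.snapshot).filter((field) => (this.snapshot[field] || 0) > 0);
  }
}

let fakeNow = 0; // injected clock so the example is deterministic
const cardinalityCache = new CardinalityCache(
  () => ({ "system.cpu.idle.pct": 12, "nginx.stubstatus.active": 0 }),
  5 * 60 * 1000,
  () => fakeNow
);
cardinalityCache.refresh();
const populatedFields = cardinalityCache.fieldsWithData();
// populatedFields: ["system.cpu.idle.pct"]
```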

@roncohen
Contributor

Unless we find a way to query for the relevant metrics that returns quickly, I think it needs to happen in the background, along the lines of "Batch cardinality calculation".

An option is to investigate if we can use the new data frames transformations: https://www.elastic.co/guide/en/elasticsearch/reference/master/preview-data-frame-transform.html

Another option is for Metricbeat to simply emit a document for each metric it has collected in the last minute:

{"@timestamp": ..., "metric": {"name": "system.cpu.idle.pct"}}

We could eventually roll this up, so you'd have one document per metric per day. Admittedly, it's not great.
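
A minimal sketch of that roll-up, done in memory for illustration: per-minute documents of the shape above are collapsed to one document per metric per day. It assumes ISO 8601 timestamps in UTC; the interface and function names are hypothetical.

```typescript
interface MetricNameDoc {
  "@timestamp": string; // assumed to be ISO 8601 in UTC
  metric: { name: string };
}

function rollUpDaily(docs: MetricNameDoc[]): MetricNameDoc[] {
  const seen = new Set<string>();
  const rolledUp: MetricNameDoc[] = [];
  for (const doc of docs) {
    const day = doc["@timestamp"].slice(0, 10); // "YYYY-MM-DD"
    const key = day + "|" + doc.metric.name;
    if (!seen.has(key)) {
      seen.add(key);
      // Keep a single document per (day, metric), stamped at the start of the day.
      rolledUp.push({ "@timestamp": day + "T00:00:00.000Z", metric: { name: doc.metric.name } });
    }
  }
  return rolledUp;
}

const dailyDocs = rollUpDaily([
  { "@timestamp": "2019-04-05T10:00:00.000Z", metric: { name: "system.cpu.idle.pct" } },
  { "@timestamp": "2019-04-05T10:01:00.000Z", metric: { name: "system.cpu.idle.pct" } },
  { "@timestamp": "2019-04-06T09:00:00.000Z", metric: { name: "system.cpu.idle.pct" } },
]);
// dailyDocs has two entries: one for 2019-04-05 and one for 2019-04-06.
```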

@simianhacker
Member Author

Here is my solution to this problem: #36843

@ruflin
Member

ruflin commented May 22, 2019

@simianhacker This is great. Is this similar to what the "left bar" discussed in #24709 does?

@roncohen
Contributor

Let’s discuss on the PR?

@roncohen
Contributor

@bleskes suggested we discuss if we can have the Field Stats API (doc_count) back, as described here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-field-stats.html#_field_statistics_2

@ruflin
Member

ruflin commented May 23, 2019

It would be nice to get something similar back in ES directly. In the past, Field Stats supported some constraints on indices. It would be nice to have this constraint on a date range instead, especially as we are using ILM, which means one index can contain up to a month of data. If we only look at 1 day of data, we should only see the fields used during that day.

@wylieconlon
Contributor

This effort overlaps with the discussions we've been having about improving the usefulness of index patterns. The index pattern service is the natural place to store this extra information, instead of calculating it on every load. The service also lets us share it across all Kibana apps. Improvements to the index pattern service are being discussed here: #35481

@roncohen
Contributor

This problem also applies to the second dropdown: "graph by". We should only show the fields relevant for the metrics you've selected.

For the "graph per" dropdown, when a user has already selected a metric, we could query for X documents that have that metric and only show the labels/keyword fields available in those documents. The reason this works here, as opposed to in the main metrics selector, is that for the same metric there will be much less variability in the labels/keyword fields than in the general population.
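
Roughly, that could look like the sketch below: one function builds the sample query (an exists query on the selected metric, limited to X hits), and another intersects the field paths found in the sampled documents with the known keyword fields. The list of keyword fields is assumed to come from the index mapping; all function names are hypothetical.

```typescript
type SourceDoc = { [key: string]: unknown };

// Request body for sampling documents that contain the selected metric.
function buildSampleQuery(metricField: string, sampleSize: number) {
  return { size: sampleSize, query: { exists: { field: metricField } } };
}

// Walk a nested document and record the dotted path of every leaf value.
function collectFieldPaths(value: unknown, prefix: string, out: Set<string>): void {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    const obj = value as SourceDoc;
    Object.keys(obj).forEach((key) =>
      collectFieldPaths(obj[key], prefix === "" ? key : prefix + "." + key, out)
    );
  } else if (prefix !== "") {
    out.add(prefix);
  }
}

// Keep only the keyword fields that actually occur in the sampled documents.
function groupByCandidates(sampleDocs: SourceDoc[], keywordFields: string[]): string[] {
  const present = new Set<string>();
  sampleDocs.forEach((doc) => collectFieldPaths(doc, "", present));
  return keywordFields.filter((field) => present.has(field));
}

const sampleQuery = buildSampleQuery("system.cpu.idle.pct", 100);
const sampledDocs: SourceDoc[] = [
  { host: { name: "web-1" }, system: { cpu: { idle: { pct: 0.9 } } } },
  { cloud: { region: "us-east-1" }, system: { cpu: { idle: { pct: 0.7 } } } },
];
const candidates = groupByCandidates(sampledDocs, [
  "host.name",
  "cloud.region",
  "kubernetes.pod.name",
]);
// candidates: ["host.name", "cloud.region"]
```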

@simianhacker simianhacker transferred this issue from another repository Jul 3, 2019
@simianhacker simianhacker changed the title Improve UI Field Selection for Metricbeat [Infra UI] Improve UI Field Selection for Metricbeat Jul 3, 2019
@simianhacker simianhacker added Feature:Metrics UI Metrics UI feature discuss enhancement New value added to drive a business result labels Jul 3, 2019
@simianhacker simianhacker changed the title [Infra UI] Improve UI Field Selection for Metricbeat [Infra UI][Meta] Improve UI Field Selection for Metricbeat Jul 23, 2019
@simianhacker simianhacker changed the title [Infra UI][Meta] Improve UI Field Selection for Metricbeat [Infra UI Meta Issue] Improve UI Field Selection for Metricbeat Jul 23, 2019