
[Usage Collection] Collect stats from /_nodes/usage #68603

Closed
TinaHeiligers opened this issue Jun 8, 2020 · 8 comments · Fixed by #70108
Labels
Feature:Telemetry Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.9.0


@TinaHeiligers
Contributor

Summary
As of v7.8, aggregation usage is included in the GET /_nodes/usage response, and we would like to gain insight from this data. However, at the moment we don't collect any data from the Node Feature Usage stats. We need to add this data to the high-level stats that Kibana reports.
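To make the shape of the data concrete, the per-node entries in that response could be modeled with TypeScript interfaces like the ones below. This is a hypothetical sketch inferred from the sample payloads later in this thread, not the official Elasticsearch client types; `isNodeUsage` is an illustrative helper name.

```typescript
// Hypothetical types modeling the per-node entries of GET /_nodes/usage,
// inferred from the sample payloads in this thread.
interface NodeUsage {
  rest_actions: Record<string, number>;
  aggregations: Record<string, Record<string, number>>;
  timestamp: number; // epoch millis when the stats were read
  since: number;     // epoch millis since the node started counting
}

// The raw response keys each entry by node id.
interface NodesUsageResponse {
  nodes: Record<string, NodeUsage>;
}

// Minimal runtime check, e.g. before forwarding the data to telemetry.
function isNodeUsage(v: unknown): v is NodeUsage {
  if (typeof v !== 'object' || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.timestamp === 'number' &&
    typeof o.since === 'number' &&
    typeof o.rest_actions === 'object' && o.rest_actions !== null &&
    typeof o.aggregations === 'object' && o.aggregations !== null
  );
}
```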

@elasticmachine
Contributor

Pinging @elastic/pulse (Team:Pulse)

@TinaHeiligers
Contributor Author

TinaHeiligers commented Jun 10, 2020

@mindbat @afharo
I'm not sure we reached a consensus on the format of the data retrieved by calling GET /_nodes/usage, specifically whether we just append the response to cluster_stats.nodes in Kibana's payload to the remote service, or restructure it as an array, as suggested in this comment in #340.
It's very early days, but right now cluster_stats.nodes looks something like this (cropped version):

"nodes" : {
  "jvm" : {},
  "process" : {},
  "os" : {},
  "network_types" : {},
  "versions" : ["8.0.0"],
  "discovery_types" : {},
  "plugins" : [ ],
  "usage" : [
    {
      "rest_actions" : {
        "security_get_role_mappings_action" : 1,
        ...
      },
      "aggregations" : {
        "date_histogram" : {
          "date" : 1
        },
        ...
      },
      "timestamp" : 1591829170055,
      "since" : 1591822675468,
      "node_id" : "mgkTp7CbSS2ZOCg-hd9Igw"
    }
  ]
}

The idea is to have aggregations done on the index layer and send the data (almost) as is.
I'm not sure we even need to modify the original data, although the aggregations and rest_actions entries per node might be huge. What do you think?

@afharo
Member

afharo commented Jun 11, 2020

I agree with Kibana providing the data at as granular a level as possible, and with the index layer breaking it down into aggregations and multiple indices as required by PMs and any other users of the info in the Telemetry cluster. But I'd say the final call is @mindbat's.

Also, raising a question for @mindbat: as a golden rule, what is preferred?

  1. The original response from Elasticsearch where usage is an object and keys are the node_id (the index layer will need to break it down by looping through all the keys):
  "usage" : {
     "mgkTp7CbSS2ZOCg-hd9Igw": {
      "rest_actions" : {
        "security_get_role_mappings_action" : 1,
        ...
      },
      "aggregations" : {
        "date_histogram" : {
          "date" : 1
        },
        ...
      },
      "timestamp" : 1591829170055,
      "since" : 1591822675468
    },
    ...
  }
  2. The usage array Tina proposed in her comment.

Personally, I like the latter better since I think it's easier to handle for schema validations and loops (either via a simple .map or an ingest pipeline forEach processor).
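To make the trade-off concrete, the restructuring from format 1 to format 2 is a simple map over the object's entries. A minimal sketch (`usageObjectToArray` is a hypothetical helper name, not an existing Kibana function):

```typescript
// Hypothetical helper: converts the object-keyed `usage` section of the
// GET /_nodes/usage response into the array form proposed in this thread,
// moving each key into an explicit `node_id` field. This avoids dynamic
// property mapping when the payload is indexed downstream.
type NodeUsageStats = Record<string, unknown>;

function usageObjectToArray(
  usage: Record<string, NodeUsageStats>
): Array<NodeUsageStats & { node_id: string }> {
  return Object.entries(usage).map(([nodeId, stats]) => ({
    ...stats,
    node_id: nodeId,
  }));
}
```

The index layer can then iterate the resulting array (e.g. with an ingest-pipeline foreach processor) without needing to know the node ids up front.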

@mindbat

mindbat commented Jun 16, 2020

As a golden rule, what is preferred?

@afharo Oof, I'm torn.

On the one hand, keeping the data close to what ES sends means when other teams think of this data being sent to telemetry, they have a good idea of what's available (in this case, node-level data).

On the other, if we don't need node-level info, then bundling it up at the collector before sending it on certainly makes it easier to process downstream, as you say.

In the absence of anyone asking for node-level data, I guess I'd lean towards the usage array @TinaHeiligers outlined.

Has anyone asked for node-level data? Do we anticipate folks asking for it later?

@TinaHeiligers
Contributor Author

TinaHeiligers commented Jun 24, 2020

@mindbat I'm not sure I understand what you mean by "node-level" info. The usage array I propose contains the same info as the original response: the usage for each node becomes an entry in the array, and the former keys are restructured into the value of a node_id field (intentionally added to remove the need for dynamic property mapping in ES).

@TinaHeiligers
Contributor Author

TinaHeiligers commented Jun 25, 2020

The stack-monitoring team has provided the high-level steps for adding data to the .monitoring-es-* indices (more info in the team sync spreadsheet).

@afharo
Member

afharo commented Jun 25, 2020

I think, at some point, we should clearly separate Monitoring from Usage Telemetry. Pushing data to the .monitoring-* indices only for telemetry purposes seems like a hack to me.
Maybe we can discuss it on #68998

@lukeelmers lukeelmers added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Oct 1, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

6 participants