
[Elastic Agent] Proposal: Change structure of Elastic Agent config #18758

Closed
ruflin opened this issue May 27, 2020 · 11 comments

@ruflin
Member

ruflin commented May 27, 2020

Details around datasource config

The Agent config today consists of three levels: Datasources -> Inputs -> Streams. A data source is a group of inputs and allows settings such as the output or namespace to be configured one level above the inputs. Initially we only had a list of inputs in the agent config, but we introduced the data source level for a few reasons (a sketch of today's layout follows the list):

  • Align with the UI
  • Make importing of configs possible
  • Better error reporting to the UI
  • Higher level configs to be configured for multiple inputs like output, namespace, constraints
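
For reference, here is a minimal sketch of today's three-level layout (reconstructed from this proposal, so treat the exact keys as assumptions; they mirror the new examples further down):

datasources:
  - namespace: default
    use_output: default
    inputs:
      - type: system/metrics
        streams:
          - metricset: cpu
            dataset: system.cpu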

In the following I want to dive into each point to see whether it still matters.

Align with the UI

Having users think about the same concepts whether they create a config manually or through the UI is powerful, as it avoids two different mental models. At the same time, it seems acceptable for the UI to offer a more convenient way to configure groupings than the agent side does.

This is especially true as the grouping of inputs is related to packages, which are not available on the agent side. If a user configures nginx, they need to manually specify a logs and a metrics input anyway, so the datasource grouping will not help them much.

Importing configs

Initially the idea was that data sources would make importing configs possible by mapping them to the UI. This works if the imported config actually matches 1-1 to a package. But if the user specifies their own inputs and groups them together in some way, this no longer works. When importing, we would either put all inputs into 1 data source or create 1 data source per input. 1 data source per input is the more likely choice, as otherwise the UI gets more complex (many inputs in 1 data source). With this, I think the import argument is no longer valid; see the sketch below.
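
To illustrate, a hedged sketch of what an importer would have to generate with the 1-data-source-per-input option (the wrapper keys are assumptions):

# imported standalone config
inputs:
  - type: logs
    streams:
      - paths: /var/log/foo.log
  - type: system/metrics
    streams:
      - metricset: cpu

# generated config, one data source per input
datasources:
  - inputs:
      - type: logs
        streams:
          - paths: /var/log/foo.log
  - inputs:
      - type: system/metrics
        streams:
          - metricset: cpu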

Better error reporting

This still holds, but I would argue we can solve it differently through metadata on each input. Each input should support additional metadata where we can add names and ids for better error reporting. When an error is reported to Fleet, we then know which data source the inputs and streams belong to. Even with datasources we would need this, as reporting an error only at the data source level is not enough. Having this generic meta concept also allows better error reporting in the standalone case.

Higher Level Configs

It is convenient to configure the namespace and output at the data source level. We should still allow this on the UI side, but it is not required at the agent config level. My assumption is that most users getting started use the default namespace and the default output, which means nothing has to be configured.

Users with more complex configs are likely to use some automation to build them, in which case specifying the output and namespace more often should not be a problem.

Summary

I think on the agent side, the arguments for having the data source object no longer hold up.

Proposed new config

Based on the above, I suggest we remove datasources from the agent config but add support for namespace, output, and meta information at the input level:

inputs:
  - type: system/metrics
    namespace: default
    use_output: default
    meta:
      package.name: bar
      settings.id: foo
      hello: world
    streams:
      - metricset: cpu
        dataset: system.cpu

The part under meta is not understood by the agent but is logged/shipped in case of an error.
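
Purely as an illustration of the idea (the actual error format is not defined in this issue), an error shipped to Fleet could carry the meta block along, for example:

error:
  message: failed to start stream
  input.type: system/metrics
  meta:
    package.name: bar
    settings.id: foo
    hello: world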

This new config, combined with the proposed changes to the fields used for the indexing strategy (elastic/package-registry#482), also solves the problem that there was no good way to set a different dataset.type for an input. For example, the log input could also generate metrics. The config below will send data to metrics-foo-prod:

inputs:
  - type: logs
    dataset.type: metrics
    dataset.namespace: prod
    streams:
      - paths: /var/log/foo.log
        dataset.name: foo
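
The target index is composed from the three dataset values; a worked example for the config above (the composition pattern is my reading of elastic/package-registry#482):

# <dataset.type>-<dataset.name>-<dataset.namespace>
#   metrics - foo - prod  =>  metrics-foo-prod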

Removing the datasource part also makes getting started with manual configuration easier. The simplest config now looks as follows:

inputs:
  - type: logs
    streams:
      - paths: /var/log/foo.log

It is expected that the Ingest Manager in Kibana will still have a grouping of inputs available, but it will flatten that grouping before shipping the config to the Agent.
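
To sketch that flattening (the grouped Kibana-side structure below is an assumption, purely for illustration):

# Ingest Manager side, grouped
datasources:
  - namespace: prod
    inputs:
      - type: logs
        streams:
          - paths: /var/log/foo.log

# what the Agent receives, flattened
inputs:
  - type: logs
    namespace: prod
    streams:
      - paths: /var/log/foo.log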

@ruflin ruflin self-assigned this May 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@mostlyjason

mostlyjason commented May 27, 2020

@ruflin Thanks for working to simplify things! I think this is a step in the right direction. There are still a few things that are unclear to me. Perhaps it is just my lack of familiarity with how the config works, but it might illustrate the questions others will have when seeing it for the first time.

I see your example has type: system/metrics. Can you explain again what the purpose of the system prefix is? Also, I've seen other examples with type nginx/metrics. Is there a reason we don't break these into two separate keys?

I'm also wondering what the difference is between the input type and dataset.type? It'd be simpler if one implied the other. What's the use case for having a logs input produce metrics? How does this input type map to the child process that the input is applied to? Are you suggesting that the log input type is applied to Filebeat, which then produces metrics?

Lastly, in a separate discussion we talked about calling this "Data Kind". Would it make sense to align our naming here by making it dataset.kind?

@ph
Contributor

ph commented May 27, 2020

I see your example has type: system/metrics. Can you explain again what the purpose of the system prefix is? Also, I've seen other examples with type nginx/metrics. Is there a reason we don't break these into two separate keys?

The main reason was disambiguation and to try to hide implementation details of the running Beats. We could either make it two keys or have a map that correlates types to processes.

@EricDavisX
Contributor

I love the added simplicity.

@blakerouse
Contributor

Why use_output instead of just output? Given that an output must always be defined, either explicitly or implicitly (using default when not defined), the use_ prefix seems weird to me.

I also agree with @mostlyjason on the system/ prefix. What about shortening the type to just system, given that it only does metrics anyway? Following that logic, could it be possible to use just nginx and then define both metrics and logs inside the same input, running only whichever ones are defined?

Example:

inputs:
  - type: nginx
    streams:
      - type: logs
        paths: /var/log/nginx/access.log
      - type: metrics
        http: http://localhost/nginx_status

@ruflin
Member Author

ruflin commented May 28, 2020

There are two distinct things:

  • The input type to be run: this can be syslog, log, nginx/metrics, S3, system/metrics, redis (for Redis slow logs), etc.
  • The type/kind of data it produces: logs, metrics

One can't imply the other, as you can get either logs or metrics from files on S3.

At the input level we have configuration common to all the streams using the same input, for example hosts, username, and password, so these don't have to be specified for each stream:

inputs:
  - type: logs
    streams:
      - paths: /var/log/nginx/access.log
        dataset.name: nginx.access
      - paths: /var/log/nginx/error.log
        dataset.name: nginx.error
  - type: nginx/metrics
    hosts: https://localhost:8080
    username: foo
    password: bar
    streams:
      - metricset: stubstatus
      - metricset: hello-world
        period: 20s

The input.type references a unique input in one of the processes the agent runs. I'm not sure I follow what the benefit of two keys would be, as it cannot solve the problem of implying dataset.type from input.type.

@blakerouse For the use_ prefix I have no strong opinion, but I think the history here is that we wanted to make it clear that you can only reference global outputs and not define one locally. I suggest we take the use_output discussion to another issue.

@ph
Contributor

ph commented May 28, 2020

@blakerouse For the use_ prefix I have no strong opinion, but I think the history here is that we wanted to make it clear that you can only reference global outputs and not define one locally. I suggest we take the use_output discussion to another issue.

Yes, we did multiple iterations on the output definition. Initially, we even allowed users to create an output inside the input, but with the number of levels in the YAML this was not clear.

So we moved to use_output to clearly differentiate the two; we can create an issue to discuss it.

@blakerouse
Contributor

Yeah, let's move the use_output discussion to another issue (or if you already feel it's correct, feel free to ignore me ;-) ). It only caught my eye because of the _ in the key name, which is the only key written that way.

As for my example config, I am just coming in with a new set of eyes and have barely used the current configuration. I am completely missing all the background conversations on the current design, which is either a good thing or a bad thing (teaching the new guy) ;-).

Given that users of Fleet really don't care what the configuration looks like, the only users that care are those in standalone mode. To me it would be great if writing the configuration offered the same user experience as selecting integrations in Fleet. That is why my example places the different types in the streams and the integration name at the input level.

In fact, with my design you could argue that the top-level inputs key should be changed to integrations, more in the format of:

integrations:
  - integration: nginx
    inputs:
      - type: logs
        paths: /var/log/nginx/access.log
        dataset: access
      - type: logs
        paths: /var/log/nginx/error.log
        dataset: error
      - type: http
        http: http://localhost/nginx_status

I understand it's hidden by Metricbeat, but nginx/metrics is really just an HTTP request that knows how to process the data, which is why I was suggesting to highlight the input type as http.
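
For example, something like this is what I have in mind (hypothetical, since no plain http input exists in this form today):

inputs:
  - type: http
    http: http://localhost/nginx_status
    streams:
      - dataset: nginx.stubstatus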

I think with the current design what draws my eye is type: logs vs. type: nginx/metrics, as it just looks and feels inconsistent.

@ruflin
Member Author

ruflin commented May 28, 2020

Currently the config is aligned between the UI and the agent; that is one of the reasons we have datasources. By design, the agent does not know about integrations, only inputs.

The point you are getting at, that nginx/metrics is just an HTTP request, is correct and is our end goal. I hope that one day the config for nginx/metrics will actually use type: http instead, but we are not there yet.

@ruflin
Member Author

ruflin commented Jun 9, 2020

In the meantime I had a conversation with @blakerouse to clarify the above. One thing we discussed is that in some cases getting started with nginx in standalone mode will become harder than it is at the moment, and we talked through some ideas on how to improve this in the future if we want to. I'll leave this open for later.

While working on configs for autodiscovery, I also realised that not having the datasource part makes autodiscovery simpler (more on that in another issue).

To get this change in on time for the next release, I suggest we move forward on this. If anyone has concerns about the above, please raise them in the next 48h. I filed a meta issue, #19082, to track all the work needed to get us to the new config.

@ruflin
Member Author

ruflin commented Jun 23, 2020

Closing this issue as most of the work in #19082 is done. Further tracking happens there.

@ruflin ruflin closed this as completed Jun 23, 2020