Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Add option to enable disk queue for elastic-agent-shipper #141508

Closed
10 of 13 tasks
Tracked by #118
leehinman opened this issue Sep 22, 2022 · 32 comments · Fixed by #145755
Closed
10 of 13 tasks
Tracked by #118

[Fleet] Add option to enable disk queue for elastic-agent-shipper #141508

leehinman opened this issue Sep 22, 2022 · 32 comments · Fixed by #145755
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team v8.6.0

Comments

@leehinman
Copy link

leehinman commented Sep 22, 2022

Part of #elastic/elastic-agent-shipper#118

Description

We need to support the new Elastic Agent Shipper in Fleet, that is going to replace the beats options currently configured in settings/outputs through YML editor. The aim is to allow the user to configure the new options from the Settings Output Flyout.

  • Options from add disk queue configuration to shipper configuration should be user configurable. The parameters shown in the UI are going to map those found in the reference configuration file.
    • Note that these are still in preliminary definition and might be changed. They are the same as current beats configs except for use_compression and  encryption_password.
    • The maximum batch size and flush timeout parameters are not yet defined for the shipper, so we're going to have placeholders (maximum_batch_bytes and output_queue_flush_timeout) until they get implemented.
    • Memory queue size for now is going to be Memory queue events. Copy: "Maximum number of events that can be stored in the queue. Default is set to 4096"
    • The encryption password doesn't need to be exposed in the UI as users don't need to be aware of it
  • Show the new options only when the Shipper feature flag is enabled per output (see comment)
    • The options should be visible for both Logstash and ES outputs
  • Implement toggle for disk queue vs memory queue (memory queue is the default). For the shipper the memory queue parameters are defined here.
    • Queue option is persistent between agent upgrade/downgrade
    • If downgrade is to 8.5.x or older, memory queue option is used (as disk queue wasn't available) and the new configuration under the shipper: object should be ignored
  • Disk Queue section should be collapsed when Disk Queue toggle is not enabled. Once enabled, all the options are available to the user to configure.
  • If Disk Queue and Encryption are enabled due to an integration, then Disk Queue and Encryption should be grayed out.
  • If Compression is disabled, do not display the compression levels.
    • Once compression is enabled, the compression levels (1 - 9) should be configurable.
    • All other options are available for configuration
  • Load balancing and compression options are unique to each output type. They'll be always visible in the case of Logstash/ES, however when the Kafka output will be added won't be applicable and will need to be hidden/grayed out
  • Add new shipper section to agent policy. The exact format is still being discussed ( see Namespace config values in fleet output elastic-agent#1729)
  • Write migration for new output format

Demo

  • Prepare recorded demo

Updated Mockups

Screenshot 2022-11-23 at 11 22 58

Screenshot 2022-11-23 at 11 24 21

@leehinman leehinman added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 22, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@cmacknz
Copy link
Member

cmacknz commented Sep 27, 2022

@nimarezainia to follow up with more details here on what should be configurable and make sure we get UI mocks created.

@nimarezainia
Copy link
Contributor

@cmacknz @ @leehinman I discussed this with @dborodyansky and the UI mock up we have in the definition document seems pretty straight forward to use for all the configurations discussed for the new shipper. Higher fidelity examples can be created and we will get someone from docs to look over the copy as there are some text boxes to consider.

@kpollich added this to our design discussions. Let's clarify then.

@criamico criamico self-assigned this Nov 4, 2022
@criamico
Copy link
Contributor

criamico commented Nov 4, 2022

@nimarezainia i'm starting to look at this and I have some questions about the designs.

Comparing the current output settings and this desired UI, it's not completely clear what to do of the YML editor. Should we keep it, since it allows to configure many more features?
If it stays, how should we handle any potential conflicts between the YML settings and the new UI configurations? Which one should override the other?

Screenshot 2022-11-04 at 18 03 15

@nchaulet
Copy link
Member

nchaulet commented Nov 4, 2022

Comparing the current output settings and this desired UI, it's not completely clear what to do of the YML editor. Should we keep it, since it allows to configure many more features?

Yes it's used to configure some ssl configuration, we should probably keep it.

if downgrade is to 8.5 or older memory queue is used (disk queue wasn't available)

The agent policies are not specific for an agent version, does the agent < 8.6 will handle the option and use memory queue if we pass disk queue or it will crash?

@criamico
Copy link
Contributor

criamico commented Nov 4, 2022

Yes it's used to configure some ssl configuration, we should probably keep it.
@nchaulet Thanks for this clarification. We just need to determine if:

  1. we want to give more weight to the YML content hence overriding the UI settings
    OR
  2. we want to discard the content of the YML based on what's in the UI settings.

@cmacknz
Copy link
Member

cmacknz commented Nov 4, 2022

Drive by comment tangentially related to the thread above:

One thing that is not captured in this issue is that the agent functionality that supports the new configurations here will be behind a feature flag that is disabled by default. Specifically, users would need to opt in to using the Elastic agent data shipper for each output they have.

I don't think we would want to present this UI unless a user has enabled the shipper feature flag in the agent policy. Without enabling the shipper these configuration options do nothing, and I don't think we should present them to users if they have no effect. Presenting these options by default would be very confusing, as users could configure a disk queue but the agent would ignore the configuration option.

Ideally we should also have checkbox or similar toggle to opt in to the new elastic agent shipper on a per output basis. Only if that option is selected should these new UI elements be shown. Only once we get out of Beta we would default to having this flag enabled by default and the new queue configuration UI visible default.

Possibly this context aws intended to be provided in #142649 which isn't complete yet. Pinging @nimarezainia for additional input here.

@cmacknz
Copy link
Member

cmacknz commented Nov 4, 2022

One exception to my comment above, the Compression and Loadbalancing options here are always valid, regardless of if the agent is using the data shipper or the existing output architecture.

@criamico
Copy link
Contributor

criamico commented Nov 7, 2022

Thanks @cmacknz, I'll take in account that these features are available under a feature flag.

@kpollich
Copy link
Member

kpollich commented Nov 7, 2022

I'll take a pass at rewriting this issue description to be less of a placeholder today. I will be sure to include the feature flag functionality as a requirement here.

@criamico
Copy link
Contributor

criamico commented Nov 7, 2022

I'm currently gathering more information about this ticket and I'd like to have confirmation that I got them right:

@nimarezainia
Copy link
Contributor

@criamico @cmacknz in elastic/elastic-agent#217 (comment) it was decided that this UI would be exposed only when the user configures the YAML to indicate shipper needs to be enabled.

We want the final UI to be developed so we avoid any breaking change, but hide everything in UI that depends on shipper under a feature flag until we GA. @criamico happy to discuss options here. they can be placed under an advanced section drop down which is greyed out if feature flag is not set.

@criamico
Copy link
Contributor

criamico commented Nov 8, 2022

in elastic/elastic-agent#217 (comment) it was decided that this UI would be exposed only when the user configures the YAML to indicate shipper needs to be enabled.

@nimarezainia Thanks for the clarification, I wasn't aware of that comment.
So, all the configurations in the mockups would need to be shown only when the user has the feature flag enabled AND the shipper is set in the output YAML configuration. Is that correct?

 shipper: {}

@nimarezainia
Copy link
Contributor

This is correct they are pretty much 1:1

I forwarded the requirements document to you that explains these. "Maximum batch size" is not the same as the variable "bulk_max_size" - Maximum batch size should be in bytes, "bulk max size" is in number of events.

@cmacknz & @leehinman could you please direct @criamico here as to what parameters should be configured?

@nimarezainia
Copy link
Contributor

in elastic/elastic-agent#217 (comment) it was decided that this UI would be exposed only when the user configures the YAML to indicate shipper needs to be enabled.

@nimarezainia Thanks for the clarification, I wasn't aware of that comment. So, all the configurations in the mockups would need to be shown only when the user has the feature flag enabled AND the shipper is set in the output YAML configuration. Is that correct?

 shipper: {}

I believe that this is the feature flag. @blakerouse @cmacknz is that correct?

@criamico
Copy link
Contributor

criamico commented Nov 8, 2022

in elastic/elastic-agent#217 (comment) it was decided that this UI would be exposed only when the user configures the YAML to indicate shipper needs to be enabled.

@nimarezainia Thanks for the clarification, I wasn't aware of that comment. So, all the configurations in the mockups would need to be shown only when the user has the feature flag enabled AND the shipper is set in the output YAML configuration. Is that correct?

 shipper: {}

I believe that this is the feature flag.

That is even simpler, thanks for the clarification! I appreciate the help :)

@cmacknz
Copy link
Member

cmacknz commented Nov 8, 2022

The Fleet UI issue is running a bit ahead of the shipper project. We should have produced a one page summary of all of this before hand :)

I believe that this is the feature flag. @blakerouse @cmacknz is that correct?

The PR adding support for the shipper to the agent is elastic/elastic-agent#1527. All shipper specific configuration is contained in a new shipper configuration object for each output. That object optionally contains an enabled key, which defaults to true if it is not present. For example:

# Enables the shipper with default settings
shipper: {}

# Disables the shipper
shipper:
  enabled: false

# Also enables the shipper
shipper:
  enabled: true

The full logic the agent uses to determine whether to start a shipper is in this function.

The options in the first section (Maximum batch size, flush timeout and memory queue size) already exist. I connected them to these three parameters, but I'd like to make sure that they're correct ones:

@cmacknz & @leehinman could you please direct @criamico here as to what parameters should be configured?

The exact shipper configuration is still a bit in flux. elastic/elastic-agent-shipper#161 is the issue to align it with what was implemented in the agent in elastic/elastic-agent#1527. This is where we will finalize the configuration.

In the meantime, the preliminary definition of the shipper queue parameters can be found in the reference configuration file: https://github.com/elastic/elastic-agent-shipper/blob/01b1a062f770377818e6b32c6508f4df40626854/elastic-agent-shipper.yml#L57

What will happen in elastic/elastic-agent-shipper#161 is that the queue configuration there will become a sub-object of the shipper: section in the agent policy. So for the memory and disk queue sizes you could configure:

# Set the shipper memory queue size.
shipper:
  queue:
    mem:
      events: 4096

# Set the shipper disk queue size.
shipper:
  queue:
    disk:
      max_size: 4096

The maximum batch size and flush timeout parameters are not defined yet for the shipper, as they differ from what we do in Beats and require a new implementation. We can create placeholders for them and come back and change them later if needed so this can move forward.

These parameters will be shipper specific, and similar to the queue can be specified as properties of the shipper's Elasticsearch output like:

shipper:
  output:
    elasticsearch:
      # Note: I prefer using maximum_batch_bytes instead of maximum_batch_size so the units are obvious.
      maximum_batch_bytes: 100
      output_queue_flush_timeout: 1s

Note that the output configuration above is subject to change. I will link back to this comment in the shipper implementation issue so we can follow up if we need to change them.

Let me know if there are any questions, happy to sync up as needed to clarify anything.

@criamico
Copy link
Contributor

criamico commented Nov 10, 2022

Thanks @cmacknz, your explanation is very useful.

In the original description of this ticket, I read

toggle for disk queue vs memory queue (memory queue is default)

Does it mean that when the toggle is disabled the memory queue defined here is going to be used?

@criamico
Copy link
Contributor

if downgrade is to 8.5 or older memory queue is used (disk queue wasn't available)

The agent policies are not specific for an agent version, does the agent < 8.6 will handle the option and use memory queue if we pass disk queue or it will crash?

@cmacknz any thoughts on this?

@cmacknz
Copy link
Member

cmacknz commented Nov 10, 2022

Does it mean that when the toggle is disabled the memory queue defined here is going to be used?

Yes the default is the memory queue, which is the case for the agent today. For the shipper the memory queue parameters are defined here.

The agent policies are not specific for an agent version, does the agent < 8.6 will handle the option and use memory queue if we pass disk queue or it will crash?

What should happen is that the new configuration under the shipper: configuration object is simply ignored before 8.6. We will have to test this, but I don't expect issues with backwards compatibility.

@criamico
Copy link
Contributor

Linking this proposed change for visibility: elastic/elastic-agent#1729

@criamico
Copy link
Contributor

criamico commented Nov 16, 2022

@nimarezainia I have some questions about the UI where I would need your input.

  • Encryption in the mockups is a toggle, which suggest a boolean type, but in the configuration file is actually a string. Should the UI expose only the enabling/disabling or also the password input itself?
# The password used to encrypt the disk queue with. Default is "" which
    # means no encryption.
    # encryption_password: ""
  • Enabling the new shipper via yml file is doable but cumbersome. What do you think of adding a toggle on top of the form that exposes the shipper feature flag? if disabled, the whole form would be hidden. This would be inside the advanced options collapsible button.

  • Should all the options from the shipper config be exposed? At the moment only path, max_size, encryption_password and use_compression are shown in the mockup for disk queue. There are other five parameters that are not exposed and I'd like to have some guidance. The parameters are:

  segment_size
  read_ahead
  write_ahead
  retry_interval

@nimarezainia
Copy link
Contributor

@nimarezainia I have some questions about the UI where I would need your input.

* `Encryption` in the mockups is a toggle, which suggest a `boolean type`, but in the [configuration file](https://github.com/elastic/elastic-agent-shipper/blob/01b1a062f770377818e6b32c6508f4df40626854/elastic-agent-shipper.yml#L113) is actually a string. Should the UI expose only the enabling/disabling or also the password input itself?
# The password used to encrypt the disk queue with. Default is "" which
    # means no encryption.
    # encryption_password: ""

@criamico Could we have the toggle which when enabled we pop a password dialogue box
(@cmacknz @leehinman thoughts?)

* Enabling the new shipper via yml file is doable but cumbersome. What do you think of adding a toggle on top of the form that exposes the `shipper` feature flag? if disabled, the whole form would be hidden. This would be inside the `advanced options` collapsible button.

@criamico I'm ok with that as long as it's somewhat hidden. Mainly because we want to avoid introducing a breaking change. @cmacknz you ok with this?

* Should all the options from the shipper config be exposed? At the moment only `path`, `max_size`, `encryption_password` and `use_compression` are shown in the mockup for disk queue. There are other five parameters that are not exposed and I'd like to have some guidance. The parameters are:
  segment_size
  read_ahead
  write_ahead
  retry_interval

Skip these. I am hoping a sensible default would suffice.

Thanks @criamico

@criamico
Copy link
Contributor

I'm ok with that as long as it's somewhat hidden. Mainly because we want to avoid introducing a breaking change.

@nimarezainia Having the toggle inside the "Advanced options" would be enough?

@criamico
Copy link
Contributor

criamico commented Nov 22, 2022

Clarifications after meeting with @cmacknz and @nimarezainia:

  • The shipper options to be visible for all the available outputs (both ES and Logstash)
  • Load balancing and compression options to be moved together in the same section, as they are output dependent. They'll be always visible in the case of Logstash/ES, however when the Kafka output will be added won't be applicable and will need to be hidden/grayed out
  • The encryption password doesn't need to be exposed in the UI as users don't need to be aware of it
  • The first three parameters in the mock up are still in definition. Using some temporary names for now until they get decided (see comment)
  • Memory queue size to be Memory queue events and the copy needs to be updated accordingly @nimarezainia

EDIT: Updated the ticket description based on these points

@nimarezainia
Copy link
Contributor

@criamico thanks for the summary. I updated the definition document as discussed to rearrange the mockup.

regarding the copy for Memory Queue Events:
"Maximum number of events that can be stored in the queue. Default is set to 4096"

--> wouldn't want to elaborate too much on that copy and memory queue events setting as it's just a temporary config element and will be removed in favor of Memory Queue Size.

@kpollich
Copy link
Member

Hey @criamico - I know the UI here isn't quite wired up yet, but could we do a recorded demo of the new options, structure, etc as you've outlined above in #141508 (comment)? It'd be good to show the rest of the agent team how we're thinking about these options in the UI via a mailing list at some point.

Let me know if you think that's workable - happy to help however I can. Thanks much!

@criamico
Copy link
Contributor

@kpollich Yes I think I can do a recorded demo with using the current branch, since the UI works and it shouldn't change much anymore.
I'm waiting for the shipper team to complete their work, so I can get the finalised parameter names to write to the agent policy and the PR will be ready to be merged.

@jlind23
Copy link
Contributor

jlind23 commented Dec 8, 2022

@criamico @nimarezainia Where are we at here? Do we now have enough information to unblock this?

@criamico
Copy link
Contributor

Where are we at here? Do we now have enough information to unblock this?

@jlind23 I'm waiting to have the finalised parameters from the shipper to complete my PR. I could merge it but it would surely need another PR to align to the shipper configuration.

@cmacknz is there any news on the final version of the shipper?

@cmacknz
Copy link
Member

cmacknz commented Dec 12, 2022

@cmacknz is there any news on the final version of the shipper?

We have been prioritizing bug fixes for the 8.6 release over making progress on the shipper. I'll let you know once we have finalized the parameters.

@criamico
Copy link
Contributor

This issue will be closed once #145755 is merged, however I created a follow up ticket to track any follow up work to this: #147613

criamico added a commit that referenced this issue Jan 10, 2023
…145755)

## Summary
Closes #141508

### Description
Implements a form for the new Elastic Agent Shipper. **It only gets
enabled only when the user explicitly enables it from the yaml editor.**
- To enable the form, add to the yaml editor under outputs one of the
following, save and then the new section should appear under "advanced
options"
```
# Enables the shipper with default settings
shipper: {}

# Also enables the shipper
shipper:
  enabled: true
```

I added the following new parameters to `Output` and to `ingest-outputs`
SO as well:
```
shipper?: {
    disk_queue_enabled?: boolean;
    disk_queue_path?: string;
    disk_queue_max_size?: number;
    disk_queue_encryption_enabled?: boolean;
    disk_queue_compression_enabled?: boolean;
    compression_level?: number;
    loadbalance?: boolean;
    mem_queue_events?: number;
    queue_flush_timeout?: number;
    max_batch_bytes?: number;
}
```

Note that another PR will likely follow to align with the structure
defined in the new shipper (see [this
comment](#141508 (comment))
for further explanation)

<details><summary>Screenshots</summary>
<img width="745" alt="Screenshot 2022-11-23 at 12 35 38"
src="https://user-images.githubusercontent.com/16084106/203537091-47ed64f8-bb13-4960-b1aa-5bc73fd2e37b.png">

<img width="720" alt="Screenshot 2022-11-23 at 12 35 06"
src="https://user-images.githubusercontent.com/16084106/203537130-e5cfcf89-a88d-43a2-8348-2f79c324ff80.png">

Generated policy under "view policy":
<img width="737" alt="Screenshot 2022-11-23 at 12 34 09"
src="https://user-images.githubusercontent.com/16084106/203536868-2d11d236-5056-473d-97a0-edae327665fc.png">



</details>

### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Co-authored-by: Alejandro Fernández Haro <alejandro.haro@elastic.co>
Co-authored-by: Nicolas Chaulet <nicolas.chaulet@elastic.co>
Co-authored-by: nima <nima.rezainia@elastic.co>
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
jennypavlova pushed a commit to jennypavlova/kibana that referenced this issue Jan 13, 2023
…lastic#145755)

## Summary
Closes elastic#141508

### Description
Implements a form for the new Elastic Agent Shipper. **It only gets
enabled only when the user explicitly enables it from the yaml editor.**
- To enable the form, add to the yaml editor under outputs one of the
following, save and then the new section should appear under "advanced
options"
```
# Enables the shipper with default settings
shipper: {}

# Also enables the shipper
shipper:
  enabled: true
```

I added the following new parameters to `Output` and to `ingest-outputs`
SO as well:
```
shipper?: {
    disk_queue_enabled?: boolean;
    disk_queue_path?: string;
    disk_queue_max_size?: number;
    disk_queue_encryption_enabled?: boolean;
    disk_queue_compression_enabled?: boolean;
    compression_level?: number;
    loadbalance?: boolean;
    mem_queue_events?: number;
    queue_flush_timeout?: number;
    max_batch_bytes?: number;
}
```

Note that another PR will likely follow to align with the structure
defined in the new shipper (see [this
comment](elastic#141508 (comment))
for further explanation)

<details><summary>Screenshots</summary>
<img width="745" alt="Screenshot 2022-11-23 at 12 35 38"
src="https://user-images.githubusercontent.com/16084106/203537091-47ed64f8-bb13-4960-b1aa-5bc73fd2e37b.png">

<img width="720" alt="Screenshot 2022-11-23 at 12 35 06"
src="https://user-images.githubusercontent.com/16084106/203537130-e5cfcf89-a88d-43a2-8348-2f79c324ff80.png">

Generated policy under "view policy":
<img width="737" alt="Screenshot 2022-11-23 at 12 34 09"
src="https://user-images.githubusercontent.com/16084106/203536868-2d11d236-5056-473d-97a0-edae327665fc.png">



</details>

### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Christiane (Tina) Heiligers <christiane.heiligers@elastic.co>
Co-authored-by: Alejandro Fernández Haro <alejandro.haro@elastic.co>
Co-authored-by: Nicolas Chaulet <nicolas.chaulet@elastic.co>
Co-authored-by: nima <nima.rezainia@elastic.co>
Co-authored-by: Josh Dover <1813008+joshdover@users.noreply.github.com>
@kpollich kpollich removed the QA:Needs Validation Issue needs to be validated by QA label Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Fleet Team label for Observability Data Collection Fleet team v8.6.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants