[Heartbeat] Handle data streams #24223

Merged (13 commits) on Mar 4, 2021

@andrewvc andrewvc commented Feb 24, 2021

Handles data streams from Fleet. The current Heartbeat code doesn't handle data_streams at all; this PR fixes that. It also hoists the id from the top level of the YAML config and merges both levels of data_stream config, since one is at the input level and the other is at the stream level.
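
A minimal sketch of the hoist-and-merge step described above, written against plain maps rather than the *common.Config type the PR actually operates on. The names unnestStream and mergeMaps are hypothetical, and letting stream-level data_stream keys win over input-level keys is an assumption, not something confirmed by this PR.

```go
package main

import "fmt"

// mergeMaps returns a new map with the keys of base overlaid by the keys of override.
func mergeMaps(base, override map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(base)+len(override))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

// unnestStream is a hypothetical stand-in for the PR's UnnestStream.
// It flattens a Fleet-style input that carries exactly one stream.
func unnestStream(input map[string]interface{}) (map[string]interface{}, error) {
	streams, ok := input["streams"].([]interface{})
	if !ok {
		// Not a Fleet-style config: return it unchanged.
		return input, nil
	}
	if len(streams) != 1 {
		return nil, fmt.Errorf("expected exactly one stream, got %d", len(streams))
	}
	stream, ok := streams[0].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("stream is not a map")
	}

	// Start from a copy of the stream, since it holds the monitor settings.
	res := mergeMaps(stream, nil)

	// Hoist the id from the top level of the input.
	if id, found := input["id"]; found {
		res["id"] = id
	}

	// Merge the input-level and stream-level data_stream sections.
	// Assumption: stream-level keys take precedence on conflict.
	inputDS, _ := input["data_stream"].(map[string]interface{})
	streamDS, _ := stream["data_stream"].(map[string]interface{})
	if inputDS != nil || streamDS != nil {
		res["data_stream"] = mergeMaps(inputDS, streamDS)
	}
	return res, nil
}

func main() {
	input := map[string]interface{}{
		"id":          "4ae879a9-b5da-4132-94ba-ab6e1fcbdc6e",
		"data_stream": map[string]interface{}{"namespace": "default"},
		"streams": []interface{}{
			map[string]interface{}{
				"id":   "synthetics/http-http-4ae879a9-b5da-4132-94ba-ab6e1fcbdc6e",
				"type": "http",
				"data_stream": map[string]interface{}{
					"dataset": "http",
					"type":    "synthetics",
				},
			},
		},
	}
	res, err := unnestStream(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(res["id"])
	fmt.Println(res["data_stream"])
}
```

The flattened result keeps the top-level id and ends up with a single data_stream section containing namespace, dataset, and type.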

Additionally, this cleans up the data streams code significantly, introducing a new add_data_streams_index processor that:

  1. More efficiently formats index names for data streams
  2. Allows individual events to override the dataset. This is useful for synthetics, where we have a base browser dataset but also browser_screenshot and browser_network datasets for extended data that can take up lots of space and often require a different ILM policy.
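
To illustrate the two points above, here is a rough sketch of building a data stream index name in the standard `<type>-<dataset>-<namespace>` form with an optional per-event dataset override. The dataStream type and indexFor function are hypothetical names for illustration only; the real add_data_streams_index processor may be structured quite differently.

```go
package main

import "fmt"

// dataStream holds the three parts of a data stream name.
// Hypothetical type, not the one used in the PR.
type dataStream struct {
	Type      string
	Dataset   string
	Namespace string
}

// indexFor returns "<type>-<dataset>-<namespace>". If the event carries its
// own dataset (eventDataset != ""), that value replaces the base dataset.
func indexFor(base dataStream, eventDataset string) string {
	dataset := base.Dataset
	if eventDataset != "" {
		dataset = eventDataset
	}
	// Plain concatenation avoids format-string parsing on every event.
	return base.Type + "-" + dataset + "-" + base.Namespace
}

func main() {
	base := dataStream{Type: "synthetics", Dataset: "browser", Namespace: "default"}

	// An ordinary event goes to the base dataset.
	fmt.Println(indexFor(base, ""))

	// A screenshot event overrides the dataset, so it can get its own ILM policy.
	fmt.Println(indexFor(base, "browser_screenshot"))
}
```

Routing the bulky screenshot and network events to their own data streams is what lets them use a different retention policy from the base browser events.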

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Use the following heartbeat.yml and verify that the output sets the correct monitor ID and writes to the correct data streams:

# Configure monitors inline
heartbeat.monitors:
- id: 4ae879a9-b5da-4132-94ba-ab6e1fcbdc6e
  name: Sample monitor 2
  revision: 1
  type: synthetics/http
  use_output: default
  meta:
    package:
      name: synthetics
      version: 0.1.28
  data_stream:
    namespace: default
  streams:
   - id: synthetics/http-http-4ae879a9-b5da-4132-94ba-ab6e1fcbdc6e
     name: Sample monitor 2
     type: http
     data_stream:
       dataset: http
       type: synthetics
     urls: 'http://elastic.co'
     service.name: APM Service Name
     schedule: '@every 5s'
     timeout: 1600
     max_redirects: 1
     proxy_url: 'http://elastic.co'
     tags:
       - tag
       - tag2 

@andrewvc andrewvc added enhancement Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team labels Feb 24, 2021
@andrewvc andrewvc self-assigned this Feb 24, 2021
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 24, 2021

elasticmachine commented Feb 24, 2021

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #24223 updated

  • Start Time: 2021-03-04T16:18:01.089+0000

  • Duration: 52 min 39 sec

  • Commit: 15ead22

Test stats 🧪

Test Results
Failed 0
Passed 45789
Skipped 4916
Total 50705

💚 Flaky test report

Tests succeeded.

@andrewvc andrewvc marked this pull request as ready for review March 2, 2021 23:20
@andrewvc andrewvc requested a review from a team as a code owner March 2, 2021 23:20
@elasticmachine
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@andrewvc andrewvc requested a review from blakerouse March 2, 2021 23:20
@andrewvc andrewvc changed the title Handle data streams [Heartbeat] Handle data streams Mar 2, 2021
@ruflin ruflin requested a review from urso March 4, 2021 11:35

@blakerouse blakerouse left a comment

This looks really good. Didn't see any issues and looks well tested.

Only thing is the imports seem a little bit jumbled. It would just make the code cleaner to fix the groupings.

// UnnestStream detects configs that come from fleet and transforms the config into something compatible
// with heartbeat, by mixing some fields (id, data_stream) with those from the first stream. It assumes
// that there is exactly one stream associated with the input.
func UnnestStream(config *common.Config) (res *common.Config, err error) {

I wonder if we could instead have agent (or the spec file) produce a complete and correct configuration (for all Beats). The merging code here makes it look like the responsibility for understanding streams from Fleet configurations is shared between agent and heartbeat.

Contributor Author

I'm not sure if I know the answer to that since I'm unfamiliar with how the parsing for the other beats works. For heartbeat we decided to keep the agent logic minimal and put more in heartbeat. That decision has worked out well since debugging issues in heartbeat is much simpler without fleet involved. I know that effectively heartbeat.monitors is equivalent to inputs in agent, so I can copy / paste fleet configs straight in for manual testing to confirm.

urso commented Mar 4, 2021

It would be nice if you could add this change to Filebeat/Packetbeat/Metricbeat right away to reduce fragmentation.

@andrewvc andrewvc merged commit 74a3a44 into elastic:master Mar 4, 2021
@andrewvc andrewvc deleted the handle-streams-input branch March 4, 2021 20:06
andrewvc added a commit to andrewvc/beats that referenced this pull request Mar 4, 2021
andrewvc added a commit that referenced this pull request Mar 4, 2021
andrewvc added a commit to andrewvc/beats that referenced this pull request Mar 4, 2021
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Labels
enhancement Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team v7.13.0
4 participants