Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monitoring] De-duplicate pipeline ids based on the ephemeral_id changing #49978

Merged

Conversation

chrisronline
Copy link
Contributor

@chrisronline chrisronline commented Nov 1, 2019

Resolves #49462

This PR fixes an issue where users saw duplicate pipeline ids in their pipeline listing page because they had restarted their pipelines (which causes the ephemeral_id to change).

In our logic to get the entire list of pipeline ids, we are doing a composite agg across id, hash and ephemeral_id because we need those lists in order to reduce the number of buckets created in the subsequent queries (see here for more information). If the ephemeral_id changes over time for a given pipeline, two results will show for the single id resulting in the behavior described in #49462

To reproduce, simply setup logstash with multiple pipelines and enable monitoring. Navigate to the pipelines listing page to verify they show up there, then restart logstash. Before this fix, duplicates should show up.

Notes

  • I'm fairly sure it isn't likely we show incorrect data - by de-duplicating, we're only fetching data for each id once, even if there are multiple ephemeral_id ids for that id. Maybe @cachedout can help educate me a bit here

TODO

  • Add tests

@elasticmachine
Copy link
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@amitavmohanty01
Copy link

@chrisronline Can you please merge the fix to 7.4 branch as well? As of now, 7.4 is the current branch as per documentation so I would request you to release this fix for 7.4 as well.

Copy link
Contributor

@igoristic igoristic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Works as expected 👍

Thanks for the instructions on reproducing the behavior

Approved so long as @cachedout addresses your concerns in the description and merge upstream passes

@cachedout
Copy link
Contributor

This could work but it has an important caveat that I'll describe below. I think that what we may want here is actually to ensure that the hash key is the one that's unique versus using id for this purpose. I'll demonstrate below why I think this may be the case.

Scenario 1

A simple pipeline definition

config.pipelines.yml

 - pipeline.id: test
   pipeline.workers: 1
   pipeline.batch.size: 1
   config.string: "input { generator {} } filter { sleep { time => 1 } } output { stdout { codec => dots } }"

Below is the JSON output for two keys:

[Filter]> .pipelines.test.ephemeral_id
"badcbe9e-c4e2-46d1-8689-8f07af3c4f85"
[Filter]> .pipelines.test.hash
"a3bf686e789a4fbb312192cf2f27667831f9c56b7c9394c1f2d8b933fe5c3ddc"

Scenario 2

Using the same pipeline definition as Scenario 1 and simply restarting Logstash after collecting data from Scenario 1

[Filter]> .pipelines.test.ephemeral_id
"bc92413c-556d-4ce3-b0ff-eba23406821a"
[Filter]> .pipelines.test.hash
"a3bf686e789a4fbb312192cf2f27667831f9c56b7c9394c1f2d8b933fe5c3ddc"

As we can see, the ephemeral identifier has changed but the hash (and the ID which is not shown above) both remain the same.

Scenario 3

Changing the pipeline definition and restarting Logstash

Here we change the pipeline definition by altering the sleep timer to 2 seconds.

 - pipeline.id: test
   pipeline.workers: 1
   pipeline.batch.size: 1
   config.string: "input { generator {} } filter { sleep { time => 2 } } output { stdout { codec => dots } }"
[Filter]> .pipelines.test.ephemeral_id
"cd880b73-6dcd-4678-9459-61f70bc843be"
[Filter]> .pipelines.test.hash
"5fd63e8270d1cdae04a7ebdb95626dc93f40e4d1355df2044a70825a4ef70e10"

Summary

In every scenario, the pipeline identifier is the same and is returned as test. When Logstash restarts, we see the ephemeral identifier change as expected while the hash remains the same -- indicating that the pipeline definition has not changed; e.g. it is the same pipeline.

However, when the pipeline definition is modified it becomes an ostensibly different pipeline, despite having the same ID as a previously defined pipeline. As such, we may wish to consider it as a different entity for the purposes of monitoring.

The counterpoint, of course, is that this may be surprising to the user especially if they make many changes to the same pipeline definition.

So, it may be that using the ID is the right thing to do from the perspective of the user who might be more inclined to believe that they're working with the same pipeline over time even though Logstash sees these internally as different, but I think this is worthy of a further discussion. I'll leave it there and wait for comments before we come to a decision. :)

Copy link
Contributor

@cachedout cachedout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left my review notes in comment on the PR itself.

@chrisronline
Copy link
Contributor Author

@chrisronline Can you please merge the fix to 7.4 branch as well? As of now, 7.4 is the current branch as per documentation so I would request you to release this fix for 7.4 as well.

Sure!

@amitavmohanty01
Copy link

@cachedout As a user, the following matters to me.

When a pipeline is defined as follows:

- pipeline.id: syslog
  path.config: "/etc/logstash/syslog.conf"

The changes to the pipeline are done in the conf file. From a monitoring perspective, the pipeline should be the same after every restart. This functionality allows me to compare the effect of the changes on the performance of the pipeline.

Taking a step back, I would like to pose the following situation.

** Box: 1 **

- pipeline.id: syslog
  path.config: "/etc/logstash/syslog.conf"

** Box: 2 **

- pipeline.id: syslog
  path.config: "/etc/logstash/syslog_other.conf"

The error above can be deliberate or it can also be a typo by the user.

From a design perspective, one option is to say, all pipelines with the same id will be considered together in the aggregation on the monitoring page. It is an easier to implement option and puts the onus of maintaining correct config on the user. Another option is to aggregate on id as long as the config matches. It would require deep comparision of each pipeline. Yet another option would be not let the option of having such a config as shown above. In that case, monitoring page should throw an error when loaded. This however rules out the cases when the user is creating such a config deliberately.

@chrisronline
Copy link
Contributor Author

Thanks all for the thoughts!

I'm fairly sure it isn't likely we show incorrect data - by de-duplicating, we're only fetching data for each id once, even if there are multiple ephemeral_id ids for that id. Maybe @cachedout can help educate me a bit here

I ran some tests locally and I don't think this has much merit. For the listing page (versus the actual pipeline page), we are showing aggregated data from the last two buckets in the sequence (two so we can perform any necessary derivatives). It's unlikely that the hash/ephemeral_id change so frequently that in a 20s span, there is much of a difference in calculated metrics.

Assuming everyone is okay with it, I think we ignore the concern and move on with the PR.

@chrisronline
Copy link
Contributor Author

@elasticmachine merge upstream

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@chrisronline chrisronline requested a review from a team as a code owner November 14, 2019 01:58
@chrisronline chrisronline requested a review from a team November 14, 2019 01:58
@chrisronline chrisronline requested review from a team as code owners November 14, 2019 01:58
@chrisronline chrisronline removed request for a team November 14, 2019 02:00
@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@chrisronline chrisronline merged commit 5a8fabd into elastic:master Nov 14, 2019
@chrisronline chrisronline deleted the monitoring/logstash_pipelines_uniq branch November 14, 2019 16:10
chrisronline added a commit to chrisronline/kibana that referenced this pull request Nov 14, 2019
…ging (elastic#49978)

* De-duplicate pipeline ids based on the ephemeral_id changing

* Add tests
chrisronline added a commit to chrisronline/kibana that referenced this pull request Nov 14, 2019
…ging (elastic#49978)

* De-duplicate pipeline ids based on the ephemeral_id changing

* Add tests
chrisronline added a commit to chrisronline/kibana that referenced this pull request Nov 14, 2019
…ging (elastic#49978)

* De-duplicate pipeline ids based on the ephemeral_id changing

* Add tests
chrisronline added a commit that referenced this pull request Nov 14, 2019
…ging (#49978) (#50673)

* De-duplicate pipeline ids based on the ephemeral_id changing

* Add tests
chrisronline added a commit that referenced this pull request Nov 14, 2019
…ging (#49978) (#50671)

* De-duplicate pipeline ids based on the ephemeral_id changing

* Add tests
chrisronline added a commit that referenced this pull request Nov 15, 2019
…ging (#49978) (#50672)

* De-duplicate pipeline ids based on the ephemeral_id changing

* Add tests
@chrisronline
Copy link
Contributor Author

Backport:

7.4: e1cccc8
7.5: f2e71f4
7.x: fca66a0

jloleysens added a commit to jloleysens/kibana that referenced this pull request Nov 15, 2019
…ger-ace-theme

* 'master' of github.com:elastic/kibana: (54 commits)
  [ML] Fixes word wrap in Overview page sidebar on IE (elastic#50668)
  Upgrade to TypeScript 3.7.2 (elastic#47188)
  fix: hide 'edit' button for mobile for dashboards (elastic#50639)
  fixes conditional links tests (elastic#50642)
  [SIEM] Fix IE11 timeline drag and drop issue (elastic#50528)
  [SIEM] Add SavedQuery in Timeline (elastic#49813)
  chore(NA): remove code plugin from codeowners (elastic#50451)
  [DOCS] Adds documentation on telemetry settings (elastic#50739)
  [Logs UI] Add IE11-specific CSS fixes for anomalies table (elastic#49980)
  [DOCS][SIEM]: Change Kibana advanced settings to match UI (elastic#50679)
  Change URLs for support menu (elastic#50700)
  [Reporting] Remove any types and references to Hapi (elastic#49250)
  [DOCS] Adds note about backups to Upgrade doc (elastic#50525)
  [Logs UI] Improve infra plugin compatibility with TS 3.7 (elastic#50491)
  [Task manager] Adds ensureScheduling api to allow safer rescheduling of existing tasks (elastic#50232)
  [DOCS] Adds link to content security policy doc (elastic#50698)
  Remove duplicate but in error message (elastic#50530)
  [ML] DF Analytics: Ensure creation flyout can be opened when no jobs exist (elastic#50417)
  Add filebeat notice (elastic#49065)
  [Monitoring] De-duplicate pipeline ids based on the ephemeral_id changing (elastic#49978)
  ...

# Conflicts:
#	x-pack/legacy/plugins/grokdebugger/public/components/grok_debugger/brace_imports.ts
@timroes timroes added v7.5.0 and removed v7.5.1 labels Nov 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Monitoring] Duplicated pipelines on the Pipelines overview page
6 participants