Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit error if fileset with multiple pipelines is being used with ES < 6.5 #10001

Merged

Conversation

ycombinator
Copy link
Contributor

Follow up to #8914.

In #8914, we introduced the ability for Filebeat filesets to have multiple Ingest pipelines, the first one being the entry point. This feature relies on the Elasticsearch Ingest node having a pipeline processor and if conditions for processors, both of which were introduced in Elasticsearch 6.5.0.

This PR implements a check for whether a fileset has multiple Ingest pipelines AND is talking to an Elasticsearch cluster < 6.5.0. If that's the case, we emit an error.

@ycombinator ycombinator added review Filebeat Filebeat needs_backport PR is waiting to be backported to other branches. v7.0.0 v6.7.0 labels Jan 10, 2019
@ycombinator ycombinator requested a review from a team as a code owner January 10, 2019 17:20
@ycombinator
Copy link
Contributor Author

ycombinator commented Jan 10, 2019

A couple of questions about this PR for the reviewers:

  1. When the check implemented by this PR fails, we emit an error message in the Filebeat logs like so:

    2019-01-10T09:23:42.716-0800    ERROR   fileset/factory.go:142  Error loading pipeline: the elasticsearch/slowlog fileset has multiple pipelines, which are only supported with Elasticsearch >= 6.5.0. Currently running with Elasticsearch version 7.0.0-SNAPSHOT
    

    However, Filebeat continues to run. Just want to check with you that this is okay?

  2. Testing this PR requires running against a version of ES < 6.5.0. Is there some way we can automate this in our tests? Is there somewhere we do this sort of testing already?

@ycombinator
Copy link
Contributor Author

jenkins, test this

@ycombinator
Copy link
Contributor Author

Ignore my question 2. I think I can just write a unit test for this. 🤦‍♂️

// Filesets with multiple pipelines can only be supported by Elasticsearch >= 6.5.0
esVersion := esClient.GetVersion()
minESVersionRequired := common.MustNewVersion("6.5.0")
if len(pipelines) > 1 && esVersion.LessThan(minESVersionRequired) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the current LS module also fall into this statement? Probably not because it only loads 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Logstash module actually loads both/all pipelines but it only uses one of them at runtime. So it doesn't fall into this statement. I just tested this to confirm as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wouldn't we want to fail installing the LS module?

Copy link
Contributor Author

@ycombinator ycombinator Jan 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. The logstash/log fileset's manifest.yml looks like this:

ingest_pipeline: ingest/pipeline-{{.format}}.json

Whereas a fileset that is using the multiple pipelines feature will look like this:

ingest_pipeline:
  - ingest/entry_point_pipeline.json
  - ingest/some_sub_pipeline.json
  - ingest/another_sub_pipeline.json

The latter one uses the new multiple-pipeline feature, which is what the version check is for.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need a changelog entry and would be great to have some basic tests around it.

@ycombinator ycombinator force-pushed the fb-multi-pipelines-version-check branch 2 times, most recently from aa0fe4b to 58c0a87 Compare January 11, 2019 11:48
@ycombinator
Copy link
Contributor Author

Added a CHANGELOG entry and unit test. Ready for review again. Thanks!

filebeat/fileset/pipelines_test.go Outdated Show resolved Hide resolved
@ycombinator
Copy link
Contributor Author

jenkins, test this

},
}

for _, test := range tests {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop creates a many concurrent HTTP servers not being cleaned up between the different tests. This is due to defer operating at function-scope, not block scope.

Use t.Run(name, func(t *testing.T) { ... }) so to create separate isolated tests (defer in the body will properly cleanup resources). This improves output/readability of test results, plus allows developers to run single tests.

Common pattern:

cases := map[string]struct{
   ...
}{
  ...
}
for name, test := range cases {
  test := test // copy test into local block scope (in case you want to enable t.Parallel())
  t.Run(name, func(t *testing.T) {
    // common test body
  })
}

Copy link
Contributor Author

@ycombinator ycombinator Jan 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to defer operating at function-scope, not block scope.

Thanks, TIL!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso Would you recommend using t.Parallel() in general, or would you say it depends on how much work each test case is doing and how many test cases there are to run in all?

Copy link

@urso urso Jan 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running with t.Parallel() is a great way to see if things are really isolated :)
Tests are run package by package. Using t.Parallel() is a nice way to speed up tests in a package that already a few seconds. Checking libbeat unit tests, most are finished in a few ms. Not really worth it to optimize already fast tests.

When using t.Parallel() the parent test blocks until all it's child tests have returned. The number of active concurrent tests to be run depends on the -parallel CLI flag. by default it's the number of cores (GOMAXPROCS env variable) you have in your system. We don't use t.Parallel() much, but for packages with a large number of unit tests, that also take a little longer to execute it can really make a difference. The t.Run() method always spawns a go-routine. That is, there is no real additional cost by adding t.Parallel() even for very small tests.

Just tried to enable t.Parallel() on beats queues test on my machine. Duration went from 20s to ~5s. I'm always happy if I don't have to wait for long unit tests to finish.

We should also make more use of testing/quick (if applicable). These will definitely profit from t.Parallel().

Still most time is spend in system tests.

@ycombinator ycombinator force-pushed the fb-multi-pipelines-version-check branch from a2820bd to 27c3e97 Compare January 13, 2019 00:00
@ycombinator ycombinator merged commit c55226e into elastic:master Jan 13, 2019
@ycombinator ycombinator deleted the fb-multi-pipelines-version-check branch January 13, 2019 04:58
ycombinator added a commit to ycombinator/beats that referenced this pull request Jan 13, 2019
… 6.5 (elastic#10001)

Follow up to elastic#8914.

In elastic#8914, we introduced the ability for Filebeat filesets to have multiple Ingest pipelines, the first one being the entry point. This feature relies on the Elasticsearch Ingest node having a `pipeline` processor and `if` conditions for processors, both of which were introduced in Elasticsearch 6.5.0.

This PR implements a check for whether a fileset has multiple Ingest pipelines AND is talking to an Elasticsearch cluster < 6.5.0. If that's the case, we emit an error.

(cherry picked from commit c55226e)
@ycombinator ycombinator removed the needs_backport PR is waiting to be backported to other branches. label Jan 13, 2019
ycombinator added a commit that referenced this pull request Jan 14, 2019
…nes is being used with ES < 6.5 (#10038)

Cherry-pick of PR #10001 to 6.x branch. Original message: 

Follow up to #8914.

In #8914, we introduced the ability for Filebeat filesets to have multiple Ingest pipelines, the first one being the entry point. This feature relies on the Elasticsearch Ingest node having a `pipeline` processor and `if` conditions for processors, both of which were introduced in Elasticsearch 6.5.0.

This PR implements a check for whether a fileset has multiple Ingest pipelines AND is talking to an Elasticsearch cluster < 6.5.0. If that's the case, we emit an error.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/stack-monitoring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants