Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filebeat: expand double wildcards in prospector #3980

Merged
merged 3 commits into from
Apr 26, 2017
Merged

filebeat: expand double wildcards in prospector #3980

merged 3 commits into from
Apr 26, 2017

Conversation

7AC
Copy link
Contributor

@7AC 7AC commented Apr 10, 2017

Expand double wildcards into standard glob patterns, up to a maximum
depth of 16 levels after the wildcard.

Resolves #2084

@7AC 7AC added the review label Apr 10, 2017
// glob detects the use of "**" and expands it to standard glob patterns up to a max depth
func glob(g string) ([]string, error) {
var globs []string
if filepath.Base(g) == "**" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it only detects ** at the end of the path? Does /foo/**/bar work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't realize that was also a use case. Let me fix it, thanks

@tsg
Copy link
Contributor

tsg commented Apr 10, 2017

I'm 👍 with this approach, since I think it's the simplest implementation that gets us what most people want. I'm on the fence about making the 16 configurable or not, but I guess it would be good to be able to set it to 0 in which case the feature is disabled (for BWC).

@ruflin ruflin added Filebeat Filebeat in progress Pull request is currently in progress. labels Apr 11, 2017
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach that by default we have the old behaviour.

This PR should also have changelog entry and docs.

@@ -112,17 +114,69 @@ func (l *Log) Run() {
}
}

// globPattern detects the use of "**" and expands it to standard glob patterns up to a max depth
func (l *Log) globPatterns(pattern string) ([]string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could move this to a separate package and pass the max depth to the function. Reason is like this we could also use it in other places and it does not further "bloat" the prospector code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like that

Module string `config:"_module_name"` // hidden option to set the module name
Fileset string `config:"_fileset_name"` // hidden option to set the fileset name
Processors processors.PluginConfig `config:"processors"`
DoubleWildcardMaxDepth uint8 `config:"double_wildcard_max_depth"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Where is the default set for this one?
  • The config name seems to be a bit long

I would like to have this feature disabled by default and people who need it enable it specifically. This also makes it easier for us to debug it in the future in case of errors, because we know someone enabled it.

I like the idea of defining the max depth as in case someone just needs one or two levels, this would have positive performance implications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just assumed the golang type default (0) is used as fallback? Totally agree on the config name: how about double_star_pattern_depth? Maybe not shorter but simpler..

patternList := strings.Split(pattern, "/")
for i, dir := range patternList {
if len(patterns) > 0 {
if dir == "**" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means test/**/hello/**/abc.log is not going to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we were discussing with @tsg and if we allow that the complexity will escalate quickly

pattern string
expectedMatches []string
}{
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests which have a deeper depth like 4 or 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

}
var patterns []string
isAbs := filepath.IsAbs(pattern)
patternList := strings.Split(pattern, "/")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about windows users having \ in their path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I should use os.PathSeparator instead

}
}
if len(patterns) == 0 {
patterns = []string{pattern}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have any kind of error if pattern is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check on that, I guess we should

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so right now we're not doing any validation on the pattern, with this change that wouldn't change wrt an empty pattern. Do we want to start validating those?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no, since for example, I use "" in the filebeat modules to disable a fileset on a particular version.

{
"foo/**/bazz",
[]string{},
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more tests to add:

  • double / like foo//bar
  • foo\bar
  • foo\**\bar
  • c:\foo\**\bar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't add the last test since I end up joining all the paths to a root that has a volume in it on windows, via ioutil.TempDir()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a new test just for cases like this

if len(patterns) > 0 {
if dir == "**" {
err := fmt.Sprintf("glob(%s) failed: cannot specify multiple ** within a pattern", pattern)
logp.Err(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recommend logging errors and returning them. Do one or the other. It's probably best to let the caller make the decision on how to handle the error (which might be to log it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

===== double_star_pattern_depth

The maximum depth for `**` patterns. This setting is disabled by default, i.e.,
`**` is not expanded. When it's set to a positive value, a single `**` per
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the limit in single ** per path? With a bit of recursion in the expansion logic I think this could be handled elegantly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What worried me and @tsg was how the complexity could easily escalate here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also worried about the number of generated globs. With a max depth of 16, and using ** twice, you need to generate 16*16 patterns, right? If you use ** three times that's already 4096 globs to evaluate, which might be problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a bit mitigated by the fact that we default to zero (disabled) now, but I'd still wait for an actual request/use case for it before opening up to multiple **.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@7AC See my comment above. I'm ok with having it fixed at first but that we take into account now, it could appear in the future.

}
var patterns []string
isAbs := filepath.IsAbs(pattern)
patternList := strings.Split(pattern, string(os.PathSeparator))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.PathSeparator might not be safe. windows users sometimes use / for paths (if they didn't get \ working due to not using single-quote), which is working with go as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, didn't know that. So the windows case is basically a superset, let me handle that too.

@@ -82,6 +82,11 @@ filebeat.prospectors:
# This is especially useful for multiline log messages which can get large.
#max_bytes: 10485760

# Maximum depth for "**" patterns. This setting is disabled by default, i.e.,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somehow expect we will add more config options in the future. So I would suggest to make this a namespace with different config options below. Suggestion:

recursive_glob.enabled: false # set this to true to get the defaults for the below options
recursive_glob.depth: 16 # max depth of expanding **
recursive_glob.max_count: 1 # max number of ** in a single path

This could also solve the discussion about how many patterns we allow per path.

I named it recursive_glob because that what its doing from my perspective. Not sure what the user would look for when searching for such a config option. I had also recursive_path because we call it paths in the config above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed to recursive_glob_depth for now, I don't think we should encourage users to use more than one per path so the namespace seems overkill to me.

DocumentType: "log",
ScanFrequency: 10 * time.Second,
InputType: cfg.DefaultInputType,
CleanRemoved: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to all the other default configs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all the ones that have default values for the golang type (including the new one)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly prefer to have all values in here, even if they are the golang defaults. This makes it very easy (and future proof) to lookup what our defaults are without having to think about what the default in Golang is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -211,6 +211,10 @@ filebeat.prospectors:
# This is especially useful for multiline log messages which can get large.
#max_bytes: 10485760

# Maximum depth for "**" patterns. When it's set to a positive value, "**"
# patterns are expanded up to the specified depth.
#recursive_glob_depth: 16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming we don't want to make it configurable at first, could we go with recursive_glob.enabled as config option instead? This will allow us to extend it in the future if needed. I'm aware that just recursive_glob: true would feel more natural at first, but this will prevent us from using it as a namespace in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

### Recursive glob configuration

# Expand "**" patterns into regular glob patterns.
#recursive_glob.enabled: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally have the default value in the full config, which is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


Filebeat starts a harvester for each file that it finds under the specified
paths. You can specify one path per line. Each line begins with a dash (-).

===== recursive_glob

*`enabled`*:: Enable expanding `**` into recursive glob patterns. With this feature enabled,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mark this beta first to get some users to test it first? At the same time it rolls out with the 6 alpha release, so we get already some testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add []beta and it gets expanded with a message automatically in the docs. But lets leave it out. We can still add it.

matches, err := filepath.Glob(glob)
for _, path := range l.config.Paths {
depth := uint8(0)
if l.config.recursiveGlob.enabled {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected to see here recursiveGlob.Enabled() as this come with ucfg for each "object". Problem is that if we want that we would have to do unpack each time the scan runs which is not a good idea. So this looks good to me.

@@ -45,6 +46,11 @@ type prospectorConfig struct {
Module string `config:"_module_name"` // hidden option to set the module name
Fileset string `config:"_fileset_name"` // hidden option to set the fileset name
Processors processors.PluginConfig `config:"processors"`
recursiveGlob *recursiveGlobConfig `config:"recursive_glob"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

recursiveGlob        bool   `config:"recursive_glob.enabled" `

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could you add a CHANGELOG entry, squash the commits and add a commit message?

Expand double wildcards into standard glob patterns, up to a maximum
depth of 8 levels after the wildcard.

Resolves #2084

Filebeat starts a harvester for each file that it finds under the specified
paths. You can specify one path per line. Each line begins with a dash (-).

===== recursive_glob

*`enabled`*:: Enable expanding `**` into recursive glob patterns. With this feature enabled,
Copy link
Contributor

@dedemorton dedemorton Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@7AC This won't render correctly in asciidoc because it will try to evaluate the asterisks before the backticks. Wherever you have asterisks in an example, use plus signs (+) instead of backticks. For example: +**+. (So replace the backticks with plus signs in the examples.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it thank you!


Filebeat starts a harvester for each file that it finds under the specified
paths. You can specify one path per line. Each line begins with a dash (-).

===== recursive_glob

*`enabled`*:: Enable expanding +**+ into recursive glob patterns. With this feature enabled,
Copy link
Contributor

@dedemorton dedemorton Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@7AC I think my explanation of the behavior of plus signs was a little off in my previous comment. The backticks are OK for the doc build, but the content doesn't render for users who preview the file in GitHub. We aren't optimizing for viewing in GitHub, though, so technically you can use the backticks. But you should be consistent in this entire description (either use backticks or plus signs for any example that contains asterisks in the path). The asterisk doesn't get evaluated either way.


Filebeat starts a harvester for each file that it finds under the specified
paths. You can specify one path per line. Each line begins with a dash (-).

===== recursive_glob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@7AC You need to add the section ID above this section or the doc build will fail. So add this to line 50:

[[recursive_glob]]

@ruflin ruflin merged commit aaee997 into elastic:master Apr 26, 2017
@mbhuie
Copy link

mbhuie commented Oct 24, 2017

Is this available in 5.6.3 release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat in progress Pull request is currently in progress. review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants