Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tag "truncated" to "log.flags" if incoming line is longer than configured limit #7991

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Aug 16, 2018

A new field is added to store the flags of an event named "log.flags".
If a message is truncated, "truncated" flag is added to the list.

Example event with "truncated" flag:

{
  "@timestamp": "2018-08-16T13:00:46.759Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "7.0.0-alpha1"
  },
  "host": {
    "name": "sleipnir"
  },
  "source": "/home/n/test.log",
  "offset": 33,
  "log": {
    "flags": [
       "truncated"
    ],
  },
  "message": "test line",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "sleipnir",
    "version": "7.0.0-alpha1",
    "name": "sleipnir"
  }
}

Blocks #7997
Closes #7022

@kvch kvch added review Filebeat Filebeat needs_backport PR is waiting to be backported to other branches. labels Aug 16, 2018
@kvch kvch requested review from ph and urso August 16, 2018 13:39
@ruflin
Copy link
Member

ruflin commented Aug 17, 2018

I like the idea of having this flag. Could we put it under log.truncated? Alternative suggestions? Looking at the current event it looks like it would more fit on the root level but with the change to ECS we will have very few top level fields.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to select "request changes" to make it does not get merged before we discussed the naming of the field.

@kvch kvch changed the title Add field "truncated" if incoming line is too long Add field "log.truncated" if incoming line is too long Aug 17, 2018
@kvch
Copy link
Contributor Author

kvch commented Aug 17, 2018

I am fine with log.truncated. I renamed the occurrences and refactored the code accordingly.

@@ -108,6 +108,11 @@
description: >
Logging level.

- name: log.truncated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default type is keyword. I assume this needs to be integer/long?

I like that it's not just a bool but a value that tells how much was deleted.

@@ -48,6 +49,7 @@ type Reader struct {
separator []byte
last []byte
numLines int
truncated int
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't think about multiline being able to capture truncated bytes as well. Good catch 👍

Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When truncating, the multiline reader and limit reader must add the number of truncated bytes to the current value (in case a former phase did truncate the contents before).

This makes me wonder if we really want to count the bytes, or rather just want to add a tag to the event.

"log": common.MapStr{
"truncated": diff,
},
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the contents is already truncated (multiline is applied before limit reader)? We should add the counters, no?

@kvch
Copy link
Contributor Author

kvch commented Aug 21, 2018

The implementation of counting the truncated bytes is correct with the current fixed pipeline.
Multiline is the first reader which might truncate a line. Thus, it does not require checking if a line's been truncated. LimitReader is the last element of the pipeline and it's able to truncate a line. If a line has been truncated by the multiline reader, it means that LimitReader has nothing to do. If multiline is not part of the pipeline than LimitReader is the first one with the ability to truncate. Thus, it does not need to check whether a reader truncated the message before.

However, it is not future-proof. When the reader pipeline refactoring is done, and the pipeline is flexible, more checking is required. I would not add the overhead of getting a value from a common.MapStr on every event. It's not worth it. As a wild alternative Truncated could be added to reader.Message. But I don't see that happening either.

Thus, I changed the field value to true. Also, I renamed the field to event.truncated, because of the problem described in #7997 (comment)

@kvch kvch changed the title Add field "log.truncated" if incoming line is too long Add field "event.truncated" if incoming line is too long Aug 21, 2018
@urso
Copy link

urso commented Aug 21, 2018

@kvch @ruflin Not sure about the naming: log.truncated, event.truncated, log.message.truncated? In the end I'd prefer to be ecs compatible. WDYT @ruflin

Now that we changed it to a boolean, why do use a field instead of defining and inserting a tag. Please note, this is a more general question: Do we prefer boolean fields or tags?

@ruflin
Copy link
Member

ruflin commented Aug 21, 2018

Having the lenght of truncated bytes would have been a nice feature but agree that it can make implementation more complex, so having simple bool could be sufficient. For the naming I would also like to push for log.truncated as log.* should contain all meta data around log messages.

For the tags: Definitively an interesting idea and I think Logstash uses tags for some things. At the same time I quite like specific fields that "describe" what happened and tags are more for users to add their own "free flow" meta information.

},
})
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mlr.truncate > 0 branch do in the worst case 4 allocations on every run.
I would define the common.MapStr in a package variable that I would reuse.

"truncated": true,
},
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above for allocations.

@ph
Copy link
Contributor

ph commented Aug 21, 2018

@kvch really minor comments, good catch on multiline!

@kvch
Copy link
Contributor Author

kvch commented Aug 21, 2018

I will name this field/tag log.truncated as it turns out that the problem in my other PR was caused by an outdated end-to-end test. But first let's decide if it's a field or tag.

@kvch kvch force-pushed the feature/filebeat/add-number-of-truncated-bytes-if-too-long branch from 702fa8c to b16cff8 Compare August 27, 2018 17:14
@kvch kvch changed the title Add field "event.truncated" if incoming line is too long Add tag "truncated" to "log.status" if incoming line is longer than configured limit Aug 27, 2018
@@ -59,3 +59,15 @@ func (msg *Message) AddFields(fields common.MapStr) {
}
msg.Fields.Update(fields)
}

func (msg *Message) AddTagsWithKey(key string, tags []string) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Message.AddTagsWithKey should have comment or be unexported
receiver name msg should be consistent with previous receiver name m for Message


// AddTagsWithKey adds tags to the message with an arbitrary key.
// If the field does not exist, it is created.
func (msg *Message) AddTagsWithKey(key string, tags []string) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

receiver name msg should be consistent with previous receiver name m for Message

@kvch kvch force-pushed the feature/filebeat/add-number-of-truncated-bytes-if-too-long branch from 4706510 to a01440c Compare August 28, 2018 15:08
Copy link
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, I have added a remark about variadic function I think this would be a better API considering the usage we will make.

@ruflin are you OK with log.status? I think this field could have the dissect_parse_failure

}

// AddTagsWithKey adds tags to the message with an arbitrary key.
// If the field does not exist, it is created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make AddTagsWithKey takes a variadic arguments for tags?
Common use case its probably to add a single tag.

AddTagsWithKey(key string, tags ...[]string) error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should name it in the code the same as it's in the event again. If we stay with status this should be AddStatusWithKey. Alternative as proposed above we could call it flags so it would be AddFlagsWithKey. Or if we stick with tags here it should probably be log.tags?

Would AddFlagWithKey(key string, flag string) error be enough here? It seems all the usage we have below is only adding 1 flag.

@urso
Copy link

urso commented Aug 28, 2018

I like log.status :)

Please add AddTagsWithKeys to the developer docs. Everything MapStr related is very much used/required by all beats (including community beats).

@@ -108,6 +108,10 @@
description: >
Logging level.

- name: log.status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about calling it log.flags? I somehow expect a different content for status like unprocessed or processed. flags is inspired by your field description and I kind of like it because we flagged a certain event.

Alternative would be log.tags?

}

// AddTagsWithKey adds tags to the message with an arbitrary key.
// If the field does not exist, it is created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should name it in the code the same as it's in the event again. If we stay with status this should be AddStatusWithKey. Alternative as proposed above we could call it flags so it would be AddFlagsWithKey. Or if we stick with tags here it should probably be log.tags?

Would AddFlagWithKey(key string, flag string) error be enough here? It seems all the usage we have below is only adding 1 flag.

if maxBytesReached && maxLinesReached {
if space < 0 || space > len(m.Content) {
truncated = space - len(m.Content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if space < 0 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I think it was a leftover. This value of truncated was never used, because it was only added if len(m.Content) was bigger than space. However, in this branch space equals len(m.Content), so the truncated value here was unnecessary.

}

for _, message := range messages {
fmt.Println(message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

func TestMultilineAfterTruncated(t *testing.T) {
pattern := match.MustCompile(`^[ ]`) // next line is indented a space
maxLines := 2
testMultilineTruncated(t,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test where the truncated flag should be misssing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

found := false
switch flags := statusFlags.(type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can it be both types? Same question for the other test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it to mimic the testing of the old AddTags. But as the input is never interface, I am removing it.

maxBytes int
}{
{"long-long-line", 5},
{"long-long-line", 3},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a third param here truncated: bool and then add also a test that is not truncated and checks that the flag does not exist?

// AddTagsWithKey appends a tag to the key field of ms. If the field does not
// exist then it will be created. If the field exists and is not a []string
// then an error will be returned. It does not deduplicate the list.
func AddTagsWithKey(ms MapStr, key string, tags []string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for the AddTagsWithKey method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// AddTagsWithKey appends a tag to the key field of ms. If the field does not
// exist then it will be created. If the field exists and is not a []string
// then an error will be returned. It does not deduplicate the list.
func AddTagsWithKey(ms MapStr, key string, tags []string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I would keep the naming here as Tags even though I would rename everything in filebeat code above to Flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Done.

if !exists {
ms[TagsKey] = tags

k, subMap, oldTags, present, err := mapFind(key, ms, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wodner if this has a perfomance impact on the old AddTags method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Keys without dots can be processed in the first loop on a fast path in mapFind. If the key exists as is, it's simply returned at the beginning.

// Fast path, key is present as is.
if v, exists := data[key]; exists {
    return key, data, v, true, nil
}

If not, the only additional cost is one IndexRune which looks for a dot in the key. But currently old tags don't have dots in their keys. Thus, the function returns at the second possible point without doing anything "expensive" e.g. creating additional submaps.

idx := strings.IndexRune(key, '.')
if idx < 0 {
    return key, data, nil, false, nil
}

m.Fields.Update(fields)
}

// AddTagsWithKey adds tags to the message with an arbitrary key.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on exported method Message.AddFlagsWithKey should be of the form "AddFlagsWithKey ..."

@kvch kvch changed the title Add tag "truncated" to "log.status" if incoming line is longer than configured limit Add tag "truncated" to "log.flags" if incoming line is longer than configured limit Aug 30, 2018
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please make sure to update the PR description.

@kvch kvch merged commit 0884236 into elastic:master Aug 30, 2018
@kvch kvch added v6.5.0 and removed needs_backport PR is waiting to be backported to other branches. labels Aug 30, 2018
kvch added a commit to kvch/beats that referenced this pull request Aug 31, 2018
…nfigured limit (elastic#7991)

A new field is added to store the flags of an event named "log.flags".
If a message is truncated, "truncated" flag is added to the list.

Example event with "truncated" flag:

{
  "@timestamp": "2018-08-16T13:00:46.759Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "7.0.0-alpha1"
  },
  "host": {
    "name": "sleipnir"
  },
  "source": "/home/n/test.log",
  "offset": 33,
  "log": {
    "flags": [
       "truncated"
    ],
  },
  "message": "test line",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "sleipnir",
    "version": "7.0.0-alpha1",
    "name": "sleipnir"
  }
}

Closes elastic#7022
(cherry picked from commit 0884236)
kvch added a commit that referenced this pull request Sep 3, 2018
…ing line is longer than configured limit (#8165)

* Add tag "truncated" to "log.flags" if incoming line is longer than configured limit (#7991)

A new field is added to store the flags of an event named "log.flags".
If a message is truncated, "truncated" flag is added to the list.

Example event with "truncated" flag:

{
  "@timestamp": "2018-08-16T13:00:46.759Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "7.0.0-alpha1"
  },
  "host": {
    "name": "sleipnir"
  },
  "source": "/home/n/test.log",
  "offset": 33,
  "log": {
    "flags": [
       "truncated"
    ],
  },
  "message": "test line",
  "prospector": {
    "type": "log"
  },
  "input": {
    "type": "log"
  },
  "beat": {
    "hostname": "sleipnir",
    "version": "7.0.0-alpha1",
    "name": "sleipnir"
  }
}

Closes #7022
(cherry picked from commit 0884236)

* fix changelog && rebase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants