Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not stop collecting events when journal entries change #9994

Merged

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Jan 10, 2019

Previously sd_journal_wait was not used. From now on all changes to journals are detected.

I also added custom seccomp policy to Journalbeat.

Closes #9533

@kvch kvch added review needs_backport PR is waiting to be backported to other branches. Journalbeat labels Jan 10, 2019
@kvch kvch requested review from ph and andrewkroh January 10, 2019 14:59
@kvch kvch requested a review from a team as a code owner January 10, 2019 14:59
@kvch kvch changed the title Journalbeat does not stop when journal entries change Do not stop collecting events when journal entries change Jan 10, 2019
@kvch
Copy link
Contributor Author

kvch commented Jan 11, 2019

Failing tests are unrelated.

urso
urso previously requested changes Jan 11, 2019
Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we need a seccomp policy for ARM as well?

journalbeat/reader/journal.go Outdated Show resolved Hide resolved
case <-r.done:
return
default:
changesChan <- c
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like waitForChange deadlocking -> go-routine/mem leak if waitUntilNewEventOrError returns after waitForChange is started but returns before it reads from changesChan. Due to the Wait on shutdown of waitUntilNewEventOrError, this function might be deadlocked as well.

@kvch
Copy link
Contributor Author

kvch commented Jan 15, 2019

Failing tests are unrelated on Travis.

@amencarini
Copy link

amencarini commented Jan 15, 2019

Hi! I'm trying to use the changes of this PR and I'm getting a massive wall of messages such as:

2019-01-15T16:26:39.084Z ERROR [input] input/input.go:165 Error while reading event: error while waiting for event: operation not permitted {"id": "9dde1bfa-fe25-44f6-9869-1a8f7beaed99"}

I'm building the journalbeat in an ubuntu container, and deploy it in an ubuntu container. Any suggestion as to what I might be doing incorrectly? My initial suspicion was that I'm not building it correctly but I am able to build v6.5.4.

@kvch
Copy link
Contributor Author

kvch commented Jan 16, 2019

@amencarini The errors show up because Journalbeat tries to run syscalls which are not listed in the seccomp policy. As a workaround, you could set seccomp.enabled to false. Alternatively, you could pass a seccomp profile to the container: https://docs.docker.com/engine/security/seccomp/

We need to fix this problem before releasing the new official Journalbeat image.

@amencarini
Copy link

@kvch thanks for your reply! Let me know if there is there anything I can do to help on this.

@amencarini
Copy link

amencarini commented Jan 17, 2019

Me again! I tried to put a 10s sleep before error while waiting for event: operation not permitted, and journalbeat in that scenario keeps shipping logs to Elasticsearch.

2019-01-17T15:02:50.002Z	ERROR	[input]	input/input.go:166	Error while reading event: error while waiting for event: operation not permitted	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:00.002Z	INFO	[input]	input/input.go:132	journalbeat successfully published 1 events	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:00.003Z	INFO	[input]	input/input.go:132	journalbeat successfully published 1 events	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:00.003Z	INFO	[input]	input/input.go:132	journalbeat successfully published 1 events	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:00.003Z	INFO	[input]	input/input.go:132	journalbeat successfully published 1 events	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:00.003Z	INFO	[input]	input/input.go:132	journalbeat successfully published 1 events	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:01.022Z	INFO	[input]	input/input.go:132	journalbeat successfully published 56 events	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"}
2019-01-17T15:03:10.073Z	ERROR	[input]	input/input.go:166	Error while reading event: error while waiting for event: operation not permitted	{"id": "2f872e4b-efba-4b54-bc40-68b03123ef52"} 

So my impression is that whatever is involved in the sd_journal_wait C call is what's signalling a lack of permissions.

On the other hand, forcing the rotation of the logs via journalctl --rotate makes journalbeat stop shipping logs (only the go metrics and the operation not permitted lines get logged). Hope this helps!

@kvch
Copy link
Contributor Author

kvch commented Jan 18, 2019

Were you able to make it work using a seccomp profile?

@amencarini
Copy link

I haven't tried but I'm running this off Kubernetes and I can't find the equivalent option of security-opt. Happy to try it out if you can point me to it!


r.backoff.Reset()
return event, nil
c := r.journal.Wait(100 * time.Millisecond)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: It's not too bad to wait for a few 100 millis, and I don't think we should change it here. But I'd prefer if we wouldn't have to wait. The Wait call uses sd_journal_wait, which itself uses ppoll. One should not close a file descriptor a poll operation is active on, but an alternate wait-without-wakeup-schema could be created by wasting a second polling file descriptor:

  • create close-signaling file descriptor (e.g. create pipe, wastes 2 FDs)
  • get journald fd via sd_journal_get_fd (it's the same journald polls on)
  • poll on close-fd + journal-fd
  • for handling async shutdown trigger close-fd (e.g. send some bytes)
  • check who did send poll signal and handle accordingly

@urso urso dismissed their stale review January 20, 2019 00:19

Requested changes have been applied. We should test if PR really helps with collector stopping and in which cases it might still fail.

@kvch kvch force-pushed the fix-journalbeat-do-not-stop-collection-after-rotate branch from 129f86c to cea4c4b Compare January 25, 2019 15:30
@kvch
Copy link
Contributor Author

kvch commented Jan 28, 2019

I have just tested the seccomp policy for ARM in docker image balenalib/raspberrypi3-debian-golang and balenalib/raspberry-pi-debian-golang. It works as expected.

@kvch kvch added v6.7.0 and removed needs_backport PR is waiting to be backported to other branches. labels Feb 1, 2019
kvch added a commit to kvch/beats that referenced this pull request Feb 1, 2019
Previously sd_journal_wait was not used. From now on all changes to journals are detected.

I also added custom seccomp policy to Journalbeat.

Closes elastic#9533
(cherry picked from commit cead4b6)
kvch added a commit that referenced this pull request Feb 4, 2019
…entries change (#10485)

Cherry-pick of PR #9994 to 6.x branch. Original message:

Previously sd_journal_wait was not used. From now on all changes to journals are detected.

I also added custom seccomp policy to Journalbeat.

Closes #9533
DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
Previously sd_journal_wait was not used. From now on all changes to journals are detected.

I also added custom seccomp policy to Journalbeat.

Closes elastic#9533
DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
…ournal entries change (elastic#10485)

Cherry-pick of PR elastic#9994 to 6.x branch. Original message:

Previously sd_journal_wait was not used. From now on all changes to journals are detected.

I also added custom seccomp policy to Journalbeat.

Closes elastic#9533
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants