
[Filebeat] S3 input stops ingesting logs after some time #15502

Closed
ynirk opened this issue Jan 13, 2020 · 2 comments · Fixed by #15590
Labels: bug, Filebeat, Team:Integrations

@ynirk

ynirk commented Jan 13, 2020

  • Version: 7.5.1

  • Operating System: official Docker image on Ubuntu

  • Description

I'm using the S3 input to ingest AWS VPC flow logs. It works fine for a while and then stops ingesting (this has been reproduced several times).

The only message I see in the log is a single occurrence of:

2020-01-12T11:59:26.182Z	ERROR	[s3]	s3/input.go:259	handleS3Objects failed: ReadString failed for AWSLogs/xxx/vpcflowlogs/xxx/2020/01/09/object.log.gz: read tcp 172.17.0.1:48188->52.95.166.14:443: read: connection reset by peer
2020-01-12T11:59:26.182Z	WARN	[s3]	s3/input.go:272	Processing message failed: handleS3Objects failed: ReadString failed for AWSLogs/xxx/vpcflowlogs/xxx/2020/01/09/object.log.gz: read tcp 172.17.0.1:48188->52.95.166.14:443: read: connection reset by peer
2020-01-12T11:59:26.197Z	WARN	[s3]	s3/input.go:277	Message visibility timeout updated to 300

And then

INFO	[s3]	s3/input.go:297	Message visibility timeout updated to 300

repeated several times

After this sequence, logs are not ingested anymore
[Attached screenshot: Screenshot 2020-01-12 at 13 06 17]

  • Steps to reproduce:
  1. Configure Filebeat with an S3 input.
    The SQS queue I'm using already has a lot of messages (~5k).

  2. Wait some time until the error above appears in the log and logs stop being ingested (usually ~2 hours in my case).

@kaiyan-sheng
Contributor

Thank you for all the details and the log file. It looks like after the read: connection reset by peer error is triggered from the reader.ReadString function, processorKeepAlive finds that processMessage is taking too long to run, longer than half of the configured visibilityTimeout, so the changeVisibilityTimeout function keeps getting called repeatedly.
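
For context, the keep-alive pattern described above looks roughly like the sketch below: processing runs in a goroutine while a ticker periodically extends the SQS message visibility. This is a minimal illustration, not the actual Filebeat code; the function and parameter names are stand-ins, and it uses the v1 aws-sdk-go API for brevity. It shows why a processMessage call that never returns leads to changeVisibilityTimeout being called over and over.

```go
package keepalive

import (
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/sqs"
)

// processWithKeepAlive runs processMessage in the background and, while it is
// still running, periodically resets the SQS message visibility timeout so the
// message is not redelivered to another consumer. If processMessage blocks
// forever (for example on a stalled S3 download), the visibility keeps being
// extended, which matches the repeated
// "Message visibility timeout updated to 300" log lines in this issue.
func processWithKeepAlive(svc *sqs.SQS, queueURL string, msg *sqs.Message,
	visibilityTimeout int64, processMessage func(*sqs.Message) error) error {

	done := make(chan error, 1)
	go func() { done <- processMessage(msg) }()

	// Extend visibility every half visibilityTimeout until processing finishes.
	ticker := time.NewTicker(time.Duration(visibilityTimeout/2) * time.Second)
	defer ticker.Stop()

	for {
		select {
		case err := <-done:
			return err
		case <-ticker.C:
			_, err := svc.ChangeMessageVisibility(&sqs.ChangeMessageVisibilityInput{
				QueueUrl:          aws.String(queueURL),
				ReceiptHandle:     msg.ReceiptHandle,
				VisibilityTimeout: aws.Int64(visibilityTimeout),
			})
			if err != nil {
				return err
			}
		}
	}
}
```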

@kaiyan-sheng
Contributor

kaiyan-sheng commented Jan 14, 2020

After more investigation: this error seems to show up when there are many concurrent downloads from the S3 bucket, when the S3 objects are large, or when network bandwidth is limited. Adding a timeout when calling the GetObject API will help, and I will work on a PR to fix this.

Potentially related issue I found: aws/aws-sdk-go#1763
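
A minimal sketch of the kind of fix being proposed: bounding the GetObject call (and the read of its body) with a context timeout so a stalled download fails fast instead of holding up processMessage indefinitely. The function name, parameters, and the use of the v1 aws-sdk-go here are illustrative assumptions, not the actual change in the linked PR.

```go
package s3download

import (
	"context"
	"io/ioutil"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// getObjectWithTimeout downloads a single S3 object but gives up once
// apiTimeout has elapsed, instead of hanging on a dead connection.
func getObjectWithTimeout(svc s3iface.S3API, bucket, key string, apiTimeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), apiTimeout)
	defer cancel()

	resp, err := svc.GetObjectWithContext(ctx, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// The same context bounds the body read, so a stalled transfer or a
	// "connection reset by peer" surfaces as an error here rather than
	// blocking past the SQS visibility timeout.
	return ioutil.ReadAll(resp.Body)
}
```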
