[Reporting] Unlogged error: the reports:monitor task found an expired processing job #125996

Closed
tsullivan opened this issue Feb 17, 2022 · 1 comment · Fixed by #126737
Labels
bug Fixes for quality problems that affect the customer experience (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. loe:small Small Level of Effort

Comments

@tsullivan
Member

tsullivan commented Feb 17, 2022

Users sometimes tell us about a report job that takes a very long time to fail with a timeout, with no errors logged and no explanation of why.

If Kibana restarts or crashes during report job execution, that job will remain labeled as processing. To clean up that state, the reports:monitor task queries for processing reports whose start time is further back than the timeout limit allows. Any such report is rescheduled for another attempt.

If Kibana keeps restarting / crashing / going unresponsive while a report job is running, it will keep getting rescheduled until the number of attempts is exhausted. When a delayed processing job is found and is marked with no remaining attempts, Reporting marks the job as failed.
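To make the behavior described above concrete, here is a minimal sketch of the decision the monitor has to make for each job it inspects. It is not the Reporting plugin's actual code: the `ReportJob` shape, its field names, and `decideMonitorAction` are all hypothetical and only illustrate the reschedule-or-fail logic.

```typescript
// Hypothetical job shape; the real Reporting plugin's types are not shown in this issue.
type JobStatus = 'pending' | 'processing' | 'completed' | 'failed';

interface ReportJob {
  id: string;
  status: JobStatus;
  startedAt: number; // epoch ms when the current attempt began
  attempts: number; // attempts made so far, including the current one
  maxAttempts: number; // assumed cap on attempts
  timeoutMs: number; // assumed per-attempt timeout
}

type MonitorAction = 'none' | 'reschedule' | 'fail';

// Decide what the monitor should do with a single job at time `now` (epoch ms).
function decideMonitorAction(job: ReportJob, now: number): MonitorAction {
  // Only jobs still labeled as processing can be stuck.
  if (job.status !== 'processing') return 'none';

  // A job is expired when it started further back than the timeout allows.
  const expired = now - job.startedAt > job.timeoutMs;
  if (!expired) return 'none';

  // No remaining attempts: mark as failed. Otherwise: reschedule.
  return job.attempts >= job.maxAttempts ? 'fail' : 'reschedule';
}
```

In this sketch, a job that keeps expiring is rescheduled until `attempts` reaches `maxAttempts`, which is the path that currently ends in a failure with no error logged.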

Problem: The rescheduling events may not be apparent even to someone reading the logs, since they are logged at info or debug level. One should be able to see this kind of activity when searching for errors in the reporting logs, since these are not harmless events.

The reports:monitor task should log an error when it finds a task that needs to be retried. The error message should explain what happened for things to get into that state: the instance that was executing the report job stopped responding for too long.
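As a rough illustration of the proposed logging, the sketch below builds an error-level message that explains how the job got into this state. The `MinimalLogger` interface, the `ExpiredJobInfo` fields, and the message wording are assumptions made for the example, not the plugin's real API or the text used in the eventual fix.

```typescript
// Hypothetical logger interface: anything with an error() method will do.
interface MinimalLogger {
  error(message: string): void;
}

// Hypothetical summary of an expired processing job.
interface ExpiredJobInfo {
  id: string;
  startedAt: Date;
  timeoutMs: number;
  attempts: number;
  maxAttempts: number;
}

// Build a message that says what was found, why it likely happened, and what happens next.
function describeExpiredJob(job: ExpiredJobInfo, now: Date): string {
  const elapsedSec = Math.round((now.getTime() - job.startedAt.getTime()) / 1000);
  const timeoutSec = Math.round(job.timeoutMs / 1000);
  const nextStep =
    job.attempts >= job.maxAttempts
      ? 'marking it as failed because no attempts remain'
      : `rescheduling it (${job.attempts} of ${job.maxAttempts} attempts used)`;
  return (
    `Report job ${job.id} has been processing for ${elapsedSec}s, ` +
    `longer than the ${timeoutSec}s timeout. The Kibana instance that was executing it ` +
    `likely restarted or stopped responding; ${nextStep}.`
  );
}

// Log at error level so the event shows up when searching the logs for errors.
function logExpiredJob(logger: MinimalLogger, job: ExpiredJobInfo, now: Date = new Date()): void {
  logger.error(describeExpiredJob(job, now));
}
```

Keeping the message construction separate from the logging call makes the wording easy to unit test without a real logger.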

@tsullivan tsullivan added the bug Fixes for quality problems that affect the customer experience label Feb 17, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Feb 17, 2022
@tsullivan tsullivan added (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Team:AppServicesUx labels Feb 17, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Feb 17, 2022
@exalate-issue-sync exalate-issue-sync bot added loe:small Small Level of Effort impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. labels Feb 24, 2022