Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(event): use fallback for accepted time #2335

Conversation

mfocko
Copy link
Member

@mfocko mfocko commented Feb 7, 2024

When submitting failed Copr builds to the Pushgateway, we're using time of accepting the task, in some cases we don't have time of accepting, so use ‹created_at› as a fallback.

Fixes #2258

Not sure about this change, since for the last occurrence in Sentry the difference between the reported time of failure and created_at is 5 minutes. I can the ‹task_accepted_time› only in DB, but this is happening even before the build starts, cause it's coming from an issue caused by submitting the build, so I would say that we should use the datetime.now()? But OTOH it doesn't feel like it's skipping the Celere queue and being handled right away.

@mfocko mfocko self-assigned this Feb 7, 2024
Copy link
Contributor

Copy link
Member

@majamassarini majamassarini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first it was wired to see different "kind" of timestamps used to initialize the "task_accepted_time" field. But I think we use this field just to show it to the user, and the difference in this kind of timestamps is so minimal that no one will ever notice it.

So for me it looks good, maybe I would just add a note. 🙏

@mfocko
Copy link
Member Author

mfocko commented Feb 8, 2024

@majamassarini The time difference is used for the metrics and SLOs, 5 minutes is exactly the range we have as an SLO, the thing is that these errors are occurring during the submission, so there is no “status set after pass/fail of the Copr build”?

I'm kind of inclined to do datetime.now() if there is no task_accepted_time

@lbarcziova
Copy link
Member

I can the ‹task_accepted_time› only in DB, but this is happening even before the build starts, cause it's coming from an issue caused by submitting the build, so I would say that we should use the datetime.now()

Sorry, I am still not following, why do these miss the information about task_accepted_time?

@majamassarini
Copy link
Member

majamassarini commented Feb 9, 2024

@majamassarini The time difference is used for the metrics and SLOs, 5 minutes is exactly the range we have as an SLO, the thing is that these errors are occurring during the submission, so there is no “status set after pass/fail of the Copr build”?

I'm kind of inclined to do datetime.now() if there is no task_accepted_time

Oh, I have forgot about the SLOs 🙈
And now probably I am following; if any kind of errors occurs that prevents us to update the status checks then the task_accepted_time remains with the initial value of None, and I think this is ok because we never really accepted the task (from the user perspective which we are measuring here).
I am wondering if we can't deal with the exception of having a None value there? We don't have to calculate a difference in this case, I think it could be another metric, we are not simply slow but an error has occurred.

@mfocko
Copy link
Member Author

mfocko commented Feb 9, 2024

So do we want to count them in with being an immediate failure, or just drop them from metrics? I'm kinda inclined to count them with time difference of datetime.now() - datetime.now()

  • 👎 for dropping
  • 👍 for counting

Vote on your devices :) :D

When submitting failed Copr builds to the Pushgateway, we're using time
of accepting the task, in some cases we don't have time of accepting, so
use ‹created_at› as a fallback.

Fixes packit#2258

Signed-off-by: Matej Focko <mfocko@redhat.com>
@mfocko mfocko force-pushed the fix/copr-missing-timestamp-on-failure branch from c193b57 to 7e82c64 Compare February 11, 2024 21:05
@mfocko mfocko added the mergeit When set, zuul wil gate and merge the PR. label Feb 11, 2024
Copy link
Contributor

Copy link
Contributor

Build succeeded (gate pipeline).
https://softwarefactory-project.io/zuul/t/packit-service/buildset/368f114b4f7340dea629f6f88303164a

✔️ pre-commit SUCCESS in 1m 57s

@softwarefactory-project-zuul softwarefactory-project-zuul bot merged commit 3494bcd into packit:main Feb 11, 2024
3 checks passed
@mfocko mfocko deleted the fix/copr-missing-timestamp-on-failure branch February 12, 2024 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mergeit When set, zuul wil gate and merge the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributeError: 'NoneType' object has no attribute 'tzinfo'
3 participants