
BCI-3492 [LogPoller]: Allow withObservedExecAndRowsAffected to report non-zero rows affected #14057

Merged: 5 commits into develop from BCI-3492-fix/logpoller-observe-rowsaffected on Aug 8, 2024

Conversation

reductionista (Contributor)

Addresses this bug: https://smartcontract-it.atlassian.net/browse/BCI-3492

Previously, rows affected was only being reported for log & block pruning when the query returned an error, so the reported value was always 0.

  • Always report rows affected for log & block pruning (see the sketch below)
  • Change the behavior of DeleteExpiredLogs so it also deletes logs which don't match any filter
  • Add a test case to ensure the dataset size is published properly during pruning
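In rough terms, the fix moves the rows-affected reporting out of the error path so it is recorded whenever the query succeeds. A minimal sketch of that pattern, assuming hypothetical metric fields (queryDuration, datasetSize) and a simplified signature; the actual diff appears in the review below:

```go
// Sketch only: field names and the exact signature are assumptions, not the real code.
func withObservedExecAndRowsAffected(o *ObservedORM, queryName string, queryType queryType,
	exec func() (int64, error)) (int64, error) {
	queryStarted := time.Now()
	rowsAffected, err := exec()
	o.queryDuration.
		WithLabelValues(o.chainId, queryName, string(queryType)).
		Observe(float64(time.Since(queryStarted)))

	// Record rows affected on success rather than only on error,
	// so pruning metrics can actually report non-zero values.
	if err == nil {
		o.datasetSize.
			WithLabelValues(o.chainId, queryName, string(queryType)).
			Set(float64(rowsAffected))
	}
	return rowsAffected, err
}
```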
@reductionista reductionista requested a review from a team as a code owner August 7, 2024 02:53
@reductionista reductionista marked this pull request as draft August 7, 2024 04:24
@reductionista reductionista marked this pull request as ready for review August 7, 2024 04:26
dhaidashenko previously approved these changes Aug 7, 2024
@@ -285,7 +285,7 @@ func withObservedExecAndRowsAffected(o *ObservedORM, queryName string, queryType
WithLabelValues(o.chainId, queryName, string(queryType)).
Observe(float64(time.Since(queryStarted)))

if err != nil {
Collaborator

Nice catch!

// DeleteExpiredLogs removes any logs which either:
// - don't match any currently registered filters, or
// - have a timestamp older than any matching filter's retention, UNLESS there is at
// least one matching filter with retention=0
func (o *DSORM) DeleteExpiredLogs(ctx context.Context, limit int64) (int64, error) {
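To spell out the rule in the doc comment above as code, here is a rough sketch of the per-log decision; the Log and Filter types and their fields are simplified assumptions, not the ORM's actual schema:

```go
// Sketch only: illustrative types, not the real ORM models.
type Filter struct {
	Retention time.Duration // 0 means "keep matching logs forever"
}

type Log struct {
	BlockTimestamp time.Time
}

// isExpired reports whether a log may be pruned, given the filters that match it.
func isExpired(log Log, matching []Filter, now time.Time) bool {
	if len(matching) == 0 {
		return true // no registered filter matches this log: prune it
	}
	for _, f := range matching {
		if f.Retention == 0 {
			return false // at least one matching filter has retention=0: keep forever
		}
		if now.Sub(log.BlockTimestamp) <= f.Retention {
			return false // still within some matching filter's retention window
		}
	}
	return true // older than every matching filter's retention: prune it
}
```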
Collaborator

Is it possible to accidentally remove logs when a job is updated? Let's imagine the following scenario:

  • We have a job that registered multiple LogPoller filters during init.
  • CLO proposes a job replacement: a slight change to the job definition, but the contracts stay the same, so the filters (and therefore the logs required for the plugin to work) would be the same.
  • The NOP accepts the job, which triggers removing the old job and creating a new one. This includes unregistering the old filters and registering new ones.
    • In the meantime, LogPoller's PruneExpiredLogs kicks in to remove stale logs. It sees a new set of "abandoned" logs, because the filters were just removed, and deletes them immediately.
  • Plugin init registers the new filters (which are the same), but the logs are gone, so we might need to run a replay.

I wonder whether this could happen and impact the system. How does it work for other products? What if a job is removed and added in a two-step process (via the API)? AFAIK CLO does this entire flow in one large db transaction, but I guess that might change at some point because of the LOOPPs architecture, right?

Contributor Author

@reductionista reductionista Aug 7, 2024


Good question! I'm pretty sure it's okay for now, since as you say CLO does the whole flow in a single db transaction. Someone could manually remove a job, and then add a similar one later... but I think it would be a natural user expectation to assume that there are no guarantees about events the old job cared about still being there for the new job to use. If CLO changes so that it's no longer in a single db transaction, that seems more problematic since it's presented to the user as an "Update" rather than a "Remove" followed by a separate "Create". But I think this would cause similar problems with or without this change. One way or another, we'll have to solve some issues if we can no longer guarantee the ability for CLO to make atomic job updates.

> Plugin init registers the new filters (which are the same), but the logs are gone, so we might need to run a replay.

Any plugin which cares about past logs already does a replay to the appropriate block when it's initialized. (For example, going back to pick up an OCR2 configuration event.) So I think the only issue is with missing logs that get emitted around the time the old job is getting swapped out for the new job. But even without this change, there is a potential for missing some logs while that's happening. This change would just increase the likelihood of it happening, since it would apply not only to events emitted between the delete and create operations but also to those emitted shortly before the delete, which the old job hadn't had time to process before being deleted.

One way I can think of to solve this issue is with ChainReader: if it can expose a method for updating a filter that does UnregisterFilter and then RegisterFilter within the same db transaction, that should solve it, I think. We might even already be doing this with Bind(); I'm not sure, but @ilija42 should know.
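As a rough illustration of that idea, here is a sketch using plain database/sql; the table name and the insertNew callback are placeholders, not the actual ChainReader or ORM API:

```go
// updateFilterAtomically unregisters and re-registers a filter in a single DB
// transaction, so the pruner never sees a window in which the logs are unmatched.
// Sketch only: evm.log_poller_filters and insertNew are illustrative placeholders.
func updateFilterAtomically(ctx context.Context, db *sql.DB, name string, insertNew func(tx *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // harmless no-op after a successful Commit

	// Unregister: remove the old filter's rows.
	if _, err := tx.ExecContext(ctx,
		`DELETE FROM evm.log_poller_filters WHERE name = $1`, name); err != nil {
		return err
	}
	// Register: insert the replacement filter within the same transaction.
	if err := insertNew(tx); err != nil {
		return err
	}
	return tx.Commit()
}
```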

Collaborator

Sounds good. One thing that comes to mind that might work as an additional safety check is to wait for some "sane" threshold before deleting logs that no longer have a matching filter: when retention is null, we would only match logs older than X days. Not sure what that threshold should be (maybe configurable?), but that might prevent us from accidentally losing logs right after deleting a job in some weird way. WDYT?
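A sketch of how such a threshold could slot into the pruning decision, reusing the illustrative Log type from the earlier sketch; the constant name and its default are placeholders, not an agreed design:

```go
// Sketch only: a hypothetical, possibly configurable, grace window for unmatched logs.
const unmatchedLogRetention = 24 * time.Hour

// prunableWithoutFilter only treats a log with no matching filter as prunable
// once it is older than the grace window.
func prunableWithoutFilter(log Log, now time.Time) bool {
	return now.Sub(log.BlockTimestamp) > unmatchedLogRetention
}
```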

Contributor Author

Yes, I was thinking that might be what we have to do, if we can't solve things with ChainReader.

Contributor Author

Actually, I just thought of a simple solution which I think would be pretty robust: in UnregisterFilter(), we could return success but delay the actual removal of the filter by 1 hour (or any long enough time; it could even be 24 hours, but that seems like overkill). As long as a call to RegisterFilter() that matches the logs which would have been deleted arrives within that 1-hour grace period, there is no point in time at which the pruner would be allowed to delete them.
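A sketch of that grace-period idea, kept in memory and with hypothetical names (the real filters live in the database, and the real methods take a context and return errors); the Filter type is the illustrative one from the earlier sketch:

```go
// Sketch only: names and structure are hypothetical.
const unregisterGracePeriod = time.Hour

type filterRecord struct {
	filter      Filter
	deleteAfter time.Time // zero value means "not scheduled for deletion"
}

type logPoller struct {
	lock    sync.Mutex
	filters map[string]filterRecord
}

// UnregisterFilter reports success immediately but only schedules the filter for
// deletion after a grace period, so a prompt re-registration keeps its logs safe.
func (lp *logPoller) UnregisterFilter(name string) {
	lp.lock.Lock()
	defer lp.lock.Unlock()
	if rec, ok := lp.filters[name]; ok {
		rec.deleteAfter = time.Now().Add(unregisterGracePeriod)
		lp.filters[name] = rec
	}
}

// RegisterFilter clears any pending deletion for a filter with the same name.
func (lp *logPoller) RegisterFilter(name string, f Filter) {
	lp.lock.Lock()
	defer lp.lock.Unlock()
	lp.filters[name] = filterRecord{filter: f}
}

// expiredFilters is what the pruner would consult: a filter only counts as gone
// once its grace period has elapsed.
func (lp *logPoller) expiredFilters(now time.Time) []string {
	lp.lock.Lock()
	defer lp.lock.Unlock()
	var gone []string
	for name, rec := range lp.filters {
		if !rec.deleteAfter.IsZero() && now.After(rec.deleteAfter) {
			gone = append(gone, name)
		}
	}
	return gone
}
```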

@reductionista reductionista added this pull request to the merge queue Aug 8, 2024
Merged via the queue into develop with commit e0850a6 Aug 8, 2024
120 checks passed
@reductionista reductionista deleted the BCI-3492-fix/logpoller-observe-rowsaffected branch August 8, 2024 18:11
momentmaker added a commit that referenced this pull request Aug 8, 2024
* develop:
  CRIB CI integration (#13924)
  fix: refactor sonarqube scan args (#13875)
  BCI-3492 [LogPoller]: Allow withObservedExecAndRowsAffected to report non-zero rows affected (#14057)
  Add error mapping for Astar (#13990)
  [BCI-3862][chainlink] - Change DSL Block primitive to string instead of int (#14033)
  [KS-412] Validate called DON membership in TriggerPublisher (#14040)
  [TT-1429] notifying guardian on some failures (#14073)
  Add Mantle errors (#14053)
  Fix write target nil dereferences (#14059)
  Allow retrying failed transmissions (#14017)
  auto-11214: migrate more tests to foundry (#13934)