progress tracking for keep alive reply #64
Closed
context:
When receiving a keep-alive request from Postgres, replying with Postgres' current `wal_end + 1` notifies the Postgres server that we have fully processed everything up to its `wal_end + 1`.
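To make the reply concrete, here is a minimal Python sketch (not the project's Elixir code). It assumes the keep-alive reply is PostgreSQL's standard StandbyStatusUpdate (`'r'`) message and that the same LSN is reported as written, flushed and applied. The function name `standby_status_update` and the sample `wal_end` value are hypothetical.

```python
import struct

def standby_status_update(wal_end: int, clock_us: int = 0) -> bytes:
    """Hypothetical helper: build a StandbyStatusUpdate ('r') message that
    acknowledges everything up to wal_end + 1 as written, flushed and applied."""
    ack = wal_end + 1
    # Layout (big-endian): Byte1('r'), Int64 written LSN, Int64 flushed LSN,
    # Int64 applied LSN, Int64 client clock (microseconds since 2000-01-01),
    # Byte1 reply-requested flag (0 = no reply requested).
    return struct.pack(">cQQQqB", b"r", ack, ack, ack, clock_us, 0)

msg = standby_status_update(0x16B3748)
```

Once this message is sent, the server may treat everything before `wal_end + 1` as consumed, whatever the rest of the pipeline has actually done with it.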
issue:
Replication.Server communicates with Replication.Publisher asynchronously. If an error occurs while a message is being processed, we may already have acknowledged many more messages than we actually processed.
For a durable slot, this means that when the slot restarts, it will do so at the last `wal_end + 1` we replied with, losing every event that was acknowledged but not processed.
The longer a transaction takes to be fully processed (many records, etc.), the higher the risk of this happening.
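The failure mode can be illustrated with a simplified model, sketched in Python. It assumes that any LSN below the acknowledged position is never resent after a restart. The function `events_lost_on_restart` and the sample LSNs are illustrative only.

```python
def events_lost_on_restart(received, processed, acknowledged_lsn):
    """Simplified model: LSNs that were received but not processed, yet lie
    below the acknowledged LSN, will not be resent when the slot restarts."""
    done = set(processed)
    return [lsn for lsn in received if lsn < acknowledged_lsn and lsn not in done]

received = [100, 110, 120, 130]  # LSNs already handed to the publisher
processed = [100, 110]           # publisher crashed while handling 120
wal_end = 130                    # value reported in the keep-alive
lost = events_lost_on_restart(received, processed, wal_end + 1)
```

Here `lost` is `[120, 130]`. Both events were acknowledged by replying `wal_end + 1`, but neither was ever processed.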
solution proposed:
Add a Replication.Progress Agent that stores the LSNs of in-progress transactions in a `:gb_sets` (ordered set).
When we start receiving a transaction, we push its LSN; we drop it once the transaction has been fully processed.
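The Agent's state can be sketched as an ordered set of LSNs. This Python stand-in uses a sorted list maintained with `bisect` in place of Erlang's `:gb_sets` balanced tree. Insertion is O(n) rather than O(log n), which is fine for a sketch. The class name `Progress` and its method names are hypothetical.

```python
import bisect
from typing import List, Optional

class Progress:
    """Hypothetical stand-in for the proposed Replication.Progress Agent:
    keeps the LSNs of in-flight transactions in ascending order."""

    def __init__(self) -> None:
        self._lsns: List[int] = []

    def push(self, lsn: int) -> None:
        # Called when we start receiving a transaction (set semantics: no duplicates).
        i = bisect.bisect_left(self._lsns, lsn)
        if i == len(self._lsns) or self._lsns[i] != lsn:
            self._lsns.insert(i, lsn)

    def drop(self, lsn: int) -> None:
        # Called once the transaction has been fully processed.
        i = bisect.bisect_left(self._lsns, lsn)
        if i < len(self._lsns) and self._lsns[i] == lsn:
            del self._lsns[i]

    def smallest(self) -> Optional[int]:
        # Oldest transaction still in progress, or None if nothing is in flight.
        return self._lsns[0] if self._lsns else None

progress = Progress()
progress.push(120)  # transaction at LSN 120 starts
progress.push(100)  # transaction at LSN 100 starts
progress.drop(100)  # transaction at LSN 100 is fully processed
```

An ordered set is useful here because transactions can finish out of order. Keeping only a counter or the latest LSN would not tell us which unfinished transaction is the oldest.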
In Replication.Server, we then only need to reply to keep-alives with the smallest LSN still in progress.
If no transaction is in progress, we can reply with the received `wal_end + 1`, as we currently do.
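The reply rule then reduces to a single choice. The Python sketch below assumes the smallest in-progress LSN is itself the acknowledged position, so everything before the oldest unfinished transaction is confirmed and that transaction is resent after a restart. The function name `keep_alive_reply` is hypothetical.

```python
from typing import Optional

def keep_alive_reply(wal_end: int, smallest_in_progress: Optional[int]) -> int:
    """Hypothetical helper: LSN to acknowledge in the keep-alive reply."""
    if smallest_in_progress is None:
        # Nothing in flight: keep the current behaviour.
        return wal_end + 1
    # Never confirm past the oldest transaction that is not fully processed.
    return smallest_in_progress
```

The reply lags behind the server by at most the oldest in-flight transaction, so a crash costs some re-delivery instead of lost events.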