-
Notifications
You must be signed in to change notification settings - Fork 989
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement: persist commit index in LogStore to accelerate recovery #613
base: main
Are you sure you want to change the base?
Enhancement: persist commit index in LogStore to accelerate recovery #613
Conversation
Proposal for next PR: Lines 1292 to 1359 in 42d3446
func (r *Raft) processLogs(index uint64, futures map[uint64]*logFuture) {
// Reject logs we've applied already
lastApplied := r.getLastApplied()
if index <= lastApplied {
r.logger.Warn("skipping application of old log", "index", index)
return
}
+ if r.fastRecovery && isCommitTrackingLogStore(r.logs) {
+ store := r.logs.(CommitTrackingLogStore)
+ if err = store.SetCommitIndex(index) {
+ // show some error msg
+ }
+ }
....
// Update the lastApplied index and term
r.setLastApplied(index)
} |
As a long-term user of this library, this could be useful. However I would strongly recommend that this functionality be wrapped in a flag, which is settable in the Raft Config object (similar to |
@otoolep Thanks for the suggestion, I would add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting on this @lalalalatt.
The interface looks good. I have a comment inline and given that I'm suggesting removing half the lines in this PR, it might make sense to also add the change you proposed for your "next PR" into this one?
I also think it might be worth rebasing this PR on to a feature branch like f/fast-recovery
so we can keep PRs small to review but not merge code into main
until it's a working feature that can be released. (I realise this would be a no-op, but still if we don't complete the feature it will be unused code that will need later cleanup or completion.)
What do you think of that approach?
@banks Sure, that sounds like a good plan. Could you help me create the |
Drive-by comment.
That's not the right way to think about the development flow (not unless this repo is managed in some unusual way). You create the branch in your own fork of this repo, and then generate a PR from that branch in your fork to the main branch in this repo. |
@otoolep in general that is usually how GH works. In this case I wonder if we should have a long running branch here so that we can keep PRs small and review the changes in small parts rather than wait until the entire feature is built and have to review it all in one huge PR from the fork to here. One way to do that would be for me to create a long-lived feature branch here, another would be for us to review PRs in a fork but that leaves all the interim review in a separate place 🤔 . On reflection. I'm not sure what is gained by trying to make this many small PRs yet. Let's continue work here and see how large this PR gets as more of the code is built before we worry too much about making feature branches. Sorry for the suggestion that wasn't very well thought through! |
Ok, looks good! |
- Introduced a `fastRecovery` flag in the Raft structure and configuration to enable fast recovery mode. - Updated `NewRaft` to initialize `fastRecovery` from the configuration. - Added `persistCommitIndex` function to store the commit index when fast recovery is enabled. - Modified `processLogs` to persist the commit index before updating `lastApplied`. - Documented the `FastRecovery` option in the config.
- Implemented `recoverFromCommitedLogs` function to recover the Raft node from committed logs. - If `fastRecovery` is enabled and the log store implements `CommitTrackingLogStore`, the commit index is read from the store, avoiding the need to replay logs. - Logs between the last applied and commit index are fed into the FSM for faster recovery.
…ency - Refactor `ReadCommitIndex` to `GetCommitIndex` across `LogStore` and `InmemCommitTrackingStore`. - Introduce `InmemCommitTrackingStore` to track commit index in memory for testing purposes. - Add locking mechanism to safely read/write commit index in `InmemCommitTrackingStore`.
CommitTrackingLogStore
and its checker in LogStoreThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @lalalalatt, thanks for working on this.
I spotted a couple of things that we should tweak here. I've added comments inline - let me know if anything is confusing. I didn't do a "full" final review yet but these should get it more inline with my original design. Feel free to explain if I'm misunderstanding anything though!
@banks Thanks for these lots of comments 😄 But, for the current change, it is possible to apply same configurations twice, although its wouldn't violate the correctness, it still be wasteful. |
I don't think storing commit index every time before For followers, when they are appending the log received from leader, the commit index may not increase, because leader may not receive majority agreement on the corresponding logs. For leader, as I mention previously, it persist logs by calling My idea is: the commit index only update at two place:
They both trigger Maybe that's my misunderstanding 😅, if there is anything wrong, please correct me~ |
Hi @peterxcli The idea behind persisting commit index during StoreLogs is to make sure it doesn't cause worse performance. Writing to disk is the slowest part of raft in most implementations. If we add a new write to disk anywhere then performance will decrease and that's not acceptable for our products at least and probably not to other users of this library. So the whole design was intended to persist the current known commit index along with the logs every time we write new logs - yes it's a few extra bytes but it's the
This is true, but the If we only call this during
You're correct that all the logs we store when we call Again this is so that we can "re-use" the fsync on the Most of the same arguments as above apply here: if we wait until As an aside, it we were OK with adding more disk writes, I'd not have proposed this as an extension of Does that make sense? |
@banks oh~~ It seems like I completely misunderstood your design.
Got it, thanks~ 😄
That's true 😅 @banks Thanks for your clarification very much~ |
…ntil the next StoreLogs call and then write it out on disk along with the log data.
- Implemented commit tracking logs in the Raft cluster configuration. - Added tests for restoring snapshots on startup with commit tracking logs. - Introduced a fast recovery test to ensure logs are applied correctly after restart. - Updated `MakeClusterOpts` to include `CommitTrackingLogs` option.
Hi @banks, I’ve just added some tests for the fast recovery feature, and everything seems to be working well. I’d really appreciate any feedback or advice you might have. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Peter,
This is looking awesome, thanks for your hard work!
I have a couple of suggestions inline. I'm not super hard line on any of them if you think there's a good reason to reject so love to hear your thoughts but I'd lean towards them being an improvement that will help us maintain or reason about this in the future!
Updated the function name from `persistCommitIndex` to `tryPersistCommitIndex` to better reflect its behavior. This function now updates the commit index in the persistent store only if fast recovery is enabled and if the log store implements `CommitTrackingLogStore`. Adjusted references to this function in `dispatchLogs` and `appendEntries`.
…ence - Rename SetCommitIndex to StagCommitIndex in CommitTrackingLogStore interface - Add detailed documentation for StagCommitIndex method - Update raft.go to use StagCommitIndex instead of SetCommitIndex - Optimize commit index staging in appendEntries using min(lastNewIndex, leaderCommitIndex) This change ensures commit index updates are only persisted atomically with the following StoreLogs call, preventing inconsistencies between the commit index and log entries in case of crashes. It also optimizes the staged commit index to be as up-to-date as possible without exceeding the last new entry index, reducing commit index lag in the CommitTrackingStore. The contract for implementations is clarified, specifying that GetCommitIndex must never return a value higher than the last index in the log.
@banks Thanks for the advice of the commit index staging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking awesome Peter!
Thanks again. There is one suggested minor improvement in being more defensive which is maybe questionable but I think just about seems worth it to me. There's also a typo in the new interface name and the testing implementation didn't update with it so a couple of minor fixes and it's hopefully good to go!
I think once we have those fixed up, I'll try and get some other reviewers internally to look before we make a merge decision. We'll may also look to get the BoltDB and WAL implementations PRed so we can test the whole thing end-to-end before we merge any of these to be totally sure it actually works as expected.
} | ||
assert.LessOrEqual(t, snap.Index, commitIdx) | ||
assert.LessOrEqual(t, commitIdx, lastIdx) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these tests are failing right now because the InmemStore wasn't updated to match the new interface right? If they are passing for you then it might be worth a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It passes😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @banks
First I want to thanks that you reviewed so many changes and gave lots of advices and opinions. 😁
Back to this topic, I think this test case is ok🤔.
Because basically the commit index stored in the store would always lower than the last log index in store (there is always one StoreLogs
call lag).
As the comment "Expect: snap.Index --- commitIdx --- lastIdx" leaved, I think we can't sure what is the exact position of the commitIndex
in every test, so I just test the interval only.
But now we find this would lead to another problem - we can't even detect the commit store interface and its implementation aren't match😱.
We can simply solve that detection issue by changing assert.LessOrEqual
to assert.Less
, but that would another flaky problem because of the uncertainty of the commit index.
What do you think?
- Rename SetCommitIndex to StageCommitIndex in InmemCommitTrackingStore - Fix typo in CommitTrackingLogStore interface (StagCommitIndex to StageCommitIndex) - Update error message in tryStageCommitIndex for consistency - Ensure CommitTrackingLogStore extends LogStore interface
…idation - Add error logging and panic for critical errors during recovery - Replace lastApplied check with lastIndex comparison - Ensure commitIndex does not exceed lastIndex - Improve code structure and readability
#549