
raft: making check quorum behaves more like a leader lease #5451

Closed
wants to merge 2 commits

Conversation

swingbach
Contributor

According to section 6.4 of the thesis, there is a way to serve read-only queries efficiently without going through the raft log, but it still needs to communicate with the quorum, which is not efficient enough. Introducing a leader lease mechanism would make it more efficient.

However, a leader lease breaks the leaderTransfer case. I assume whoever sends the leaderTransfer message knows exactly what they are doing, so the leader lease shouldn't break leaderTransfer. I'm going to fix that in the next PR.
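
For context, here is a minimal sketch of the clock-based lease idea from section 6.4.1 (the `leaseLeader` type and its methods are invented for illustration; they are not part of this PR or of etcd/raft):

```go
// Package leasesketch illustrates the lease idea from section 6.4.1 of the thesis.
package leasesketch

import "time"

// leaseLeader is a toy model of a leader that extends a clock-based lease
// whenever a quorum acknowledges its heartbeats.
type leaseLeader struct {
	electionTimeout time.Duration
	leaseUntil      time.Time // local reads are allowed before this instant
}

// onQuorumHeartbeatAck is called once a majority has acknowledged a heartbeat
// that was broadcast at sentAt. No other server can become leader until
// roughly one election timeout after that broadcast, modulo clock drift.
func (l *leaseLeader) onQuorumHeartbeatAck(sentAt time.Time) {
	if until := sentAt.Add(l.electionTimeout); until.After(l.leaseUntil) {
		l.leaseUntil = until
	}
}

// canServeLocalRead reports whether a read-only query may be answered from
// the leader's local state machine without another round to the quorum.
func (l *leaseLeader) canServeLocalRead(now time.Time) bool {
	return now.Before(l.leaseUntil)
}
```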

@@ -565,6 +565,12 @@ func (r *raft) Step(m pb.Message) error {
	case m.Term > r.Term:
		lead := m.From
		if m.Type == pb.MsgVote {
			if r.state == StateFollower && r.checkQuorum && r.electionElapsed < r.electionTimeout {

Contributor

I think to correctly implement this, we also need pre-vote support.

Otherwise, a peer with a higher term will keep sending vote requests. The leader cannot settle that peer down, since the peer will always reject the leader's append requests because it has the higher term.

Pre-vote helps here because a peer will not increase its term unless it knows there has been no leader within the last election timeout.

Another issue is peer startup. If a peer gets restarted, it resets its election timer and cannot vote for a few seconds or so. If the entire cluster is restarted or is bootstrapped for the first time, there will be a few seconds of unavailability.
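
A rough sketch of the pre-vote idea mentioned above (all types and names are invented for illustration; this is not the etcd/raft implementation):

```go
// Package prevotesketch illustrates the Pre-Vote idea: a node only bumps its
// term after a majority indicates it could win an election.
package prevotesketch

// peer answers a pre-vote hypothetically: it grants it only if it has not
// heard from a leader within the last election timeout and the candidate's
// log is at least as up to date as its own. It does not update its own term
// or remember the answer.
type peer interface {
	wouldGrantPreVote(term, lastLogTerm, lastLogIndex uint64) bool
}

type node struct {
	term, lastLogTerm, lastLogIndex uint64
	peers                           []peer // the other members of the cluster
	startElection                   func() // runs an ordinary RequestVote round
}

// maybeCampaign is called when the node's election timer fires. Because the
// term is only incremented after the pre-vote succeeds, a partitioned node
// cannot inflate its term and later disrupt a stable leader.
func (n *node) maybeCampaign() {
	granted := 1 // the node always pre-votes for itself
	for _, p := range n.peers {
		if p.wouldGrantPreVote(n.term+1, n.lastLogTerm, n.lastLogIndex) {
			granted++
		}
	}
	if granted > (len(n.peers)+1)/2 {
		n.term++ // the real term bump happens only now
		n.startElection()
	}
}
```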

Contributor

Why is this limited to followers? As long as checkQuorum is enabled, any node in StateLeader is guaranteed to have been in contact with a majority of nodes within an election timeout, so it shouldn't increase its term either.

> If the entire cluster is restarted or is bootstrapped for the first time, there will be a few seconds of unavailability.

This is already true: when the entire cluster is restarted, the first election cannot happen until an election timeout (plus randomization) has passed.
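
For concreteness, one way the relaxed condition could look, with no StateFollower restriction (a sketch only; `hasLeader` stands in for `r.lead != None`, the other field names mirror the diff above, and the final code may differ):

```go
// Package leasecheck sketches the relaxed in-lease test: no StateFollower
// restriction, just checkQuorum plus recent contact with (or being) a leader.
package leasecheck

// raftView mirrors the handful of raft fields the check needs.
type raftView struct {
	checkQuorum     bool
	hasLeader       bool // stands in for r.lead != None
	electionElapsed int  // ticks since the last evidence that a leader exists
	electionTimeout int
}

// inLeaderLease reports whether an incoming MsgVote carrying a higher term
// should be ignored because a current leader is presumed to still be alive.
// A leader that keeps passing its quorum check satisfies this too, so it
// would not bump its term either.
func (r raftView) inLeaderLease() bool {
	return r.checkQuorum && r.hasLeader && r.electionElapsed < r.electionTimeout
}
```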

Contributor

> This is already true:

Right... Ignore my last point.


xiang90 commented May 25, 2016

> According to section 6.4 of the thesis, there is a way to serve read-only queries efficiently without going through the raft log, but it still needs to communicate with the quorum, which is not efficient enough.

The leader lease approach @swingbach mentioned is actually explained in section 6.4.1. As the thesis correctly notes, it might suffer from clock drift.

@ongardie Actually, there is one thing I am not clear about. The thesis says:

> Once the leader’s heartbeats were acknowledged by a majority of the cluster, the leader would assume that no other server will become leader for about an election timeout,

Does this assume that pre-vote is implemented, so that a follower would not increase its term when it times out, and would reject a vote request if it has recently heard from the leader?

It seems that without pre-vote or a similar mechanism, we cannot assume that no server will become leader for about an election timeout.

/cc @bdarnell

@bdarnell

You don't need the full pre-vote RPC, but you do need this tweak from the end of section 4.2.3, which is basically what this commit does:

> We modify the RequestVote RPC to achieve this: if a server receives a RequestVote request within the minimum election timeout of hearing from a current leader, it does not update its term or grant its vote.

This is also discussed in https://groups.google.com/d/msg/raft-dev/5Lxxr2UwzQc/UCKzsx0qL1sJ


xiang90 commented May 25, 2016

@bdarnell I understand that part. However, my concern was that:

> A peer with a higher term will keep sending vote requests. The leader cannot settle that peer down, since the peer will always reject the leader's append requests because it has the higher term.

@bdarnell

Ah, OK. I was responding to your last comment, "It seems that without pre-vote or a similar mechanism, we cannot assume that no server will become leader for about an election timeout." I think we can make that assumption, but if we make this change without pre-vote then a node may get stuck with a term number that is too high and be unable to proceed.


xiang90 commented May 25, 2016

> a node may get stuck with a term number that is too high and be unable to proceed.

Yeah... So I think we need to implement pre-vote to make all of this work correctly? A stuck node seems unacceptable.


xiang90 commented May 25, 2016

@bdarnell Seems like here is the answer from Diego:

> The leader (A or B) will send a heartbeat to C (the one with the higher term), C will reply to that heartbeat with its newer term, and the leader will step down and adopt that newer term.

Right now we ignore messages with a lower term. We need to tweak that so that C can reply with a response carrying its higher term, to disrupt the leader and free itself.
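
To make that concrete, here is a sketch of the tweak (the message and type names below are stand-ins for illustration; this is not this PR's code and not necessarily how etcd/raft will implement it):

```go
// Package freestuck sketches how the node with the too-high term (C) can free
// itself: instead of dropping lower-term leader traffic, it replies, and the
// reply carries C's higher term, which makes the stale leader step down.
package freestuck

type msgType int

const (
	msgApp msgType = iota
	msgHeartbeat
	msgAppResp
)

type message struct {
	typ      msgType
	from, to uint64
	term     uint64
}

type raftNode struct {
	id, term    uint64
	checkQuorum bool
	send        func(message) // transport to the other peers
}

// stepLowerTerm handles a message whose term is below the local term. Without
// this tweak such messages are simply ignored, and C stays stuck forever.
func (r *raftNode) stepLowerTerm(m message) {
	if r.checkQuorum && (m.typ == msgHeartbeat || m.typ == msgApp) {
		// The response is stamped with r.term, which is higher than the
		// leader's term, so the leader steps down and adopts the newer term.
		r.send(message{typ: msgAppResp, from: r.id, to: m.from, term: r.term})
	}
	// Otherwise the lower-term message is dropped, as before.
}
```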


xiang90 commented May 27, 2016

closing this in favor of #5468.
