Re-calling add_learner for a restarted node doesn't sync the state (snapshot) #927

Closed
TheWaWaR opened this issue Nov 11, 2023 · 7 comments
TheWaWaR commented Nov 11, 2023

openraft version: 0.8.3

I have a 3-node cluster with in-memory storage, set up with the following steps:

  • start node 1, 2, 3
  • init node 1
  • add_learner(1)
  • add_learner(2)
  • add_learner(3)
  • change_membership([1, 2, 3])
  • write(1, [(key1, value1), (key2, value2), ...])

Then I restart node-3 and call add_learner(3) for it. I can see that node-3 receives logs from node-1, but apply_to_state_machine() is never called, so the state is not synced.

So my question is: what is the proper way to restart a node so that it syncs state from the leader?

NOTE: If I call change_membership(RemoveNodes([3])) and then re-add node-3, it works just as expected.


👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@drmingdrmer drmingdrmer self-assigned this Nov 13, 2023
@drmingdrmer
Member

You don't need to invoke add_learner() again when rebooting a node that has previously been part of a cluster.

Once a node has been incorporated into a cluster, no further action is required upon restart.
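A minimal sketch of that restart flow, reusing the node-3 launch command from the reproducer later in this thread; it assumes the node's Raft log and state machine are persisted across restarts, which the in-memory example store does not do:

# Just start the same process again with the same id and address;
# no add-learner or change-membership call is needed on restart.
nohup ./target/debug/raft-key-value --id 3 --http-addr 127.0.0.1:21003 > n3.log &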

As for the problem regarding logs being replicated but not applied, could you supply the source code of the application and the debug-level logs for us to investigate further?

@TheWaWaR
Author

TheWaWaR commented Nov 13, 2023

Just using the raft-kv-memstore example reproduces the issue:

export RUST_LOG=debug
nohup ./target/debug/raft-key-value  --id 1 --http-addr 127.0.0.1:21001 > n1.log &
nohup ./target/debug/raft-key-value  --id 2 --http-addr 127.0.0.1:21002 > n2.log &
nohup ./target/debug/raft-key-value  --id 3 --http-addr 127.0.0.1:21003 > n3.log &

sleep 2

curl "127.0.0.1:21001/init" -H "Content-Type: application/json" -d "{}"
sleep 1
curl "127.0.0.1:21001/add-learner" -H "Content-Type: application/json" -d '[2, "127.0.0.1:21002"]'
sleep 1
curl "127.0.0.1:21001/add-learner" -H "Content-Type: application/json" -d '[3, "127.0.0.1:21003"]'
sleep 1
curl "127.0.0.1:21001/change-membership" -H "Content-Type: application/json" -d '[1, 2, 3]'
sleep 1

curl "127.0.0.1:21001/write" -H "Content-Type: application/json" -d '{"Set":{"key":"foo","value":"bar"}}'
sleep 1
curl "127.0.0.1:21003/read" -H "Content-Type: application/json" -d '"foo"'

curl "127.0.0.1:21001/write" -H "Content-Type: application/json" -d '{"Set":{"key":"foo","value":"zoo"}}'
sleep 1
curl "127.0.0.1:21003/read" -H "Content-Type: application/json" -d '"foo"'

Then kill the node listening on 127.0.0.1:21003.
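The report doesn't give the exact kill command; one way to do it (the pkill pattern here is an assumption, matching the command line used above):

# Kill the node-3 process started above by matching its command line.
pkill -f 'raft-key-value.*21003'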

Then restart it, wait a few seconds, and read "foo" from it:

nohup ./target/debug/raft-key-value  --id 3 --http-addr 127.0.0.1:21003 > n3.log &

sleep 5

curl "127.0.0.1:21003/read" -H "Content-Type: application/json" -d '"foo"'
{"Ok":""}

As you can see, reading "foo" returns {"Ok":""}, which means the state was not synced.

@drmingdrmer
Member

Memstore does not persist data to disk. All state on node-3 will be lost upon restart.

And raft does not guarantee the state will be correctly synchronized when there is data loss.

In your scenario, the Leader still believes all raft-logs are replicated to node-3, and won't replicate any other data.

By removing node-3 and then re-adding it, the replication state is reset: the leader forgets that the logs were already replicated to node-3 and starts re-sending all logs.
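For reference, a sketch of that remove-and-re-add workaround against the raft-kv-memstore HTTP endpoints used in the reproducer above. Whether dropping node-3 from the membership set fully removes it, rather than only demoting it to a learner, depends on how the example forwards the change-membership options, so treat the payloads as an assumption:

# Drop node-3 from the voter set, then re-add it so the leader resets its
# replication state for node-3 and re-sends all logs.
curl "127.0.0.1:21001/change-membership" -H "Content-Type: application/json" -d '[1, 2]'
sleep 1
curl "127.0.0.1:21001/add-learner" -H "Content-Type: application/json" -d '[3, "127.0.0.1:21003"]'
sleep 1
curl "127.0.0.1:21001/change-membership" -H "Content-Type: application/json" -d '[1, 2, 3]'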

There is a FAQ (not yet published) discussing this issue (https://github.com/datafuselabs/openraft/blob/main/openraft/src/docs/faq/faq.md):

  • 🤔 Can I wipe out the data of ONE node and wait for the leader to replicate all data to it again?

    💡 Avoid doing this. Doing so will panic the leader. But it is permitted
    if the [loosen-follower-log-revert] feature flag is enabled.

    In a raft cluster, although logs are replicated to multiple nodes,
    wiping out a node and restarting it can still cause data loss.
    Assume the leader is N1 and the followers are N2, N3, N4, N5:

    • A log `a` that is replicated by N1 to N2 and N3 is considered committed.
    • At this point, if N3 is replaced with an empty node and the leader N1 crashes at the same time, then N5 may be elected as the new leader with votes granted by N3 and N4;
    • Then the new leader N5 will not have log `a`.
    Ni: Node i
    Lj: Leader   at term j
    Fj: Follower at term j
    
    N1 | L1  a  crashed
    N2 | F1  a
    N3 | F1  a  erased          F2
    N4 |                        F2
    N5 |                 elect  L2
    ----------------------------+---------------> time
                                Data loss: N5 does not have log `a`
    

    But for a cluster with an even number of nodes, erasing exactly one node won't cause data loss.
    Thus, in a special scenario like this, or for testing purposes, you can use
    --feature loosen-follower-log-revert to permit erasing a node.
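A sketch of how that feature could be enabled when building an application that depends on openraft; the exact invocation depends on your Cargo setup, so treat this as an assumption rather than the documented command:

# Enable openraft's loosen-follower-log-revert feature for the dependency
# named `openraft` in Cargo.toml (cargo's `pkg/feature` syntax).
cargo build --features openraft/loosen-follower-log-revert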


@TheWaWaR
Author

TheWaWaR commented Nov 13, 2023

Even if node-3 has disk storage, what if node-3 receives the last n logs from node-1 and crashes while it is saving that state to disk, so the last n logs are lost? Does node-1 still consider the last n logs replicated to node-3?

I think there may need to be a negotiation step: if node-3's last log index is smaller than node-1's, node-1 should be allowed to re-sync state to node-3.

If I understand correctly, removing node-3 and re-adding it is the only way to sync state from node-1?

@drmingdrmer
Member

Raft cannot provide any guarantee when data is lost; it is undefined behavior.

As I mentioned previously, the --feature loosen-follower-log-revert flag enables tolerating a follower that has lost its logs.

@TheWaWaR
Author

TheWaWaR commented Nov 13, 2023

I see. Thanks!
