Metrics: add last acked heartbeat timestamp for each follower/learner #1176

Closed
drmingdrmer opened this issue Jul 18, 2024 · 2 comments

Comments

@drmingdrmer
Member

I guess this won't work in the case where no changes will be made to the state machine after a follower node becomes offline as the applied log indexes will remain the same, thus the difference will not change.

You're correct. To better assess the connectivity status of a follower node, it would be beneficial to add a metric that tracks the timestamp of the last acknowledged heartbeat from each follower. This additional information would provide a more accurate and timely indication of the follower's connectivity status. Here's how we could refine this idea:

  1. Add a new field to the RaftMetrics struct specifically for follower heartbeat information:
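use std::collections::HashMap;
use std::time::Instant;

pub struct RaftMetrics<C: RaftTypeConfig> {
    // ... existing fields ...
    /// When each follower/learner last acknowledged a heartbeat.
    pub follower_heartbeats: HashMap<C::NodeId, Instant>,
}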
  2. Update this field whenever a follower acknowledges a heartbeat or successfully replicates an entry:
impl<C: RaftTypeConfig> RaftMetrics<C> {
    pub fn update_follower_heartbeat(&mut self, follower_id: C::NodeId) {
        self.follower_heartbeats.insert(follower_id, Instant::now());
    }
}
  3. In the leader's routine that checks follower health:
const HEARTBEAT_TIMEOUT: Duration = Duration::from_secs(/* define your timeout */);

for (follower_id, last_heartbeat) in &raft_metrics.follower_heartbeats {
    if last_heartbeat.elapsed() > HEARTBEAT_TIMEOUT {
        // Consider this follower as potentially disconnected
    }
}
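
To make the three steps concrete, here is a minimal, self-contained sketch that can be compiled outside of the Raft codebase. It is only an illustration under stated assumptions: `HeartbeatTracker`, `record_ack`, and `stale_followers` are hypothetical names, a plain `u64` stands in for `C::NodeId`, and the timestamp is passed in explicitly so the check can be exercised without sleeping. None of this is meant to describe the API that was eventually merged.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the `follower_heartbeats` field proposed above.
/// A plain `u64` replaces `C::NodeId` so the sketch compiles on its own.
#[derive(Debug, Default)]
pub struct HeartbeatTracker {
    last_acked: HashMap<u64, Instant>,
}

impl HeartbeatTracker {
    /// Record that `follower_id` acknowledged a heartbeat (or replicated an
    /// entry) at time `at`. A later call overwrites the earlier timestamp.
    pub fn record_ack(&mut self, follower_id: u64, at: Instant) {
        self.last_acked.insert(follower_id, at);
    }

    /// Return the ids of followers whose last ack is older than `timeout`
    /// as seen from `now`, sorted ascending so the result is deterministic.
    pub fn stale_followers(&self, now: Instant, timeout: Duration) -> Vec<u64> {
        let mut stale: Vec<u64> = self
            .last_acked
            .iter()
            .filter(|(_, acked)| now.saturating_duration_since(**acked) > timeout)
            .map(|(id, _)| *id)
            .collect();
        stale.sort_unstable();
        stale
    }
}
```

Because the timestamp only moves when an ack actually arrives, a follower that goes offline shows up as stale even when no new log entries are being applied, which is the case the applied-index difference cannot detect.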

Originally posted by @drmingdrmer in #1174 (reply in thread)


@SteveLauC
Collaborator

This issue can be closed as it was completed in #1177.
