Add validator metric for missed blocks #3414
Comments
I think this should be a feature of the The core missed-block detection logic should probably run in
1. Detecting missed blocks
For (1), this is fairly straightforward. The As an example of detecting missed slots/blocks, let us assume there is a skipped-slot (i.e. a missing block) at slot 17. Calling
block_roots = [
0xabc, // Slot N+0: Skipped status unknown.
0xdef, // Slot N+1: Not skipped
0x012, // Slot N+2: Not skipped
0x012, // Slot N+3: Skipped
0x012, // Slot N+4: Skipped
0x345 // Slot N+5: Not Skipped
]
2. Discovering the proposer of a missed block
Regarding (2), once we know a slot was skipped, we must determine the proposer for that skipped slot. Knowing the proposer allows us to determine if the validator is a "monitored validator" and is worth creating an alert for. Let us assume from step (1) we learned that slot
lighthouse/beacon_node/beacon_chain/src/beacon_chain.rs Lines 3877 to 3880 in dfcb336
There are two complications with using this cache. Firstly, the cache might not contain the value that we want. In this case, I think we simply log a warning and then abort the effort to determine the proposer. The cache is large enough that it shouldn't miss. I think it'll only miss when the chain is unhealthy, and I don't like starting expensive computations when the chain is unhealthy. The second complication is determining the
3. Reducing false-positives
Let's consider a simple and common fork on the beacon chain:
The following is true about this chain:
Our
This problem is hard to solve. We could completely remove false-positives by only creating alerts for finalized slots. However, finalization takes at least two epochs, so our alerting system would then have a two-epoch lag (~13 mins). We are in a trade-off space between a fast alerting system and a correct alerting system. I propose we find a happy medium by only alerting for missed blocks if they are some
This means that we need forks to span 4 slots before we start to create false-positive logs. Such forks are uncommon and I believe that false-positives are acceptable in this case. The four slots will introduce a 48 second lag to our alerts, which I think is totally acceptable.
4. Debouncing alerts
When we detect missed blocks in
Because we're checking a range of slots each time, it's likely that we'll discover the same missed block more than once. To avoid creating a new alert/Prometheus metric each time, I suggest that we add a
Whenever we add a cache like this we need to make sure we prune it, otherwise we'll create a memory leak. I propose that each time we run |
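For anyone picking this up, here is a rough, standalone sketch of steps (1), (3) and (4) as I read them: a slot counts as skipped when its block root equals the previous slot's root, alerts are held back until the slot is some number of slots behind the head, and a small cache stops the same missed slot from being reported twice. Everything named here (`Slot`, `Root`, `skipped_slots`, `MissedBlockCache`, `lag_slots`, `oldest_kept`) is hypothetical and made up for illustration; none of it is existing Lighthouse code, and block roots are shortened to `u64` for readability. Step (2), looking up the proposer, is deliberately left out.

```rust
use std::collections::HashSet;

// Hypothetical aliases for this sketch only. Real block roots are 32-byte hashes.
type Slot = u64;
type Root = u64;

/// Returns the skipped slots in `block_roots`, where `block_roots[i]` is the
/// block root at slot `start_slot + i`.
///
/// A slot is treated as skipped when its root equals the previous slot's root.
/// The first entry has nothing to compare against, so its status is unknown
/// and it is never reported.
fn skipped_slots(start_slot: Slot, block_roots: &[Root]) -> Vec<Slot> {
    block_roots
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| pair[0] == pair[1])
        .map(|(i, _)| start_slot + i as Slot + 1)
        .collect()
}

/// Hypothetical debouncing cache: remembers which missed slots were already
/// reported so that overlapping scans do not raise duplicate alerts.
struct MissedBlockCache {
    reported: HashSet<Slot>,
}

impl MissedBlockCache {
    fn new() -> Self {
        Self {
            reported: HashSet::new(),
        }
    }

    /// Returns the slots from `skipped` that are at least `lag_slots` behind
    /// `head_slot` and have not been reported before, recording them as reported.
    fn newly_missed(&mut self, head_slot: Slot, lag_slots: Slot, skipped: &[Slot]) -> Vec<Slot> {
        skipped
            .iter()
            .copied()
            // Hold back recent slots to reduce false positives from short forks.
            .filter(|slot| slot + lag_slots <= head_slot)
            // `insert` returns false when the slot was already recorded.
            .filter(|slot| self.reported.insert(*slot))
            .collect()
    }

    /// Drops entries older than `oldest_kept` so the cache cannot grow forever.
    fn prune(&mut self, oldest_kept: Slot) {
        self.reported.retain(|slot| *slot >= oldest_kept);
    }
}
```

With the `block_roots` example above and N = 10, `skipped_slots` reports slots 13 and 14. With a four-slot lag, slot 13 only becomes reportable once the head reaches slot 17, and a second scan of the same range reports nothing new for it.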
Hi! I am an EPF fellow, is there anyone working on this issue? :) |
Hey @v4lproik, there's no one working on it yet (that I know of) and I'd love to see you take it on! |
Perfect, I'll get on with it tomorrow then. Thanks! |
Hey @v4lproik, how are you travelling with this one? Let us know if you've decided to do something else and we'll start working on this internally. |
Hey @paulhauner! Sorry, it's holiday time for me, I am coming back at the end of the week and will start working on it if that's okay! |
No problems, thanks for getting back to me. I hope you had a nice holiday! |
@paulhauner I sent you a DM on Discord last week to talk about the Ethereum Fellowship and the different LH projects I'd like to work on including this one. Hopefully you have some time to look into it. Thanks! |
@v4lproik Paul is on leave for the next 2 weeks. Please try our |
Thanks a lot Michael! I just sent you a message on Discord! |
I will send a PR for this issue this week. I made good progress. |
Yeah this would be great, thanks!
This sounds like a good idea to me if you've got time to implement it and it doesn't bloat the impl too much. Having the correct value as well as the best-effort one would be ideal. I think the metrics blow-up shouldn't be too much, particularly if we only enable the missed block metrics when the validator monitor is enabled. I don't even think we'd need to guard it behind a CLI flag. |
Hi team, I would love some pointers regarding the implementation of the proposer cache. I left some comments in the draft PR. |
Description
Hello!
Currently, the validator monitoring feature lets us monitor plenty of useful things about validators in the beacon node, gathering information from API/gossip/blockchains.
We can also monitor new blocks produced by our validators via the validator_monitor_beacon_block_total metric and validator_monitor_prev_epoch_beacon_blocks_total, but we can't see missed blocks. I think it would be very useful for everyone to monitor not only produced blocks but also missed blocks.