-
Notifications
You must be signed in to change notification settings - Fork 939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Gossipsub] Add optional metrics #2235
Conversation
* Added message count stuff * Message count maps use MeshIndex instead of PeerId * Added validated message count * Count rejected messages too.. * simplified adding new counts significantly * some cleanup and stop resetting counts * Added more metrics and cleanup * added peer slot to logging on score * added more metrics for prune cases * added churn_topic * added debug logging for churn counts * immutable borrow * fixed log * improved logging even more.. * ensure message counts structure exists * Forgot to add prune case * cleaned up debugging with macro * fixed unsubscribed condition * renamed some things for clarity * Optimized Performance with Vec Instead of BTreeMap * Updated comment for clarity * removed redundant category and optimized * fixed small edge case * Unified slot-related HashMaps into 1 Structure * Convert Fields to Array for Passing to Functions * Added Method for Iterating over Metrics * small fix * ran cargo fmt * ran clippy * One last refactor using enums. Looking great now. * Fixed bug with assert and improved error message * fixed issue with release builds * Make minimal changes to master * Added tracking of assigning slot * added slot_metrics_topics() & extra info to logs * Made slot_metrics_topics() smarter * First draft * Update metric design * Add new metric and cleanup * Cleanup code & Remove Redundant map Lookups (#147) * Simplify & Remove Redundant Map Lookups * Cleanup code/Remove Redundant lookups in heartbeat * Commented and renamed things for clarity * More renaming / reorganizing for clarity * Fixed clippy's unnecessary clone warning * one last renaming.. Co-authored-by: Mark Mackey <ethereumdreamer@gmail.com> Co-authored-by: ethDreamer <37123614+ethDreamer@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Always in favor of more instrumentation. Sorry for the delay here.
Feature flag
I wonder whether the gossipsub-metrics
and the metrics
feature are needed. Off the top of my head, I would suggest putting all metrics related logic behind #[cfg(debug_assertions)]
. In debug mode, no one would complain about the overhead of metric tracking. In release mode metric tracking would not happen. Noisiness of the metrics output in debug mode can be controlled via log levels.
Comparison to libp2p-metrics
This doesn't fit into the libp2p metrics framework
Correct. If I understand this pull request correctly, your goal is to be able to do ad-hoc debugging and not continuous monitoring, right? For the latter, I would very much like for libp2p-metrics
to support gossipsub. Let me know in case you would like to collaborate on that.
As an example what is already possible today:
/* For debug purposes | ||
use env_logger::{Builder, Env}; | ||
// Add this line to relevant tests. | ||
Builder::from_env(Env::default().default_filter_or("info")).init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just add let _ = env_logger::try_init();
as the first line of each test, like we do in most other crates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i'll modify this in the next iteration, if we decide to go down the libp2p-metrics
route
This is actually for optional continuous monitoring. I have set up a bunch of prometheus and grafana metrics for it also. When I skimmed the After looking at the kademlia metrics you have, I think something like that would be good for gossipsub, however i'm less interested in active connections more how the mesh and scoring is behaving. Some of the metrics added in this PR could probably be generalised to Anyway, yeah, would be good to collaborate, depending on how much time you have, otherwise, if you give me a very rough overview, i can go figure it out and integrate into |
Great. I think you know best what information is relevant. It would need to be exposed via
I created a first draft at #2277. Do you want to build on top of that? Happy to help further. Also feel free to reach out with any Prometheus or Open Metrics questions. I have been part of the maintainer team of the Prometheus project in a past life. |
Looking into the metrics that @AgeManning added I think those that would make sense to expose via a behaviour event are few. Maybe a middle ground would be to port these new internal metrics to |
I've had a bit of a look further into the This PR introduces a new concept of The current design is to internally collect all this data and assemble it to some consumable form, which is currently exposed via There's two things to consider I think:
Curious about your thoughts on these. |
What is your preferred way of exposing these metrics? Do you call
👍 agreed. Also since Rust features are additive, thus running two gossipsub instances would either requiring metric collection in both or none.
Agreed that bubbling up all the information via events is not feasible. Some middleground as suggested by @divagant-martian sounds like an approach worth exploring.
I am in favor of this approach. let mut metric_registry = Registry::default();
let metrics = Metrics::new(&mut metric_registry);
let gossipsub = GossipSub::new(Some(registry::sub_registry_with_prefix("gossipsub")), [...]); |
This would require to make |
Does that make sense @divagant-martian? |
Yes, noted too late. Thanks 👍 |
Closing in favour of #2346 |
These adds a feature flag which enables a set of metrics for gossipsub which can be helpful in analysing the mesh behaviour and general health of the network.
This doesn't fit into the libp2p metrics framework (which relies on actual behaviour messages) instead we instrument various sections of the gossipsub code to report and update a general metrics struct.
The metrics keep track of various parameters, but importantly tracks events within the mesh, such as peer churn and causes for the churn. We also keep track of IWANT and duplicates being received which are important metrics for tuning gossipsub parameters for specific networks.
For those that don't compile with the new feature
gossipsub-metrics
Gossipsub should remain unchanged.