Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/metrics: Add protocols label to address-specific metrics #2982

Merged
merged 27 commits into from
Nov 15, 2022

Conversation

John-LittleBearLabs
Copy link
Contributor

@John-LittleBearLabs John-LittleBearLabs commented Oct 4, 2022

Description

Previously, we would only track the metrics like the number of open connections. With this patch, we extend these metrics with a protocols label that contains a "protocol stack". A protocol stack is a multi-address with all variable parts removed. For example, /ip4/127.0.0.1/tcp/1234 turns into /ip4/tcp.

Resolves #2758.

Links to any relevant issues

Open Questions

The use of Family appends _total to the metric name. Is this desirable?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

misc/metrics/Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few comments.

misc/metrics/Cargo.toml Outdated Show resolved Hide resolved
misc/metrics/examples/metrics/main.rs Outdated Show resolved Hide resolved
misc/metrics/src/identify.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/swarm.rs Outdated Show resolved Hide resolved
misc/metrics/src/swarm.rs Outdated Show resolved Hide resolved
@John-LittleBearLabs

This comment was marked as outdated.

@thomaseizinger
Copy link
Contributor

With the change in key the metrics look slightly different now

libp2p_swarm_connections_incoming_total{ip4/tcp} 1
libp2p_identify_listen_addresses_total{ip4/tcp} 5

Yep, that is almost what we want (see #2758)! :)

A leading / would be great and instead of creating a new label for each one, we should have one label address_stack with multiple values!

@John-LittleBearLabs

This comment was marked as outdated.

@thomaseizinger
Copy link
Contributor

instead of creating a new label for each one, we should have one label address_stack with multiple values!

Not sure I follow exactly. I'm guessing instead of

libp2p_identify_listen_addresses_total{/ip4/tcp} 5
libp2p_identify_listen_addresses_total{/ip4/udp} 1

We'd have something like

libp2p_identify_listen_addresses /ip4/tcp=5 /ip4/udp=1

? If so I'd have to look into how something like that would be done (glancing at prometheus-client does not make it obvious). Or do you mean just have one metric to count both identify listen addresses and also swarm connections incoming summed together?

It is actually quite easy. Take a look at the example at the root of the prometheus-client docs: https://docs.rs/prometheus-client/0.18.0/prometheus_client/

If you define a custom Labels type, the Encode trait will render them as key=value, e.g. in our case address_stack=/ip4/tcp/ etc.
If you then ingest these metrics into something like grafana, it will recognise the label key (address_stack) and allows you perform queries on it, visualize it etc. For example, you could see the individual value for each value of address_stack in a graph or group them by particular values.

Under the hood, prometheus will actually create a separate metric per unique label combination. Labels are just a kind of syntax-sugar on top to allow grouping / filter etc. Plus, (I think) you automatically get a _total metric that represents the sum of all of them.

@John-LittleBearLabs
Copy link
Contributor Author

If you define a custom Labels type, the Encode trait will render them as key=value, e.g. in our case address_stack=/ip4/tcp/ etc.

Not 100% sure this is what you meant, but here's where I"m at now (this output is from the metrics example).

$curl -s http://127.0.0.1:36827/metrics | grep -E 'identify_listen_addresses|swarm_connections_incoming'
# HELP libp2p_identify_listen_addresses Number of listen addresses for remote peer per protocol stack.
# TYPE libp2p_identify_listen_addresses counter
libp2p_identify_listen_addresses_total{address_stack="/ip4/tcp"} 5
# HELP libp2p_swarm_connections_incoming Number of incoming connections per address stack.
# TYPE libp2p_swarm_connections_incoming counter
libp2p_swarm_connections_incoming_total{address_stack="/ip4/tcp"} 1
# HELP libp2p_swarm_connections_incoming_error Number of incoming connection errors.
# TYPE libp2p_swarm_connections_incoming_error counter

lbl@chomp:~lbl/code/rust-multiaddr/tests
$curl -s http://127.0.0.1:35883/metrics | grep -E 'identify_listen_addresses|swarm_connections_incoming'
# HELP libp2p_identify_listen_addresses Number of listen addresses for remote peer per protocol stack.
# TYPE libp2p_identify_listen_addresses counter
libp2p_identify_listen_addresses_total{address_stack="/ip4/tcp"} 6
# HELP libp2p_swarm_connections_incoming Number of incoming connections per address stack.
# TYPE libp2p_swarm_connections_incoming counter
# HELP libp2p_swarm_connections_incoming_error Number of incoming connection errors.
# TYPE libp2p_swarm_connections_incoming_error counter

@thomaseizinger
Copy link
Contributor

If you define a custom Labels type, the Encode trait will render them as key=value, e.g. in our case address_stack=/ip4/tcp/ etc.

Not 100% sure this is what you meant, but here's where I"m at now (this output is from the metrics example).

$curl -s http://127.0.0.1:36827/metrics | grep -E 'identify_listen_addresses|swarm_connections_incoming'
# HELP libp2p_identify_listen_addresses Number of listen addresses for remote peer per protocol stack.
# TYPE libp2p_identify_listen_addresses counter
libp2p_identify_listen_addresses_total{address_stack="/ip4/tcp"} 5
# HELP libp2p_swarm_connections_incoming Number of incoming connections per address stack.
# TYPE libp2p_swarm_connections_incoming counter
libp2p_swarm_connections_incoming_total{address_stack="/ip4/tcp"} 1
# HELP libp2p_swarm_connections_incoming_error Number of incoming connection errors.
# TYPE libp2p_swarm_connections_incoming_error counter

lbl@chomp:~lbl/code/rust-multiaddr/tests
$curl -s http://127.0.0.1:35883/metrics | grep -E 'identify_listen_addresses|swarm_connections_incoming'
# HELP libp2p_identify_listen_addresses Number of listen addresses for remote peer per protocol stack.
# TYPE libp2p_identify_listen_addresses counter
libp2p_identify_listen_addresses_total{address_stack="/ip4/tcp"} 6
# HELP libp2p_swarm_connections_incoming Number of incoming connections per address stack.
# TYPE libp2p_swarm_connections_incoming counter
# HELP libp2p_swarm_connections_incoming_error Number of incoming connection errors.
# TYPE libp2p_swarm_connections_incoming_error counter

Yep that looks correct, thanks! 😊

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments but otherwise LGTM.

This is almost ready to merge, thank you! :)

misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
@thomaseizinger thomaseizinger marked this pull request as ready for review October 14, 2022 09:02
@thomaseizinger thomaseizinger changed the title Address stack metric misc/metrics: Track "address stack" of listen and remote addresses Oct 14, 2022
@thomaseizinger thomaseizinger changed the title misc/metrics: Track "address stack" of listen and remote addresses misc/metrics: Track "address stack" of listen addresses and incoming connections Oct 14, 2022
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I'll let @mxinden also have a look as he wrote the original issue!

@thomaseizinger
Copy link
Contributor

Actually, can you add a changelog entry please? :)

mxinden
mxinden previously requested changes Nov 2, 2022
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already proven useful, see #2289 (comment).

Couple of comments. How about expanding this beyond the two metrics you currently improve?

misc/metrics/examples/metrics/main.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
pub struct Label {
address_stack: String,
}
impl Label {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not implement From for Label?

impl From<&Multiaddr> for Label {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we had this at some point and I recommended to move away from it. Unless we are actually using From as an abstraction somewhere, using a generic function has downsides in terms of clarity. It also encourages the use of .into at callsites which makes it hard to see, what this is actually being converted into.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that sounds familiar. I probably resolved & hid the comment when I committed the recommended change, but it's probably still above somewhere.

Personally I think I prefer the From approach in this case - since the struct has such an incredibly narrow usage I don't think there's too much confusion about what into is doing. But of course I'll go with whatever the consensus is.

Copy link
Contributor

@thomaseizinger thomaseizinger Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From would be nice if we could change the signature of get_or_create to accept an impl Into<S> so you don't have to call .into.

Unfortunately we can't do that without requiring ownership on _every_callsite which would require allocations even though we only need ownership inside the get_or_create function for the first call. Meaning a lot of these allocations would be wasted.

I'd be in favor of not obfuscating what we are converting here:

  1. Using .into makes it hard to find all usages of Labels. Currently, I have to delete the From impl and see where the compiler fails.
  2. I find this:
                    self.listen_addresses
                        .get_or_create(&protocol_stack::Labels::new(&listen_addr))
                        .inc();

much clearer to read than this:

                    self.listen_addresses
                        .get_or_create(&listen_addr.into())
                        .inc();

The former I can glance over whereas with the latter, my brain stops and is like: "Hang on, into() what?". And I have to recall the function signature of get_or_create, check the type of listen_addr and search for From implementations to understand what is happening here.

  1. If we ever need more data for the labels, From would need to be using a tuple at which point we need a constructor anyway.
  2. Out of principle, I think we should not be depending on abstractions (From), if we don't use them. Traits are only useful in generic code, so unless we can make a function that uses T: Into where we can pass a &Multiaddr, I think this doesn't qualify.

misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/swarm.rs Show resolved Hide resolved
Comment on lines 16 to 22
//TODO: remove this trait and tag() and replace with calls to the upstream method
// once that lands : https://github.com/multiformats/rust-multiaddr/pull/60
// In the meantime there is no _ case in the match so one can easily detect mismatch in supported
// protocols when dependency version changes.
trait MultiaddrExt {
fn protocol_stack(&self) -> String;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to cut a new release of multiaddr. Sorry for the delay.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets merge it! Thanks for sticking to this and sorry for the back and forth in the review!

I think clippy still wants to have a word with you, otherwise, I am happy to merge this :)

misc/metrics/CHANGELOG.md Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
misc/metrics/src/protocol_stack.rs Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

@mxinden Do you want to wait for multiformats/rust-multiaddr#62 and upgrade in this PR right away?

@mxinden
Copy link
Member

mxinden commented Nov 8, 2022

@mxinden Do you want to wait for multiformats/rust-multiaddr#62 and upgrade in this PR right away?

Yes. I don't think we are in a rush here and thus it's fine to wait.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-ups here @John-LittleBearLabs.

Once multiformats/rust-multiaddr#62 is merged, and released and used in this pull request, this is good to go from my end.

@mergify
Copy link
Contributor

mergify bot commented Nov 13, 2022

This pull request has merge conflicts. Could you please resolve them @John-LittleBearLabs? 🙏

@thomaseizinger thomaseizinger changed the title misc/metrics: Track "address stack" of listen addresses and incoming connections misc/metrics: Add protocols label to address-specific metrics Nov 15, 2022
The labels are specific to the metric they are used it whereas
the `protocol_stack` module is more generic and shouldn't have
a dependency on prometheus.
@thomaseizinger
Copy link
Contributor

I adjusted the PR title and description for a better commit message, plus did some minor tidy up in the code. Based on this comment, I am going ahead in merging this!

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Good to go from my side.

@mergify mergify bot merged commit 69efe63 into libp2p:master Nov 15, 2022
@mxinden
Copy link
Member

mxinden commented Nov 16, 2022

Merge branch 'master' into addrstak_metric

@thomaseizinger in which cases do we manually merge master and in which cases do we depend on mergify to merge master?

@mxinden
Copy link
Member

mxinden commented Nov 18, 2022

For those interested, this is now deployed on kademlia-exporter.max-inden.de and the graphs are updated accordingly.

https://kademlia-exporter.max-inden.de/d/Pfr0Fj6Mk/rust-libp2p?orgId=1&from=now-2d&to=now&var-data_source=Prometheus&var-instance=kademlia-exporter-ipfs:8080

@thomaseizinger
Copy link
Contributor

Merge branch 'master' into addrstak_metric

@thomaseizinger in which cases do we manually merge master and in which cases do we depend on mergify to merge master?

I am not sure I remember why I merged master in but mergify is only going to update the branch when the PR is already queued for merging.

It will only queue it once all checks are passing. One of those is the send-it label. All others are inferred from the branch protection settings. If you have a flaky CI check, mergify won't trigger another merge so maybe that is what happened here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

misc/metrics/: Track address stack
3 participants