
Add openmetrics exemplar support #320

Closed
wants to merge 1 commit

Conversation

@fredr (Contributor) commented Aug 2, 2022

This is a work in progress implementation for #175

Opening this as a draft to have a discussion to see if this is a valid approach to go forward with, or if there is a different path that is cleaner/better.

I've tried to keep all changes within the Prometheus exporter, since this is a Prometheus-specific feature, but I had to add downcasting to the recorder so that a handle to the specific recorder can be fetched. Maybe there is a more generic API that could be used instead. I'm not very familiar with other metrics systems, but perhaps it is common elsewhere to attach additional data to observations?

There is an example of how to increment a counter and record a sample in a histogram with exemplars. This is extra problematic for histograms, as we need to know which bucket to assign the exemplar to.
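That bucket-assignment step can be sketched in isolation: an observation (and therefore its exemplar) belongs to the first bucket whose upper bound is at least the observed value. A minimal sketch, with hypothetical names:

```rust
/// Hypothetical sketch: find the index of the histogram bucket an
/// observation (and therefore its exemplar) belongs to. `bounds` are
/// the sorted upper bounds (the `le` values); values beyond the last
/// bound fall into the implicit `+Inf` bucket.
fn bucket_index(bounds: &[f64], value: f64) -> usize {
    bounds
        .iter()
        .position(|&le| value <= le)
        .unwrap_or(bounds.len()) // +Inf bucket
}

fn main() {
    let bounds = [0.01, 0.1, 1.0, 10.0];
    assert_eq!(bucket_index(&bounds, 0.67), 2); // lands in le="1"
    assert_eq!(bucket_index(&bounds, 9.8), 3);  // lands in le="10"
    assert_eq!(bucket_index(&bounds, 50.0), 4); // implicit +Inf
}
```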

I'm also thinking about what a good macro API for this would be. Since an exemplar is just another set of labels, we can't simply add it as another parameter.

e.g. with the following, it's hard to know which labels should be metric labels and which should be exemplar labels:

counter_with_exemplar!("my_counter", "some_label" => "value", "trace_id" => "123");

We could probably say that exemplar labels are always expressions, so something like the following would probably be possible (I haven't written any macros yet, but I'm guessing this is possible):

counter_with_exemplar!("my_counter", vec![Label::from_parts("trace_id", "123")], "some_label" => "value");
counter_with_exemplar!("my_counter_without_labels", vec![Label::from_parts("trace_id", "123")]);

It might still be confusing, as it is easy to mix the different label sets up.
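To make the shape concrete, here is a hypothetical sketch of such a macro that only parses and collects its inputs (no recorder involved): the exemplar label set is a leading expression, while metric labels keep the existing `"key" => "value"` syntax, so the two cannot be confused syntactically.

```rust
/// Hypothetical macro shape: exemplar labels come first as a single
/// expression, metric labels follow as `"key" => "value"` pairs. This
/// sketch just returns the collected pieces instead of recording.
macro_rules! counter_with_exemplar {
    ($name:expr, $exemplar:expr $(, $k:expr => $v:expr)* $(,)?) => {{
        let metric_labels: Vec<(&str, &str)> = vec![$(($k, $v)),*];
        let exemplar_labels: Vec<(&str, &str)> = $exemplar;
        ($name, metric_labels, exemplar_labels)
    }};
}

fn main() {
    let (name, labels, exemplar) = counter_with_exemplar!(
        "my_counter",
        vec![("trace_id", "123")],
        "some_label" => "value"
    );
    assert_eq!(name, "my_counter");
    assert_eq!(labels, vec![("some_label", "value")]);
    assert_eq!(exemplar, vec![("trace_id", "123")]);
}
```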

Also, where would these macros live? I don't think we can add them to metrics-macros (behind a feature), since that would create a cyclic dependency if it needed to depend on the PrometheusRecorder.

@tobz (Member) commented Aug 2, 2022

Is there any way you can give me a high-level overview of exemplars and how a typical application, with one of the official Prometheus client SDKs, uses them?

I'm finding it hard to conceptualize, and based on the changes so far, I think we need to step back and talk about the feature first before trying to design the interface to it.

@fredr (Contributor, Author) commented Aug 2, 2022

Is there any way you can give me a high-level overview of exemplars and how a typical application, with one of the official Prometheus client SDKs, uses them?

Absolutely, I'll write down what I know about it here; let me know if there are any other details you would like to know.

I think we need to step back and talk about the feature first before trying to design the interface to it.

Yes, I find it easier to reason about something after writing some code, so I started with that, not intending this to be the solution, but just something to have a discussion around.

Use case

Exemplars are a Prometheus invention that can be used to attach additional data to some samples of a metric. An exemplar consists of the recorded value, a timestamp of when it happened, and additional labels; the timestamp is optional (I haven't tried what happens if it is not set, but I'm guessing the rendered position then reflects the collection time rather than the exact observation time). Exemplars are then used in e.g. Grafana to show little dots on graphs built over these metrics. Hovering over these dots shows the additional labels. The most common use is to add a trace ID to the exemplar labels, so that Grafana can add a link into the tracing system.

This is one of the main trace-discoverability features when using Tempo as a tracing backend.

An example screenshot of how it is displayed in Grafana: (image not included)

Exporting

For Prometheus to be able to ingest exemplars, they need to be exported with the metric. Each label set of a metric can expose one exemplar, so in Prometheus there will be at most one exemplar per metric + label set per collection. If multiple exemplars are written to the same metric + label set within the same collection, the later one simply overwrites the earlier one. (I would say that exemplars make the most sense on Prometheus histograms, where there can be one exemplar per bucket, since the bucket is also a label.)

The format it is exposed in is defined in OpenMetrics.

Other implementations

I've mostly looked at how this is implemented in the Prometheus Go client, e.g. how it is used on histograms. It is backed by an atomic value store that just gets overwritten. The value that is stored is found here.
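A minimal Rust analogue of that overwrite-on-write slot might look like this; a `Mutex` stands in for the lock-free `atomic.Value` the Go client uses, and the `Exemplar` type just mirrors the value/timestamp/labels triple described above:

```rust
use std::sync::Mutex;

/// Sketch of the Go client's exemplar slot in Rust: the last written
/// exemplar simply replaces the previous one. A Mutex stands in for
/// the lock-free atomic.Value used by the Go implementation.
#[derive(Clone, Debug, PartialEq)]
struct Exemplar {
    value: f64,
    timestamp_secs: Option<f64>, // optional per the OpenMetrics format
    labels: Vec<(String, String)>,
}

struct ExemplarSlot(Mutex<Option<Exemplar>>);

impl ExemplarSlot {
    fn new() -> Self {
        ExemplarSlot(Mutex::new(None))
    }
    fn record(&self, e: Exemplar) {
        *self.0.lock().unwrap() = Some(e); // overwrite, keep only the latest
    }
    fn load(&self) -> Option<Exemplar> {
        self.0.lock().unwrap().clone()
    }
}

fn main() {
    let slot = ExemplarSlot::new();
    slot.record(Exemplar {
        value: 0.67,
        timestamp_secs: None,
        labels: vec![("trace_id".into(), "xyz123".into())],
    });
    slot.record(Exemplar {
        value: 9.8,
        timestamp_secs: None,
        labels: vec![("trace_id".into(), "abc123".into())],
    });
    // Two writes within one collection: only the later one survives.
    assert_eq!(slot.load().unwrap().value, 9.8);
}
```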

Other thoughts

I noticed now that exemplars should not be implemented for summaries and gauges, only for histograms and counters, and for counters the OpenMetrics documentation is a bit vague. IMO it would be enough to implement this only for histograms, but since that is still the hardest one, we might as well implement it for both.

I'm not sure what happens if a metric doesn't get its exemplar updated and gets collected several times. I think Prometheus handles that, but I need to do some more research.

@tobz (Member) commented Aug 11, 2022

Just a note: this is still on my backlog to review, things have just been a bit hectic for me lately. 😅

@fredr (Contributor, Author) commented Aug 11, 2022

No worries and no stress, but thanks for letting me know!

I'm also on Discord most of the time if you want to discuss anything while you look at this.

@tobz (Member) commented Aug 17, 2022

So, my main question after reading your explanation (thank you for that!): do the exemplars have to be logically related?

Which is to say... if I have two concurrent tasks/threads/whatever emitting metrics, and both of them hit the random number generator lottery of "you should track these metrics as exemplars", do all the metrics they touch need to have the same exemplar labels (trace ID, etc.), or could some of the metrics have trace ID 1 and some have trace ID 2, etc.?

Like, in terms of what the exemplar values are when the scrape endpoint is observed after both of those tasks/threads/whatever have finished and emitted all their metrics.

@fredr (Contributor, Author) commented Aug 22, 2022

They can be different; there is no connection between exemplars across different metrics.

@tobz (Member) commented Aug 22, 2022

They can be different; there is no connection between exemplars across different metrics.

Alright, that's good news. 👍🏻

Depending on the behavior necessary, it seems like it could be possible to get away with sampling a value at the point of actually rendering the metrics. That is, every time the metrics are rendered -- which is just when we get a scrape request, or our interval to hit the push gateway ticks -- we collect the outstanding histogram samples and pick one of them to be our new exemplar.

Avoiding new exporter-specific methods seems like the highest priority item in my mind. We should ideally be able to just collect exemplars with people using the metrics macros and the Prometheus exporter the same way they always have.

@fredr (Contributor, Author) commented Sep 1, 2022

Sorry for being so slow to reply.

Yes, getting the exemplar on render would definitely be enough, and it sounds like you are on to something smart, but I don't understand exactly how 😅. How would one add an exemplar if there were no exporter-specific method for it?

@tobz (Member) commented Sep 1, 2022

Right, so my thought is that the render logic would essentially be responsible for figuring out if it was time to sample a new exemplar for each unique histogram.

So we'd have the exemplar value itself, and probably a timestamp for "when was this exemplar observed?". In the render logic, where it checks to see if it needs to consume any more raw samples from the underlying histogram storage, we'd see how long ago we last captured an exemplar for the given histogram. If we've exceeded our timeout, then we take one of the samples we just consumed and make it our exemplar, and update our timestamp.

Explained more contextually:

We start with an exporter that has a default initial state: no metrics observed yet, etc. We'll refer to the time that these actions/operations occur with the t=.. notation, denoting the time in seconds.

  1. At t=0, render is called, and we consume all histograms from the registry, and discover a new one, histogram_a.
  2. histogram_a has 10 samples (10 is just a random number, doesn't matter if it's 1 or 10000) and since we have not seen this histogram before, our "it's time to capture an exemplar" logic kicks in, and we randomly select one of those 10 samples as the exemplar value.
  3. At t=2, render is called again, and we get 5 more samples for histogram_a when we consume it from the registry. For the sake of explanation, let's pretend our exemplar selection logic doesn't select a new exemplar unless it's been over 5 seconds since the current one was selected. Since only two seconds have elapsed, we don't choose a new exemplar.
  4. At t=6, render is called again, and we get 7 more samples for histogram_a when we consume it from the registry. Since we last selected an exemplar for this histogram at t=0, our exemplar selection logic now kicks in, and we randomly select one of the 7 samples we just consumed and use that as the new exemplar for the histogram.

So, overall, I'm forcing some design decisions here for the sake of explaining my idea:

  • exemplars are randomly selected from the samples we consume for a histogram
  • exemplars are only refreshed if enough time has elapsed
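Those two decisions can be sketched as follows; the names and the "pick the first new sample" stand-in for random selection are hypothetical:

```rust
use std::time::{Duration, Instant};

/// Sketch of render-time exemplar selection: keep the current exemplar
/// plus the instant it was chosen, and only replace it from the newly
/// consumed samples once a cooldown has elapsed. Taking the first new
/// sample stands in for the random selection described above.
struct ExemplarState {
    current: Option<f64>,
    chosen_at: Option<Instant>,
    cooldown: Duration,
}

impl ExemplarState {
    fn new(cooldown: Duration) -> Self {
        ExemplarState { current: None, chosen_at: None, cooldown }
    }

    /// Called from render, once per histogram, with the freshly
    /// consumed raw samples.
    fn maybe_refresh(&mut self, new_samples: &[f64], now: Instant) {
        let due = match self.chosen_at {
            None => true, // never selected: the histogram is new
            Some(t) => now.duration_since(t) > self.cooldown,
        };
        if due {
            if let Some(&s) = new_samples.first() {
                self.current = Some(s);
                self.chosen_at = Some(now);
            }
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut st = ExemplarState::new(Duration::from_secs(5));
    st.maybe_refresh(&[0.3, 0.7], start); // t=0: new histogram, select
    assert_eq!(st.current, Some(0.3));
    st.maybe_refresh(&[1.1], start + Duration::from_secs(2)); // t=2: keep
    assert_eq!(st.current, Some(0.3));
    st.maybe_refresh(&[9.8], start + Duration::from_secs(6)); // t=6: refresh
    assert_eq!(st.current, Some(9.8));
}
```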

@fredr (Contributor, Author) commented Sep 1, 2022

But how do we register the exemplar labels with the value? They need to be added when recording the value, as they contain data that is not known by the metric itself, most commonly a trace ID that refers to some external tracing system.

That trace ID will be different for every value recorded (as long as there aren't multiple metrics recorded during the same request, which is what I thought your initial question was about).

@tobz (Member) commented Sep 1, 2022

Are the labels for exemplars only meant for exemplars, period? Like, would you not typically include, say, a trace ID label unless you wanted it to be an exemplar?

@fredr (Contributor, Author) commented Sep 2, 2022

Yes, exactly. It's a separate label set from the regular label set of the metric itself. The common use case is to attach a trace ID to a specific observation, and by trace ID I mean a distributed trace ID that comes from outside, like from Jaeger, Tempo, Zipkin, etc.

Usually when you build some kind of web server with incoming HTTP requests, there will be some ingress gateway that initiates a trace, creates a trace ID, and attaches it to the HTTP request as headers. Your web server reads these, attaches a bunch of "spans" to the trace, and uploads them to whatever tracing backend you have. A trace ID can span multiple web servers if they make requests to each other within the same initial request.

The exemplar is then used to find a trace in the tracing backend from a graph built on metrics, usually latency histograms: when you look at your latency graph and want to figure out why some requests take a long time, you click on the little exemplar dot representing a specific observed request that took a long time to process.

Using the Go client, the definition would look something like this, specifying the metric's labels and buckets:

var (
	histogram = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "foo_latency",
			Buckets: []float64{0.01, 0.1, 1, 10},
		},
		[]string{"path"},
	)
)

Then, when observing a value, it would be something like this (very simplified): get the trace ID from the request and add it to the exemplar label set.

histogram.With(prometheus.Labels{"path": "/foo"}).(prometheus.ExemplarObserver).ObserveWithExemplar(
	latency, prometheus.Labels{"traceID": req.Header.Get("x-trace-id")},
)

This is what the prometheus exporter will render, just to clarify what it looks like:

# TYPE foo_latency histogram
foo_latency_bucket{path="/bar", le="0.01"} 0
foo_latency_bucket{path="/bar", le="0.1"} 8
foo_latency_bucket{path="/bar", le="1"} 11 # {trace_id="xyz123"} 0.67 1520879912.512
foo_latency_bucket{path="/bar", le="10"} 17 # {trace_id="abc123"} 9.8 1520879607.789
foo_latency_bucket{path="/bar", le="+Inf"} 17

In this example, two different requests have been recorded with an exemplar, one in bucket le="1" and one in bucket le="10", with their exact values, when they were observed, and the extra set of labels that includes their trace ID. (There can be at most one observed exemplar per metric + metric label set + bucket; if two are observed for the same one, the later simply replaces the first.)
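Rendering one such bucket line can be sketched as a small formatting helper (hypothetical names, label escaping omitted):

```rust
/// Sketch: format one histogram bucket line in the OpenMetrics text
/// format, appending the `# {labels} value timestamp` exemplar suffix
/// when an exemplar is present. Label escaping is omitted.
fn bucket_line(
    name: &str,
    labels: &str,
    le: &str,
    count: u64,
    exemplar: Option<(&str, f64, f64)>, // (exemplar labels, value, timestamp)
) -> String {
    match exemplar {
        Some((ex_labels, value, ts)) => format!(
            "{name}_bucket{{{labels}, le=\"{le}\"}} {count} # {{{ex_labels}}} {value} {ts}"
        ),
        None => format!("{name}_bucket{{{labels}, le=\"{le}\"}} {count}"),
    }
}

fn main() {
    let line = bucket_line(
        "foo_latency",
        "path=\"/bar\"",
        "1",
        11,
        Some(("trace_id=\"xyz123\"", 0.67, 1520879912.512)),
    );
    println!("{line}");
}
```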

Sorry if I'm poorly explaining things you already know; I'm just trying to lay out the flow of how these things are used in the kind of area I work in.

@fredr (Contributor, Author) commented Nov 16, 2022

Hey @tobz, I lost the momentum a bit here, but would appreciate some feedback. I'm happy to continue on this if I know which way to go.

@tobz (Member) commented Dec 8, 2022

Hey @fredr! I'm no stranger to losing momentum. :)

I've started getting more serious about planning out the remaining work to bring metrics to a 1.0 release, and the general theme of that work is simplifying the public API surface as much as possible: avoiding niche methods/functions and allowing for more flexible inputs, to let metrics adapt in the future.

With that said, my biggest concern has been there sort of right from the beginning: it doesn't feel right for there to be macros/methods/etc. that are specific to Prometheus. I'm certainly not against third-party exporters having their own specific macros or something of the sort... but it's not the design pattern I want to promote in metrics or any of the official metrics ecosystem crates.

What I would want to see is a way to make the determination of when exemplars should be tracked function more like a scoped behavior, i.e. a function that takes a closure and changes a thread-local to influence the behavior of any code running inside the closure. Alternatively (and maybe I'm still not entirely understanding how exemplars are typically tracked/triggered), some sort of deterministic approach: for any metric that has a specific label key, configured as part of the exporter itself, sample updates to metrics with that label at a configurable rate to derive exemplars.
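The scoped-behavior idea could be sketched roughly like this; `with_exemplar` and `current_exemplar` are hypothetical names, and a real version would want a drop guard so the thread-local is restored even on panic:

```rust
use std::cell::RefCell;

// Sketch of the scoped-behavior idea: a thread-local holds the exemplar
// labels for the duration of a closure, so any recording code that runs
// inside it (including code in dependencies) can pick them up without a
// Prometheus-specific macro.
thread_local! {
    static EXEMPLAR: RefCell<Option<Vec<(String, String)>>> = RefCell::new(None);
}

/// Run `f` with the given exemplar labels installed in the thread-local.
/// Not panic-safe as written; a guard type would restore on drop.
fn with_exemplar<R>(labels: Vec<(String, String)>, f: impl FnOnce() -> R) -> R {
    EXEMPLAR.with(|e| *e.borrow_mut() = Some(labels));
    let result = f();
    EXEMPLAR.with(|e| *e.borrow_mut() = None);
    result
}

/// What a recorder could call when a value is observed.
fn current_exemplar() -> Option<Vec<(String, String)>> {
    EXEMPLAR.with(|e| e.borrow().clone())
}

fn main() {
    assert!(current_exemplar().is_none());
    let seen = with_exemplar(
        vec![("trace_id".into(), "abc123".into())],
        || current_exemplar(), // stand-in for recording a metric
    );
    assert_eq!(seen.unwrap()[0].1, "abc123");
    assert!(current_exemplar().is_none()); // cleared after the scope
}
```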

An approach like the one described above maintains one of the original design goals of metrics: if you're using a library, in your application, that is instrumented with metrics, you get that metric data "for free". It could be useful to include the metrics emitted by dependencies as part of an exemplar, but you can't do that if they need to use a specific macro.

I'm happy to continue providing feedback on the design as long as it's along the lines described above, because that represents both the most ergonomic path and the easiest path to maintain, in my eyes. If executing on what I've posted above feels like it would consume too much time, I totally understand.

@fredr (Contributor, Author) commented Dec 20, 2022

I've started getting more serious about planning out the remaining work to bring metrics to a 1.0 release, and the general theme of that work is simplifying the public API surface as much as possible: avoiding niche methods/functions and allowing for more flexible inputs, to let metrics adapt in the future.

That sounds great. Do you already know if that will have any implications for how buckets are registered for Prometheus, or any other upcoming Prometheus-related changes?

With an approach as described above, it maintains one of the original design goals of metrics: if you're using a library, in your application, that is instrumented with metrics, you get that metric data "for free". It could be useful to include the metrics emitted by dependencies as part of an exemplar, but you can't do that if they need to use a specific macro.

Alright. My thinking behind the current PoC implementation and suggestions was to keep everything Prometheus-specific in the prometheus-exporter, but you have a good point that you then wouldn't get those specific features out of the box when using libraries that expose metrics via this lib.

What I would want to see is a way to make the determination of when exemplars should be tracked function more like a scoped behavior, i.e. a function that takes a closure and changes a thread-local to influence the behavior of any code running inside the closure

I'm not sure how to implement that, so I'll have to do some digging. If there are any such implementations in this or other crates that you know about, I would appreciate any pointers.

I'm happy to continue providing feedback on the design as long as it's along the lines described above, because that represents both the most ergonomic path and the easiest path to maintain, in my eyes. If executing on what I've posted above feels like it would consume too much time, I totally understand.

Thanks, I'll keep at it whenever I have time to spare; this would be a really useful feature for trace discoverability.

@fredr (Contributor, Author) commented Jul 18, 2023

I will close this PR as I currently struggle to find time to put into it; maybe someone else wants to pick it up.

@fredr fredr closed this Jul 18, 2023