
Tracking peak/max value #358

Open
kornelski opened this issue Oct 27, 2020 · 6 comments
Comments

kornelski (Contributor) commented Oct 27, 2020

I'd like to have a gauge that precisely tracks the peak value of another gauge (I have a gauge that goes up and down, and its momentary peak value is more interesting than its current value at any given time). I'm not sure what's the best way to implement it.

Currently gauges have only inc/get/set, so I have something like this:

val.inc();
if val.get() > peak.get() {
    peak.set(val.get());
}
// later
val.dec();

but the get+set isn't elegant, and more importantly it isn't atomic. Maybe you could expose fetch_max?

If inc() returned current value I could even do:

peak.max(val.inc());
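The fetch_max idea can be sketched with std atomics. This is a hypothetical `PeakGauge` type, not the prometheus crate's API; a real implementation would live behind the crate's gauge interface:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical peak-tracking pair: `current` mirrors the gauge,
// `peak` records the maximum value `current` has ever reached.
struct PeakGauge {
    current: AtomicU64,
    peak: AtomicU64,
}

impl PeakGauge {
    fn new() -> Self {
        PeakGauge { current: AtomicU64::new(0), peak: AtomicU64::new(0) }
    }

    // Increment and atomically fold the new value into the peak
    // with fetch_max -- no get+set race window.
    fn inc(&self) -> u64 {
        let new = self.current.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(new, Ordering::SeqCst);
        new
    }

    fn dec(&self) {
        self.current.fetch_sub(1, Ordering::SeqCst);
    }

    fn peak(&self) -> u64 {
        self.peak.load(Ordering::SeqCst)
    }
}
```

With this shape, `inc()` returning the new value makes the one-liner above possible: the peak update happens inside `inc()` itself.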
mxinden (Contributor) commented Oct 27, 2020

I'd like to have a gauge that precisely tracks a peak value of another gauge (I have a gauge that goes up and down, and its temporary peak value is more interesting than current value at any time).

@kornelski could you expand on your use-case a bit more? Which Prometheus queries are you planning to run on this gauge?

In case you want to ensure you don't miss spikes between Prometheus scrapes, try modelling your use case with two counters instead of one gauge. E.g. for a queue, instead of one gauge tracking the size of the queue, have two counters: one incremented on enqueue, one incremented on dequeue.
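A minimal sketch of the two-counter idea using std atomics (`QueueCounters` is a hypothetical name; in practice these would be two prometheus counters and the subtraction would happen in a PromQL query):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Two monotonic counters; the queue length at any point is
// enqueued - dequeued, computable server-side or in PromQL as
// the difference of the two counter time series.
struct QueueCounters {
    enqueued: AtomicU64,
    dequeued: AtomicU64,
}

impl QueueCounters {
    fn new() -> Self {
        QueueCounters { enqueued: AtomicU64::new(0), dequeued: AtomicU64::new(0) }
    }

    fn enqueue(&self) {
        self.enqueued.fetch_add(1, Ordering::Relaxed);
    }

    fn dequeue(&self) {
        self.dequeued.fetch_add(1, Ordering::Relaxed);
    }

    fn len(&self) -> u64 {
        self.enqueued.load(Ordering::Relaxed) - self.dequeued.load(Ordering::Relaxed)
    }
}
```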

kornelski (Contributor, Author) commented Oct 27, 2020

In my case it's number of concurrent server requests being processed. I increase a gauge when a request comes in, and decrease when it's done. The problem is that when the gauge is scraped, it's close to 0 most of the time, because requests are processed pretty quickly. But I have some very sudden traffic spikes, and I'd like to know how many requests hit my server exactly at the same time.

The solution with two counters is interesting, but I think they'd also be equal most of the time when they're scraped, so I need to add extra instrumentation that catches momentary peaks between scrapes.

breezewish (Member) commented Oct 27, 2020

How about simply increasing a counter when a request comes in? Then you can gauge the concurrency of the requests by using irate.

kornelski (Contributor, Author) commented Oct 27, 2020

No, that gives the rate at which requests arrive, but it can't show how many of them are being actively processed in parallel.

In terms of queueing theory, I have a steady state where arrivals equal departures. I can easily measure rate of arrivals and departures, but I want to know queue length, and not typical/average/sampled length, but maximum queue length reached.

mxinden (Contributor) commented Nov 3, 2020

Thanks for the details @kornelski.

I am not directly opposed to exposing some of the atomic operations to the user. I would like to suggest another alternative to the two Gauges approach though:

Say GenericGauge::inc would return the previous value. Also assume that you have a Histogram that tracks the queue length on each new arrival. In that case you can do the following on each new item arrival:

queue_length_histogram.observe(num_concurrent_requests.inc() + 1);

Depending on your bucket distribution, you can get a good approximation of the max queue length by subtracting the second-highest cumulative bucket count from the highest.

Compared to the two-gauge approach you (a) don't have a race condition and (b) not only get the maximum queue length between scrapes, but also the approximate queue length distribution across the scrape interval, e.g. via quantiles.

Let me know what you think.

kornelski (Contributor, Author) commented Nov 3, 2020

I don't quite follow why you'd subtract bucket counts. I think the approximate maximum can be found by looking for the highest-valued bucket with a non-zero hit count.

So a Histogram can provide this information, but it has a higher tracking cost (it maintains counts for all buckets), and it only gives a quantized value.
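The highest-non-zero-bucket idea can be sketched over Prometheus-style cumulative bucket counts, where `cumulative[i]` counts all observations ≤ `upper_bounds[i]` (`approx_max` is a hypothetical helper, not crate API):

```rust
// Approximate the maximum observed value from Prometheus-style
// cumulative histogram buckets: the max lies in the highest bucket
// whose per-bucket count (cumulative[i] - cumulative[i-1]) is
// non-zero, so we return that bucket's upper bound.
fn approx_max(upper_bounds: &[f64], cumulative: &[u64]) -> Option<f64> {
    let mut prev = 0u64;
    let mut max_bound = None;
    for (i, &c) in cumulative.iter().enumerate() {
        if c > prev {
            // This bucket received at least one observation directly.
            max_bound = Some(upper_bounds[i]);
        }
        prev = c;
    }
    max_bound
}
```

This makes the quantization visible: the true maximum is only known to be somewhere at or below the returned bucket boundary, which is the precision cost mentioned above.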
