
Properly understanding CPU utilization metrics #787

Closed
higels opened this issue May 20, 2021 · 14 comments


higels commented May 20, 2021

We noticed recently that on a reasonably loaded system (AMD 7702P, 128 hyperthreads, 3.35 GHz at low load, coming down to 2.6 GHz as things heat up), there was a large discrepancy in overall and per-hyperthread utilization when comparing taskmgr vs. our windows_exporter metrics + the stuff we pull from WMI independently.

Specifically, taskmgr reported that the average utilization of the system was between 65 and 70%, with many individual cores completely utilized while their hyperthread siblings sat between 10 and 30%.

Our data from windows_exporter showed that 1 - avg(rate(windows_cpu_time_total{mode='idle'}[2m])) was between 50 and 55%. Our data from the WMI gauge \\Processor(_Total)\\% Processor Time agreed with windows_exporter.

The data for individual hyperthreads showed two bands, one between 70 and 85% utilization and the other between 20 and 50%.

We updated our independently gathered metrics to use \\Processor Information(_Total)\\% Processor Utility and this seemed to line up with what taskmgr was providing for overall utilization. % Processor Time is seemingly very old and doesn’t handle variance in CPU frequency.

If we trust taskmgr, with windows_exporter / perflib, we have a 10-15% difference in overall system utilization at higher load as our CPU clocks down, and the per core utilization is being misallocated across each pair of hyperthreads.

This brings me to my questions:

  1. Has anyone else observed this? It seems to happen on all of the various CPU types we have.
  2. Even though it’s a gauge, would a PR which exposed ProcessorUtilityRate be accepted, since it seems to offer us something that IdleTimeSeconds does not?
  3. What does processor_performance actually represent? I had hoped that it would help me scale idle time to show some kind of “performance lost to frequency or IPC decrease”, but all I can say for sure is that its rate is close to 0 when a system is idle, and close to 80M $somethings per second on a very busy hyperthread. The maximum seems to be around this number regardless of base CPU frequency.
@carlpett
Collaborator

There have been a number of reports of the data not lining up, but no one has done such a thorough investigation. Thanks for that!
So, first off, on Win2008R2+, we actually use \\Processor Information(*)\* counters, rather than \\Processor(*), but I recall the overall data being similar.
Interestingly, it looks like we added ProcessorUtilityRate, so we extract the data but don't expose an actual metric:

ProcessorUtilityRate float64 `perflib:"% Processor Utility"`

It'd be interesting to see how that data looks, it might be a counter internally (it often is), so if you have time to have a look at that it'd be great!

Re processor_performance, I tried deciphering it a few years ago, but didn't really figure it out, sadly :(


higels commented May 21, 2021

I think we might have answered part of the processor_performance question....

If you do something like this:

windows_cpu_core_frequency_mhz{} 
* (rate(windows_cpu_processor_performance{}[2m]))
/ ON (core) (1 - sum by (core) (rate(windows_cpu_cstate_seconds_total{}[2m])))
/ (scalar(count(windows_cpu_core_frequency_mhz{}) / 2))

On an AMD system, you will get a nice graph showing the effective P-state frequency of each hyperthread; if you're familiar with turbostat on Linux, it looks a lot like the Bzy_MHz column. If you remove the cstate_seconds bit, you'll get the equivalent of the Avg_MHz. This lines up pretty well with what I got from AMD's profiling tools.

On Intel systems, you need to replace that 2 denominator with a 0.5 for it to make sense. Without knowing where the metric comes from, it's hard to speculate as to why this is the case. I'm still pretty sure that Windows doesn't provide any known interface into the APERF / MPERF MSRs, but I have no idea where else it could come from.

About the idle metrics...

I've found that:

avg (sum by (core) (rate(windows_cpu_cstate_seconds_total{}[2m]))) (total time spent in c-states)

and

avg(rate(windows_cpu_time_total{mode='idle'}[2m]))

differ by 5-15% depending on load.

I would trust the former to be more reliable; it is also closer to the % Processor Utility number.

So, our graph of CPU utilisation is made up of the following (BYO labels & $__interval):

  • Kernel time that isn't DPC / softirq:

avg (rate(windows_cpu_time_total{mode='privileged'}[2m]) - ON (core) rate(windows_cpu_time_total{mode='dpc'}[2m]))

  • DPC / Interrupt / User:

avg by (mode) (rate(windows_cpu_time_total{mode!~'(idle|privileged)'}[2m]))

  • Idle:

avg (sum by (core) (rate(windows_cpu_cstate_seconds_total{}[2m])))

If you use the "old" Idle metric, it very neatly adds up to 100%, but I have little faith in it.

I'll investigate the utility metric next week if time permits.

Thanks to @tycho for his help in getting it this far.


higels commented Aug 12, 2022

Hi @carlpett,

Sorry for the huge delay in following up on this - it's been a busy year.

I revisited this issue recently because some BIOS tuning failed to apply on a subset of servers, and it would have been really useful to have an alert to flag that a server wasn't going into boost properly.

Here's what I've found:

As it stands, the windows_cpu_processor_performance metric is effectively meaningless. Here's some PowerShell to demonstrate why:

Get-Counter -Counter "\Processor Information(0,1)\% Processor Performance"

Timestamp                  CounterSamples
---------                  --------------
8/12/2022 11:06:13 AM      \\blahblah\processor information(0,1)\% processor performance :
                           205.295719844358

which makes sense since it's a 2.0GHz CPU which boosts to about 4.3GHz.

When we dig into the counter a bit more like this:

(Get-Counter -Counter "\Processor Information(0,1)\% Processor Performance").CounterSamples | select *

Path             : \\blabblah\processor information(0,1)\% processor performance
InstanceName     : 0,1
CookedValue      : 214.033829499323
RawValue         : 44662434713
SecondValue      : 218918308
MultipleCount    : 1
CounterType      : AverageCount64
Timestamp        : 8/12/2022 11:20:32 AM
Timestamp100NSec : 133047768329218584
Status           : 0
DefaultScale     : 0
TimeBase         : 10000000

The RawValue is what the exporter metric uses, but the AverageCount64 type is described in https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.performancecountertype?view=dotnet-plat-ext-6.0 as being:

An average counter that shows how many items are processed, on average, during an operation. Counters of this type display a ratio of the items processed to the number of operations completed. The ratio is calculated by comparing the number of items processed during the last interval to the number of operations completed during the last interval. Counters of this type include PhysicalDisk\ Avg. Disk Bytes/Transfer.

If you take RawValue / SecondValue, it (roughly...) returns the CookedValue, which would be very useful to have.

I dug around perflib_exporter/perflib/perflib.go, and it looks like any of the AverageCount types have 8 unused bytes after them that seem to line up with the SecondValue field, which is expected to be a UInt64.

@@ -450,6 +449,15 @@ func convertCounterValue(counterDef *perfCounterDefinition, buffer []byte, value
                value = int64(bo.Uint32(buffer[valueOffset:(valueOffset + 4)]))
        }

+       switch counterDef.CounterType {
+       case 1073874176:
+               secondsOffset := valueOffset + int64(counterDef.CounterSize)
+               seconds := int64(bo.Uint64(buffer[secondsOffset:(secondsOffset + 8)]))
+               if seconds != 0 {
+                       value = value / seconds
+               }
+       }
+

And we get something that is good enough, but still not exactly the same as the cooked value.

windows_cpu_processor_performance{core="0,0"} 201
windows_cpu_processor_performance{core="0,1"} 197
windows_cpu_processor_performance{core="0,10"} 196
windows_cpu_processor_performance{core="0,11"} 193
windows_cpu_processor_performance{core="0,12"} 195
windows_cpu_processor_performance{core="0,13"} 192
windows_cpu_processor_performance{core="0,14"} 193
windows_cpu_processor_performance{core="0,15"} 192
windows_cpu_processor_performance{core="0,2"} 201
windows_cpu_processor_performance{core="0,3"} 195
windows_cpu_processor_performance{core="0,4"} 193
windows_cpu_processor_performance{core="0,5"} 196
windows_cpu_processor_performance{core="0,6"} 196
windows_cpu_processor_performance{core="0,7"} 196
windows_cpu_processor_performance{core="0,8"} 195
windows_cpu_processor_performance{core="0,9"} 196

I'm not familiar enough with the perflib code (and my golang has atrophied a bit) to know if my approach is safe; perhaps @leoluk has some input on this.

This would also enable the addition of an accurate processor utility gauge.


higels commented Aug 16, 2022

Of course, I started testing this on some actually loaded production systems and got complete nonsense back. I investigated some more, and I'm pretty sure that RawValue is an approximation of the APERF MSR and SecondValue is the MPERF MSR.

We know that real_freq = tsc_freq * (aperf_t1 - aperf_t0) / (mperf_t1 - mperf_t0). To test, in perflib.go, for any of the AverageCounter64 metrics, I created a second fake PerfCounter with "_SecondValue" appended to it and plumbed that through to a new metric windows_cpu_processor_mperf, and it works really well.

I'll let it bake for a few more days, but if this is useful for people other than me, it'll require changes in both this project and perflib-exporter.

@breed808
Contributor

Thanks @higels, I appreciate the time you've spent looking into this one 👍

If you need any assistance making changes just let me know.


higels commented Aug 31, 2022

Hi @breed808 - I've made a rough version of the proposed changes here:

master...higels:windows_exporter:add_mperf_metric

and to perflib_exporter here:

leoluk/perflib_exporter@master...higels:perflib_exporter:add_secondvalue_plumbing

Basic summary is that we add a SecondValue member to perflib.PerfCounter, populate it where appropriate and then allow a secondvalue flag (or whatever it's called) to signal to the unmarshaller that we should use that instead of the RawValue. I think this facilitates fairly convenient reuse down the line. A colleague recommended I just follow the pattern that json uses with omitempty.

I haven't tested this much yet, but just wanted to make sure I was on the right track.

I still need to improve the metric descriptions, but that's the easy part.

I have a few more questions:

  • I assume we/you would need to update the project to use a later release of perflib_exporter, if / when my changes there were accepted - is that likely to happen?
  • I'm not really sure what to do with scaling of ProcessorMPerf - I don't like that it and ProcessorPerformance are tightly related metrics, but the former needs to be scaled up by 1e4 for it to be usable.
  • % Processor Utility is also a really interesting metric and gives an accurate CPU idle % that lines up with taskmgr - would you prefer that in a later merge request, or ok to add now?


higels commented Sep 2, 2022

I've been running an exporter based on my changes above for a day now on quite a large number of systems, and I have CPU metrics that very accurately match what taskmgr shows. I do a bit of creative PromQL to break out Privileged Utility Time into system, dpc and interrupt, and everything adds up to 100%. It's a little concerning how different the results are.

windows_cpu_time_total based metrics on the left, my newer metrics on the right:

[screenshot: stacked per-mode CPU utilization graphs comparing the two metric sets]

(busiest core and busiest core dpc are overlaid lines and not part of the stack)


leoluk commented Sep 3, 2022

perflib_exporter changes LGTM - just make a PR and I'll merge them and make a release.


breed808 commented Sep 4, 2022

Great work! Once the perflib_exporter change is merged, raise a PR for the windows_exporter changes and we can get them merged.

@andonovski

Any news on this? Is this merged, or is there a PR? On version 0.20.0 we currently see the same issue with measuring CPU usage.


leoluk commented Oct 10, 2022

Here's a new perflib_exporter release: https://github.com/leoluk/perflib_exporter/releases/tag/v0.2.0

@breed808
Contributor

Apologies for the delay, I must have missed the notification for this. Dependency has been updated in #1084


higels commented Oct 31, 2022

I will hopefully have time to get my changes rebased and submitted this week. Thanks all for your work on this!

higels added a commit to higels/windows_exporter that referenced this issue Nov 2, 2022
This change adds 4 new CPU related metrics:

 * process_mperf_total
 * processor_rtc_total
 * processor_utility_total
 * processor_privileged_utility_total

and renames the existing process_performance to
processor_performance_total, since it was previously misunderstood and
was unlikely to have been useful without the above new metrics.

The data sources for these are not particularly well understood, and the
examples show that in some cases arbitrary scaling factors are required
to make them useful, but my testing on hundreds of systems with a broad
range of CPUs and operating systems from 2012r2 through to 2019 has
proved out that we can use them to accurately display actual CPU
frequencies and CPU utilisation as represented in taskmgr.

Things I don't particularly like and would like input on:

 * I would have preferred to do the scaling of processor_mperf_total in
the code, but there isn't an elegant way of doing this right now.
 * Maybe processor_mperf_total should be called
processor_mperformance_total.

See prometheus-community#787 for discussion.

Signed-off-by: Steffen Higel <higels@valvesoftware.com>
mansikulkarni96 pushed a commit to mansikulkarni96/prometheus-community-windows_exporter that referenced this issue Mar 9, 2023

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Nov 25, 2023
@github-actions github-actions bot closed this as not planned (stale) Dec 26, 2023
anubhavg-icpl pushed a commit to anubhavg-icpl/windows_exporter that referenced this issue Sep 22, 2024