
Properly understanding CPU utilization metrics #787

Closed
higels opened this issue May 20, 2021 · 14 comments


higels commented May 20, 2021

We noticed recently that on a reasonably loaded system (AMD 7702P, 128 hyperthreads, 3.35 GHz at low load, coming down to 2.6 GHz as things heat up), there was a large discrepancy in overall and per-hyperthread utilization when comparing taskmgr vs. our windows_exporter metrics + the stuff we pull from WMI independently.

Specifically, taskmgr reported that the average utilization of the system was between 65 and 70%, with many individual cores completely utilized while their hyperthread siblings sat between 10 and 30%.

Our data from windows_exporter showed that 1 - avg(rate(windows_cpu_time_total{mode='idle'}[2m])) was between 50 and 55%. Our data from the WMI gauge \\Processor(_Total)\\% Processor Time agreed with windows_exporter.

The data for individual hyperthreads showed two bands, one between 70 and 85% utilization and the other between 20 and 50%.

We updated our independently gathered metrics to use \\Processor Information(_Total)\\% Processor Utility and this seemed to line up with what taskmgr was providing for overall utilization. % Processor Time is seemingly very old and doesn’t handle variance in CPU frequency.

If we trust taskmgr, with windows_exporter / perflib, we have a 10-15% difference in overall system utilization at higher load as our CPU clocks down, and the per core utilization is being misallocated across each pair of hyperthreads.

This brings me to my questions:

  1. Has anyone else observed this? It seems to happen on all of the various CPU types we have.
  2. Even though it’s a gauge, would a PR which exposed ProcessorUtilityRate be accepted, since it seems to offer us something that IdleTimeSeconds does not?
  3. What does processor_performance actually represent? I had hoped that it would help me scale idle time to show some kind of “performance lost to frequency or IPC decrease”, but all I can say for sure is that its rate is close to 0 when a system is idle, and close to 80M $somethings per second on a very busy hyperthread. The maximum seems to be around this number regardless of base CPU frequency.
@carlpett
Collaborator

There have been a number of reports of the data not lining up, but no one has done such a thorough investigation. Thanks for that!
So, first off, on Win2008R2+, we actually use \\Processor Information(*)\* counters, rather than \\Processor(*), but I recall the overall data being similar.
Interestingly, it looks like we added ProcessorUtilityRate, so we extract the data but don't expose an actual metric:

ProcessorUtilityRate float64 `perflib:"% Processor Utility"`

It'd be interesting to see how that data looks, it might be a counter internally (it often is), so if you have time to have a look at that it'd be great!

Re processor_performance, I tried deciphering it a few years ago, but didn't really figure it out, sadly :(


higels commented May 21, 2021

I think we might have answered part of the processor_performance question....

If you do something like this:

windows_cpu_core_frequency_mhz{} 
* (rate(windows_cpu_processor_performance{}[2m]))
/ ON (core) (1 - sum by (core) (rate(windows_cpu_cstate_seconds_total{}[2m])))
/ (scalar(count(windows_cpu_core_frequency_mhz{}) / 2))

On an AMD system, you will get a nice graph showing the effective P-state frequency of each hyperthread; if you're familiar with turbostat on Linux, it looks a lot like the Bzy_MHz column. If you remove the cstate_seconds bit, you'll get the equivalent of the Avg_MHz. This lines up pretty well with what I got from AMD's profiling tools.

On Intel systems, you need to replace that 2 denominator with a 0.5 for it to make sense. Without knowing where the metric comes from, it's hard to speculate as to why this is the case. I'm still pretty sure that Windows doesn't provide any known interface into the APERF / MPERF MSRs, but I have no idea where else it could come from.

About the idle metrics...

I've found that:

avg (sum by (core) (rate(windows_cpu_cstate_seconds_total{}[2m]))) (total time spent in c-states)

and

avg(rate(windows_cpu_time_total{mode='idle'}[2m]))

differ by 5-15% depending on load.

I would trust the former to be more reliable; it is also closer to the % Processor Utility number.

So, our graph of CPU utilisation is made up of the following (BYO labels & $__interval):

  • Kernel time that isn't DPC / softirq:

avg (rate(windows_cpu_time_total{mode='privileged'}[2m]) - ON (core) rate(windows_cpu_time_total{mode='dpc'}[2m]))

  • DPC / Interrupt / User:

avg by (mode) (rate(windows_cpu_time_total{mode!~'(idle|privileged)'}[2m]))

  • Idle:

avg (sum by (core) (rate(windows_cpu_cstate_seconds_total{}[2m])))

If you use the "old" Idle metric, it very neatly adds up to 100%, but I have little faith in it.

I'll investigate the utility metric next week if time permits.

Thanks to @tycho for his help in getting it this far.


higels commented Aug 12, 2022

Hi @carlpett,

Sorry for the huge delay in following up on this - it's been a busy year.

I revisited this issue recently because some BIOS tuning failed to apply on a subset of servers, and it would have been really useful to have an alert to flag that a server wasn't going into boost properly.

Here's what I've found:

As it stands, the windows_cpu_processor_performance metric is effectively meaningless. Here's some PowerShell to demonstrate why:

Get-Counter -Counter "\Processor Information(0,1)\% Processor Performance"

Timestamp                  CounterSamples
---------                  --------------
8/12/2022 11:06:13 AM      \\blahblah\processor information(0,1)\% processor performance :
                           205.295719844358

which makes sense since it's a 2.0GHz CPU which boosts to about 4.3GHz.

When we dig into the counter a bit more like this:

(Get-Counter -Counter "\Processor Information(0,1)\% Processor Performance").CounterSamples | select *

Path             : \\blabblah\processor information(0,1)\% processor performance
InstanceName     : 0,1
CookedValue      : 214.033829499323
RawValue         : 44662434713
SecondValue      : 218918308
MultipleCount    : 1
CounterType      : AverageCount64
Timestamp        : 8/12/2022 11:20:32 AM
Timestamp100NSec : 133047768329218584
Status           : 0
DefaultScale     : 0
TimeBase         : 10000000

The RawValue is what the exporter metric uses, but the AverageCount64 type is described in https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.performancecountertype?view=dotnet-plat-ext-6.0 as being:

An average counter that shows how many items are processed, on average, during an operation. Counters of this type display a ratio of the items processed to the number of operations completed. The ratio is calculated by comparing the number of items processed during the last interval to the number of operations completed during the last interval. Counters of this type include PhysicalDisk\ Avg. Disk Bytes/Transfer.

If you take RawValue / SecondValue, it (roughly...) returns the CookedValue, which would be very useful to have.

I dug around perflib_exporter/perflib/perflib.go, and it looks like any of the AverageCount types have 8 unused bytes after them that seem to line up with the SecondValue field, which is expected to be a UInt64.

@@ -450,6 +449,15 @@ func convertCounterValue(counterDef *perfCounterDefinition, buffer []byte, value
                value = int64(bo.Uint32(buffer[valueOffset:(valueOffset + 4)]))
        }

+       switch counterDef.CounterType {
+       case 1073874176:
+               secondsOffset := valueOffset + int64(counterDef.CounterSize)
+               seconds := int64(bo.Uint64(buffer[secondsOffset:(secondsOffset + 8)]))
+               if seconds != 0 {
+                       value = value / seconds
+               }
+       }
+

And we get something that is good enough, but still not exactly the same as the cooked value.

windows_cpu_processor_performance{core="0,0"} 201
windows_cpu_processor_performance{core="0,1"} 197
windows_cpu_processor_performance{core="0,10"} 196
windows_cpu_processor_performance{core="0,11"} 193
windows_cpu_processor_performance{core="0,12"} 195
windows_cpu_processor_performance{core="0,13"} 192
windows_cpu_processor_performance{core="0,14"} 193
windows_cpu_processor_performance{core="0,15"} 192
windows_cpu_processor_performance{core="0,2"} 201
windows_cpu_processor_performance{core="0,3"} 195
windows_cpu_processor_performance{core="0,4"} 193
windows_cpu_processor_performance{core="0,5"} 196
windows_cpu_processor_performance{core="0,6"} 196
windows_cpu_processor_performance{core="0,7"} 196
windows_cpu_processor_performance{core="0,8"} 195
windows_cpu_processor_performance{core="0,9"} 196

I'm not familiar enough with the perflib code (and my golang has atrophied a bit) to know if my approach is safe; perhaps @leoluk has some input on this.

This would also enable the addition of an accurate processor utility gauge.


higels commented Aug 16, 2022

Of course, I started testing this on some actually loaded production systems and got complete nonsense back. I investigated some more, and I'm pretty sure that RawValue is an approximation of the APERF MSR and SecondValue is the MPERF MSR.

We know that real_freq = tsc_freq * (aperf_t1 - aperf_t0) / (mperf_t1 - mperf_t0). To test, in perflib.go, for any of the AverageCounter64 metrics, I created a second fake PerfCounter with "_SecondValue" appended to it and plumbed that through to a new metric windows_cpu_processor_mperf, and it works really well.

I'll let it bake for a few more days, but if this is useful for people other than me, it'll require changes in both this project and perflib-exporter.

@breed808
Contributor

Thanks @higels, I appreciate the time you've spent looking into this one 👍

If you need any assistance making changes just let me know.


higels commented Aug 31, 2022

Hi @breed808 - I've made a rough version of the proposed changes here:

master...higels:windows_exporter:add_mperf_metric

and to perflib_exporter here:

leoluk/perflib_exporter@master...higels:perflib_exporter:add_secondvalue_plumbing

Basic summary is that we add a SecondValue member to perflib.PerfCounter, populate it where appropriate and then allow a secondvalue flag (or whatever it's called) to signal to the unmarshaller that we should use that instead of the RawValue. I think this facilitates fairly convenient reuse down the line. A colleague recommended I just follow the pattern that json uses with omitempty.

I haven't tested this much yet, but just wanted to make sure I was on the right track.

I still need to improve the metric descriptions, but that's the easy part.

I have a few more questions:

  • I assume we/you would need to update the project to use a later release of perflib_exporter, if / when my changes there were accepted - is that likely to happen?
  • I'm not really sure what to do with scaling of ProcessorMPerf - I don't like that it and ProcessorPerformance are tightly related metrics, but the former needs to be scaled up by 1e4 for it to be usable.
  • % Processor Utility is also a really interesting metric and gives an accurate CPU idle % that lines up with taskmgr - would you prefer that in a later merge request, or ok to add now?


higels commented Sep 2, 2022

I've been running an exporter based on my changes above for a day now on quite a large number of systems, and I have CPU metrics that very accurately match what taskmgr shows. I do a bit of creative PromQL to break out Privileged Utility Time into system, dpc and interrupt, and everything adds up to 100%. It's a little concerning how different the results are.

windows_cpu_time_total based metrics on the left, my newer metrics on the right:

[screenshot: stacked per-mode CPU utilization graphs comparing the two metric sets]

(busiest core and busiest core dpc are overlaid lines and not part of the stack)


leoluk commented Sep 3, 2022

perflib_exporter changes LGTM - just make a PR and I'll merge them and make a release.


breed808 commented Sep 4, 2022

Great work! Once the perflib_exporter change is merged, raise a PR for the windows_exporter changes and we can get them merged.

@andonovski

Any news on this? Is this merged, or is there a PR? On version 0.20.0 we currently see the same issue with measuring CPU usage.


leoluk commented Oct 10, 2022

Here's a new perflib_exporter release: https://github.com/leoluk/perflib_exporter/releases/tag/v0.2.0

@breed808
Contributor

Apologies for the delay, I must have missed the notification for this. Dependency has been updated in #1084


higels commented Oct 31, 2022

I will hopefully have time to get my changes rebased and submitted this week. Thanks all for your work on this!

higels added a commit to higels/windows_exporter that referenced this issue Nov 2, 2022
This change adds 4 new CPU related metrics:

 * process_mperf_total
 * processor_rtc_total
 * processor_utility_total
 * processor_privileged_utility_total

and renames the existing process_performance to
processor_performance_total, since it was previously misunderstood and
was unlikely to have been useful without the above new metrics.

The data sources for these are not particularly well understood, and the
examples show that in some cases arbitrary scaling factors are required
to make them useful, but my testing on hundreds of systems with a broad
range of CPUs and operating systems from 2012r2 through to 2019 has
proved out that we can use them to accurately display actual CPU
frequencies and CPU utilisation as represented in taskmgr.

Things I don't particularly like and would like input on:

 * I would have preferred to do the scaling of processor_mperf_total in
the code, but there isn't an elegant way of doing this right now.
 * Maybe processor_mperf_total should be called
processor_mperformance_total.

See prometheus-community#787 for discussion.

Signed-off-by: Steffen Higel <higels@valvesoftware.com>
mansikulkarni96 pushed a commit to mansikulkarni96/prometheus-community-windows_exporter that referenced this issue Mar 9, 2023

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Nov 25, 2023
@github-actions github-actions bot closed this as not planned (stale) Dec 26, 2023
anubhavg-icpl pushed a commit to anubhavg-icpl/windows_exporter that referenced this issue Sep 22, 2024