[BUG] SystemInfo doesn't report correct cache sizes and cpu frequency on M1 Macs #1310

Closed
Civil opened this issue Dec 26, 2021 · 8 comments · Fixed by #1414

@Civil

Civil commented Dec 26, 2021

Describe the bug
On ARM-based Macs, the old sysctls no longer report correct values (or no longer exist). For example, on an M1 Max Mac you'll get the following output:

Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
2021-12-26T12:15:27+01:00
Running ./build/build/bin/ram_speed
Run on (10 X 24.1211 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x10)
  L1 Instruction 128 KiB (x10)
  L2 Unified 4096 KiB (x5)
Load Average: 2.19, 1.73, 2.04

System
Which OS, compiler, and compiler version are you using:

  • OS: macos 11+
  • Compiler and version: any

To reproduce
Steps to reproduce the behavior:

  1. sync to commit ...
  2. cmake/bazel...
  3. make ...
  4. See error

Expected behavior
So the problems here are:

  1. cpufrequency is no longer exported via sysctl. It can be obtained from the powermetrics CLI, but that requires sudo to run.
  2. Cache sizes also differ between Efficiency and Performance cores, and SystemInfo currently reports the data for the efficiency cores only. The per-cluster values are now exported via the hw.perflevel[0-9]+ sysctls. A correct representation should look like:
Run on (2 X 2064 MHz, 8 X 3228 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x2), 128 KiB (x8)
  L1 Instruction 128 KiB (x2), 192 KiB (x8)
  L2 Unified 4096 KiB (x1), 12 MiB (x2)
Load Average: 2.19, 1.73, 2.04

There is also an L3 cache, but its size doesn't seem to be exported via sysctls (the M1 Max should have 48 MB of L3).
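To make the proposed console format concrete, here is a minimal Python sketch that renders per-cluster values in the style shown above. The function names and the tuple layout are hypothetical, chosen only for illustration; this is not how the library's reporter is written.

```python
from typing import List, Tuple


def format_run_on(clusters: List[Tuple[int, int]]) -> str:
    """Render the 'Run on' line from (core_count, max_mhz) pairs, one per cluster."""
    parts = [f"{cores} X {mhz} MHz" for cores, mhz in clusters]
    return f"Run on ({', '.join(parts)} CPU s)"


def format_cache_line(label: str, caches: List[Tuple[int, int]]) -> str:
    """Render one cache line from (size_kib, count) pairs, one per cluster."""
    parts = [f"{size_kib} KiB (x{count})" for size_kib, count in caches]
    return f"  {label} {', '.join(parts)}"
```

For example, `format_run_on([(2, 2064), (8, 3228)])` produces the first line of the proposed output, and `format_cache_line("L1 Data", [(64, 2), (128, 8)])` produces the L1 Data line.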

Additional context
Relevant sysctls:

hw.perflevel1.cpusperl2: 2
hw.perflevel1.l1dcachesize: 65536
hw.perflevel1.l1icachesize: 131072
hw.perflevel1.l2cachesize: 4194304
hw.perflevel1.logicalcpu: 2
hw.perflevel1.logicalcpu_max: 2
hw.perflevel1.physicalcpu: 2
hw.perflevel1.physicalcpu_max: 2
hw.perflevel0.cpusperl2: 4
hw.perflevel0.l1dcachesize: 131072
hw.perflevel0.l1icachesize: 196608
hw.perflevel0.l2cachesize: 12582912
hw.perflevel0.logicalcpu: 8
hw.perflevel0.logicalcpu_max: 8
hw.perflevel0.physicalcpu: 8
hw.perflevel0.physicalcpu_max: 8

For the L2 cache, the formula is {hw.perflevel1.l2cachesize} X {hw.perflevel1.physicalcpu/hw.perflevel1.cpusperl2}, i.e. the size of one L2 cache and the number of such caches in the cluster. It can be repeated for each perf level (perflevel0 currently corresponds to the performance cores).
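The L2 formula can be sketched in Python as follows. This is only an illustration: it parses text in the `key: value` form printed by `sysctl` (the sample values are copied from the listing above), and the helper names `parse_perflevels` and `l2_summary` are made up for this example. Sizes are printed in KiB throughout, so 12 MiB appears as 12288 KiB.

```python
from typing import Dict

# Subset of the sysctl values quoted in this issue (M1 Max).
SAMPLE = """\
hw.perflevel1.cpusperl2: 2
hw.perflevel1.l2cachesize: 4194304
hw.perflevel1.physicalcpu: 2
hw.perflevel0.cpusperl2: 4
hw.perflevel0.l2cachesize: 12582912
hw.perflevel0.physicalcpu: 8
"""


def parse_perflevels(sysctl_text: str) -> Dict[int, Dict[str, int]]:
    """Collect integer hw.perflevelN.* values, grouped by perf level N."""
    levels: Dict[int, Dict[str, int]] = {}
    for line in sysctl_text.splitlines():
        key, sep, value = line.partition(":")
        parts = key.strip().split(".")
        value = value.strip()
        if not sep or len(parts) != 3 or parts[0] != "hw":
            continue
        prefix, field = parts[1], parts[2]
        index = prefix[len("perflevel"):]
        # Skip unrelated keys and non-numeric values.
        if not prefix.startswith("perflevel") or not index.isdigit() or not value.isdigit():
            continue
        levels.setdefault(int(index), {})[field] = int(value)
    return levels


def l2_summary(levels: Dict[int, Dict[str, int]]) -> str:
    """Apply {l2cachesize} X {physicalcpu / cpusperl2} to every perf level."""
    parts = []
    # Highest level first so the efficiency cluster leads, as in the example output.
    for level in sorted(levels, reverse=True):
        info = levels[level]
        count = info["physicalcpu"] // info["cpusperl2"]
        parts.append(f"{info['l2cachesize'] // 1024} KiB (x{count})")
    return ", ".join(parts)
```

With the sample above, `l2_summary(parse_perflevels(SAMPLE))` gives one 4096 KiB cache for the efficiency cluster and two 12288 KiB caches for the performance cluster.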

As for CPU frequency, here's example output from powermetrics (irrelevant information about power consumption was redacted):

% sudo /usr/bin/powermetrics -s cpu_power -n 1
Password:
Machine model: MacBookPro18,2
OS version: 21C52
Boot arguments:
Boot time: Tue Dec 21 21:53:27 2021



* Sampled system activity (Sun Dec 26 12:25:57 2021 +0100) (5004.79ms elapsed) *


** Processor usage **

E-Cluster Power: 39 mW
E-Cluster HW active frequency: 1098 MHz
E-Cluster HW active residency:  41.55% (600 MHz:   0% 972 MHz:  83% 1332 MHz: 5.6% 1704 MHz: 4.7% 2064 MHz: 6.5%)
E-Cluster idle residency:  58.45%
E-Cluster instructions retired: 2.72095e+09
E-Cluster instructions per clock: 0.851289
CPU 0 frequency: 1138 MHz
CPU 0 idle residency:  69.54%
CPU 0 active residency:  30.46% (600 MHz:   0% 972 MHz:  23% 1332 MHz: 2.5% 1704 MHz: 2.5% 2064 MHz: 2.1%)
CPU 1 frequency: 1132 MHz
CPU 1 idle residency:  73.38%
CPU 1 active residency:  26.62% (600 MHz:   0% 972 MHz:  21% 1332 MHz: 1.8% 1704 MHz: 2.1% 2064 MHz: 1.9%)

P0-Cluster Power: 82 mW
P0-Cluster HW active frequency: 1398 MHz
P0-Cluster HW active residency:   6.32% (600 MHz:  59% 828 MHz: .55% 1056 MHz: 5.7% 1296 MHz: 3.2% 1524 MHz: .51% 1752 MHz: 2.1% 1980 MHz: .01% 2208 MHz: .16% 2448 MHz: 1.8% 2676 MHz: 1.9% 2904 MHz: .26% 3036 MHz: .70% 3132 MHz: .38% 3168 MHz: .29% 3228 MHz:  23%)
P0-Cluster idle residency:  93.68%
P0-Cluster instructions retired: 1.32581e+09
P0-Cluster instructions per clock: 1.74094
CPU 2 frequency: 2157 MHz
<redacted>

Here the max available frequency can be read from the E-Cluster HW active residency and P0-Cluster HW active residency lines; the same information is also provided for the P1 cluster.
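As a rough sketch of that idea, the following Python snippet extracts the highest frequency step listed on each "HW active residency" line. It assumes the line layout shown in the output above; the powermetrics format is not a stable interface, so the regular expressions and the function name are illustrative only.

```python
import re
from typing import Dict

# The two cluster residency lines from the powermetrics output quoted in this issue.
SAMPLE = (
    "E-Cluster HW active residency:  41.55% (600 MHz:   0% 972 MHz:  83% "
    "1332 MHz: 5.6% 1704 MHz: 4.7% 2064 MHz: 6.5%)\n"
    "P0-Cluster HW active residency:   6.32% (600 MHz:  59% 828 MHz: .55% "
    "1056 MHz: 5.7% 1296 MHz: 3.2% 1524 MHz: .51% 1752 MHz: 2.1% 1980 MHz: .01% "
    "2208 MHz: .16% 2448 MHz: 1.8% 2676 MHz: 1.9% 2904 MHz: .26% 3036 MHz: .70% "
    "3132 MHz: .38% 3168 MHz: .29% 3228 MHz:  23%)\n"
)

_RESIDENCY = re.compile(r"^(\S+)-Cluster HW active residency:.*\((.*)\)\s*$")
_FREQ_STEP = re.compile(r"(\d+) MHz:")


def max_cluster_frequencies(powermetrics_output: str) -> Dict[str, int]:
    """Map cluster name (e.g. 'E', 'P0') to the highest listed frequency in MHz."""
    result: Dict[str, int] = {}
    for line in powermetrics_output.splitlines():
        match = _RESIDENCY.match(line.strip())
        if not match:
            continue
        steps = [int(mhz) for mhz in _FREQ_STEP.findall(match.group(2))]
        if steps:
            result[match.group(1)] = max(steps)
    return result
```

Per-CPU lines such as "CPU 0 active residency" are ignored on purpose, since only the cluster lines are needed for the maximum.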

Actually, similar logic should be implemented on Linux, as it is possible to run benchmarks on Linux on ARM where the CPU cores are not all equal. But I think that is worth opening another bug for.

I can try to implement a fix for macOS, but I think we first need to agree on how to obtain the data. For example, it might be too harsh to ask people to run benchmarks via sudo just to get the CPU frequency, so it might be worth printing a meaningful error message in that case. It might also be worth splitting the logic for ARM- and x86-based Macs, using the presence of hw.cpufrequency to determine what we are running on.

The output format is also important, as I don't want to break things for people, but I still think it's worth reporting both CPU clusters correctly. My example output above is my vision of how to represent that information.

@dmah42
Member

dmah42 commented Dec 28, 2021

i agree it's too much to ask the benchmarks to be run with sudo for this.

it's worth thinking about how it would be output in the JSON formatter as this is the most flexible. i don't think anyone is parsing the console output (obligatory https://hyrumslaw.com).

@Civil
Author

Civil commented Dec 28, 2021

Thanks. I'll think about how it would look in JSON and adjust the text output based on that. I don't want to promise anything, but I'll try to come up with a PoC PR in the upcoming weeks.

As for the frequency, I'm thinking about actually trying to execute powermetrics and, if that fails, just using some default value.
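A minimal Python sketch of that "try it, otherwise fall back" approach could look like this. The command line is the one used earlier in this issue; the function name and the timeout value are assumptions made for the example, and the caller is expected to substitute its own default when `None` comes back.

```python
import subprocess
from typing import Optional, Sequence

# Same invocation as in the issue, but without sudo, so it is expected to fail
# for unprivileged users.
POWERMETRICS_CMD = ("/usr/bin/powermetrics", "-s", "cpu_power", "-n", "1")


def try_powermetrics(cmd: Sequence[str] = POWERMETRICS_CMD,
                     timeout_s: float = 15.0) -> Optional[str]:
    """Return the command's stdout, or None if it is missing, fails, or times out."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True,
                              timeout=timeout_s, check=True)
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a non-zero exit status and a timeout.
        return None
    return proc.stdout
```

Returning `None` instead of raising keeps the decision about what to report (a default value, or a message that the frequency is unavailable) with the caller.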

As for JSON, for compatibility reasons I would probably keep the current representation but add an extra field, "details" or something like that, that would have a proper structure.
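One possible shape for such a field, sketched in Python purely for discussion: the top-level keys stand in for the existing flat representation, and everything under "details" (including every key name inside it) is hypothetical. The numbers are the M1 Max values quoted in this issue.

```python
import json
from typing import Any, Dict


def build_context() -> Dict[str, Any]:
    """Build an example context: flat legacy-style fields plus a structured 'details'."""
    return {
        # Legacy-style fields, left as they are so existing consumers keep working.
        "num_cpus": 10,
        "mhz_per_cpu": 24,
        # Hypothetical structured addition, one entry per CPU cluster.
        "details": {
            "clusters": [
                {"perflevel": 1, "num_cpus": 2, "max_mhz": 2064,
                 "l1d_bytes": 65536, "l1i_bytes": 131072,
                 "l2_bytes": 4194304, "cpus_per_l2": 2},
                {"perflevel": 0, "num_cpus": 8, "max_mhz": 3228,
                 "l1d_bytes": 131072, "l1i_bytes": 196608,
                 "l2_bytes": 12582912, "cpus_per_l2": 4},
            ]
        },
    }


def to_json(context: Dict[str, Any]) -> str:
    """Serialize the context the way a JSON reporter might embed it."""
    return json.dumps(context, indent=2)
```

Consumers that only read the flat fields would be unaffected, while newer tooling could look for "details" and use the per-cluster data when it is present.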

@franziskuskiefer

Are there any updates on this?
I also wonder how accurate the measurements on ARM-based Macs are right now.

@Civil
Author

Civil commented Jun 14, 2022

I intended to work on this, as I had some ideas about what could be done here, but I haven't done anything yet.

On ARM-based Macs the current measures of cache and frequency are NOT accurate; however, that matters only in some cases. It seems that OSX, for compatibility reasons(?), reports a static value that has nothing to do with the real frequency. The same goes for the caches: the legacy sysctls (common to x86- and ARM-based Macs) report static values on ARM that have nothing to do with the real cache sizes.

@franziskuskiefer

On ARM-based MACs current measures are NOT accurate

Yeah that's what I thought. That's too bad.
With all the ARM-based Macs out there this makes these benchmarks way less useful unfortunately.

I don't have a good idea for how to solve this ☹️
There's some related discussion here giampaolo/psutil#1892 and someone apparently came up with something here https://github.com/BitesPotatoBacks/SFMRM

@dmah42
Member

dmah42 commented Jun 15, 2022

that's not true.

the benchmark measurements don't rely on cycles per second, and they merely report it in the metadata for informational purposes (comparison across machines, for example).

the benchmarks themselves measure time based on hardware clocks or software clocks. in the case of OSX (arm or not) we use pthread's timing info for thread time and getrusage for process time.
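To illustrate why the reported frequency does not enter the timings, here is a small Python sketch that measures CPU time using operating-system clocks only. It is an analogy, not the library's code: it uses `resource.getrusage` for process time and Python's `time.thread_time` as a stand-in for the per-thread timing mentioned above, and the helper names are invented for this example.

```python
import resource
import time
from typing import Callable, Tuple


def process_cpu_seconds() -> float:
    """User plus system CPU time consumed by this process, via getrusage."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime


def cpu_time_of(fn: Callable[[], object]) -> Tuple[float, float]:
    """Return (process_cpu_seconds, thread_cpu_seconds) spent while running fn."""
    start_process = process_cpu_seconds()
    start_thread = time.thread_time()
    fn()
    return (process_cpu_seconds() - start_process,
            time.thread_time() - start_thread)
```

Neither value is derived from a cycles-per-second figure, so a wrong frequency in the metadata leaves results obtained this way untouched.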

@Civil
Author

Civil commented Jun 15, 2022

@dominichamon I might have expressed my thoughts badly: by "measures" I meant whatever goes into the report as cache size, CPU frequency, etc. Sorry for the confusion.

As far as I understand, that would make it less useful if you later want to get an idea of, for example, performance per MHz, but that's another story.

@franziskuskiefer

👍🏻 makes sense. Thanks for clarifying.

Maybe one could remove the warning and unreliable information until this issue is fixed and just report that these numbers can't be retrieved right now?

dmah42 pushed a commit that referenced this issue Jun 20, 2022

* Clarify that the cpu frequency is not used for benchmark timings.

Fixes #1310

* fix format (clang-format missed this...)

* oops