Pressure metric collection fails on systems that do not expose a full CPU stat #3051

Closed
pahaeanx opened this issue Jun 14, 2024 · 1 comment · Fixed by #3054
Labels
bug platform/Linux Linux specific issue

Comments

@pahaeanx

Looks to me like #3016 unfortunately broke pressure stats collection on systems that do not expose a full stat for CPU.

In my case this happens on Debian 11. There, /proc/pressure/cpu and /sys/fs/cgroup/cpu.pressure contain no full line, only some, and the collector aborts after failing to collect the pressure stats for CPU full (see log output further down).

# cat /sys/fs/cgroup/cpu.pressure /proc/pressure/cpu
some avg10=2.65 avg60=2.78 avg300=2.92 total=111749368752
some avg10=2.65 avg60=2.78 avg300=2.92 total=111749368752

Host operating system: output of uname -a

Debian 11

Linux <snip> 5.10.0-30-amd64 #1 SMP Debian 5.10.218-1 (2024-06-01) x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.1 (branch: HEAD, revision: 400c3979931613db930ea035f39ce7b377cdbb5b)
  build user:       root@7afbff271a3f
  build date:       20240521-18:36:22
  go version:       go1.22.3
  platform:         linux/amd64
  tags:             unknown

node_exporter command line flags

Pressure collector enabled.

node_exporter log output

Jun 14 06:51:23 <snip> node_exporter[2135275]: ts=2024-06-14T06:51:23.605Z caller=pressure_linux.go:92 level=debug collector=pressure msg="collecting statistics for resource" resource=cpu
Jun 14 06:51:23 <snip> node_exporter[2135275]: ts=2024-06-14T06:51:23.605Z caller=pressure_linux.go:110 level=debug collector=pressure msg="pressure information returned no 'full' data"
Jun 14 06:51:23 <snip> node_exporter[2135275]: ts=2024-06-14T06:51:23.605Z caller=collector.go:167 level=debug msg="collector returned no data" name=pressure duration_seconds=0.0001385 err="collector returned no data"

Are you running node_exporter in Docker?

No

What did you expect to see?

Same behavior we saw with node-exporter version <1.8.1 -- we still collected the rest of the pressure metrics.

Pressure stats collection should continue and simply skip the node_pressure_cpu_stalled_seconds_total metric (I assume that's what is output in case of full CPU stall)

What did you see instead?

No pressure metrics at all. The collector fails with the above error message.

node_scrape_collector_success{collector="pressure"} 0
@SuperQ
Member

SuperQ commented Jun 14, 2024

Ahh, you're right. From the documentation:

CPU full is undefined at the system level, but has been reported since 5.13, so it is set to zero for backward compatibility.

We should allow only some for CPU.

@SuperQ SuperQ added bug platform/Linux Linux specific issue labels Jun 14, 2024
chengjoey added a commit to chengjoey/node_exporter that referenced this issue Jun 19, 2024
…full CPU stat prometheus#3051

Signed-off-by: joey <zchengjoey@gmail.com>
SuperQ pushed a commit that referenced this issue Jun 19, 2024
…full CPU stat #3051 (#3054)

Signed-off-by: joey <zchengjoey@gmail.com>
SuperQ pushed a commit that referenced this issue Jun 19, 2024
…full CPU stat #3051 (#3054)

Signed-off-by: joey <zchengjoey@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
chengjoey added a commit to chengjoey/node_exporter that referenced this issue Jun 27, 2024
…unt can be deleted after it is successfully executed in goroutine. prometheus#3051

Signed-off-by: joey <zchengjoey@gmail.com>
SuperQ added a commit that referenced this issue Jul 14, 2024
* fix pressure metric collection fails on systems that do not expose a full CPU stat #3051 (#3054)

Signed-off-by: joey <zchengjoey@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>

* Release v1.8.2

* [BUGFIX] Fix CPU pressure metric collection #3054

Signed-off-by: Ben Kochie <superq@gmail.com>

---------

Signed-off-by: joey <zchengjoey@gmail.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
Co-authored-by: chengjoey <30427474+chengjoey@users.noreply.github.com>