Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cockpit: Implement PCP metrics channel in the Python bridge #20049

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

jelly
Copy link
Member

@jelly jelly commented Feb 16, 2024

Current ToDo's

  • test failure on Arch exception
  • TF failures, create a TEST VM without python3-pcp
  • Aks PCP if they wanna upload a wheel.
  • tox
  • investigate coverage once more
  • Debian-stable/testing lack python3-pcp when running test/image-prepare -q debian-testing it fails on pytest.
  • Land this after tomorrow's 325 release

Follow ups, I'd suggest tackling:

@jelly jelly added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Feb 16, 2024
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
@jelly jelly force-pushed the cockpit-pcp-python branch 2 times, most recently from 2821ebd to 920768d Compare February 21, 2024 11:53
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
@jelly
Copy link
Member Author

jelly commented Jun 12, 2024

Tox is meh :( There is no pcp wheel so need to build it and the PyPi version is outdated 5.0 😞

src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
@jelly jelly force-pushed the cockpit-pcp-python branch 3 times, most recently from 2f49acb to e303bd2 Compare July 5, 2024 14:51
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
test/pytest/test_pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
test/pytest/test_pcp.py Fixed Show fixed Hide fixed
test/pytest/test_pcp.py Fixed Show fixed Hide fixed
@jelly jelly force-pushed the cockpit-pcp-python branch 2 times, most recently from 877b872 to 34af402 Compare July 23, 2024 18:09
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
src/cockpit/channels/pcp.py Fixed Show fixed Hide fixed
test/pytest/test_pcp.py Fixed Show fixed Hide fixed
test/pytest/test_pcp.py Fixed Show fixed Hide fixed
@jelly
Copy link
Member Author

jelly commented Sep 18, 2024

Looking again at the fetchGroup API, this looks useful but it is so freaking broken, for some reason it returns no instances for the first metric.. (why!)

#!/usr/bin/python

import cpmapi as c_api
from pcp import pmapi

from test.pytest.test_pcp import instances_archive

# archive_path = "/tmp/pytest-of-jelle/pytest-current/disk-archives0"
archive_path = "/var/log/pcp/pmlogger/t14s/20240723.index"
# archive_path = "/home/jelle/projects/cockpit/cockpit-pcp-python/testEnableNoData/archive3/20240719.09.40.index"
pmfg = pmapi.fetchgroup(c_api.PM_CONTEXT_ARCHIVE, archive_path)
context = pmfg.get_context()

# To work out:
# - omit instances
# - scale

kernel_all_load = pmfg.extend_indom("kernel.all.load", c_api.PM_TYPE_FLOAT)
mem_physmem = pmfg.extend_item("mem.physmem")
mem_avail = pmfg.extend_indom("mem.util.available", c_api.PM_TYPE_FLOAT)
swap_pagesout = pmfg.extend_indom("swap.pagesout", c_api.PM_TYPE_FLOAT)
network_interface_total_bytes = pmfg.extend_indom("network.interface.total.bytes")
t = pmfg.extend_timestamp()

for _ in range(5):
    pmfg.fetch()
    print("time: %s" % t())
    for icode, iname, value in kernel_all_load():
        print('kernel.all.load %s %s=%s' % (icode, iname, value()))
    print("mem.physmem: %s" % mem_physmem())
    print(mem_avail())
    for icode, iname, value in mem_avail():
        if iname is not None:
            print('mem.util.available %s %s=%s' % (icode, iname, value()))
    print("total.bytes %s" % network_interface_total_bytes())
    for icode, iname, value in network_interface_total_bytes():
        if iname is not None:
            print('network.interface.total.bytes %s %s=%s' % (icode, iname, value()))

Maybe something for a follow up, it seems nice

[jelle@t14s][~/projects/cockpit/cockpit-pcp-python]%python3 fetchgroup-pcp.py
time: 2024-07-23 16:05:32.873173
mem.physmem: 32620024
[]
total.bytes []
time: 2024-07-23 16:05:32.894293
kernel.all.load 1 1 minute=1.9800000190734863
kernel.all.load 5 5 minute=1.4700000286102295
kernel.all.load 15 15 minute=1.309999942779541
mem.physmem: 32620024
[(4294967295, None, <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02c00>)]
total.bytes [(0, 'lo', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02ca0>), (1, 'enp0s31f6', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02e80>), (2, 'wlp0s20f3', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02f20>), (3, 'enp0s20f0u2u6', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f02fc0>), (14, 'virbr0', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f03060>), (19, 'virbr1', <function fetchgroup.fetchgroup_indom.__call__.<locals>.<lambda>.<locals>.<lambda> at 0x715ea1f03100>)]
Traceback (most recent call last):
  File "/home/jelle/projects/cockpit/cockpit-pcp-python/fetchgroup-pcp.py", line 38, in <module>
    print('network.interface.total.bytes %s %s=%s' % (icode, iname, value()))
                                                                    ^^^^^^^
  File "/usr/lib/python3.12/site-packages/pcp/pmapi.py", line 3102, in <lambda>
    (lambda i: (lambda: decode_one(self, i)))(i)))
                        ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pcp/pmapi.py", line 3096, in decode_one
    raise pmErr(self.stss[i])
pcp.pmapi.pmErr: PM_ERR_VALUE Missing metric value(s)

@jelly jelly marked this pull request as ready for review September 18, 2024 19:27
@jelly
Copy link
Member Author

jelly commented Sep 18, 2024

@martinpitt @mvollmer an initial review would be appreciated, I am not super fond of the spaghetti soup, combining direct and archived metrics together kinda makes it messy

This is also the reason we can't easily combine the internalmetrics and the PCP metrics, the archive metrics are too different. Plus we have no Sampler() class

Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jelly , great work! I have a lot of little things, but honestly I'm not too worried about the code structure -- there is a lot of bit shifting, data type conversion, and plumbing, and I trust that all of this is necessary. But the overall structure is relatively easy to follow.

I'm really looking forward to landing this!

pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
tools/debian/rules Outdated Show resolved Hide resolved
test/pytest/test_pcp.py Outdated Show resolved Hide resolved
test/pytest/test_pcp.py Show resolved Hide resolved
src/cockpit/channels/pcp.py Outdated Show resolved Hide resolved
src/cockpit/channels/pcp.py Outdated Show resolved Hide resolved
src/cockpit/channels/pcp.py Show resolved Hide resolved
src/cockpit/channels/pcp.py Show resolved Hide resolved
src/cockpit/channels/pcp.py Outdated Show resolved Hide resolved
@martinpitt martinpitt changed the title Cockpit pcp python cockpit: Implement PCP metrics channel in the Python bridge Sep 19, 2024
@jelly jelly force-pushed the cockpit-pcp-python branch 7 times, most recently from fdf20b7 to 8d26edc Compare September 23, 2024 12:26
@jelly
Copy link
Member Author

jelly commented Sep 23, 2024

This seems to go green! @martinpitt I re-added the manifest as the JS code uses this to detect if PCP is available and if it's missing all PCP tests simply fail.

A follow up PR will take care of that.

@martinpitt
Copy link
Member

@jelly ack wrt. putting back the manifest. I think we agreed to let this land after tomorrow's release, then we have two weeks to do the follow-up cleanups.

Can you please take a look at the remaining open review comments such as this one? They are collapsed by default unfortunately (GitHub could really be more clever with resolved comments..)

@martinpitt
Copy link
Member

@jelly FTR, the remaining threads are mostly questions, or little cleanups. They are ok for follow-ups.

@jelly
Copy link
Member Author

jelly commented Sep 24, 2024

@jelly FTR, the remaining threads are mostly questions, or little cleanups. They are ok for follow-ups.

I hopefully handled them all and updated the PR, it wasn't too much work.

martinpitt
martinpitt previously approved these changes Sep 24, 2024
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! There was one outstanding issue about commenting the "unsupported metric" (string) case, but not important. It's here for archaeology. 😁

I retried some tests which looked like flakes. Thanks!

(Let's please land this after tomorrow's release)

@martinpitt martinpitt added the blocked Don't land until something else happens first (see task list) label Sep 24, 2024
The Python bridge still used the separate `cockpit-pcp` bridge for
metrics gathering. For us to remove the full C bridge implementation the
separate PCP bridge also has to be rewritten.

This rewrite is a more or less Python copy of the C implementation using
the Python PCP module. Even though the Python PCP module offers a
"higher level" fetchGroup API but preliminary testing has found this has
some issues with changing multi instance values and instances can not be
omitted with a fetch group (but this can be done in our own code)

Co-Authored-By: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Co-Authored-By: Tomas Matus <tomatus777@gmail.com>
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@jelly
Copy link
Member Author

jelly commented Sep 24, 2024

Thanks! There was one outstanding issue about commenting the "unsupported metric" (string) case, but not important. It's here for archaeology. 😁

Added, let's not add new technical debt :)

I retried some tests which looked like flakes. Thanks!

(Let's please land this after tomorrow's release)

\o/

@martinpitt martinpitt removed the blocked Don't land until something else happens first (see task list) label Sep 25, 2024
@martinpitt martinpitt merged commit e9e1fcf into cockpit-project:main Sep 25, 2024
85 checks passed
@jelly jelly deleted the cockpit-pcp-python branch September 30, 2024 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants