
Misleading recording rule for cluster_namespace_deployment:container_cpu_usage_seconds_total:sum_rate #9005

Open
marybelvargas opened this issue Aug 14, 2024 · 0 comments

Describe the bug

The current rule does not filter by `container` or `image`. As a result, for anyone who did not drop the pod-level "total" series at scrape time, the result is double the actual CPU usage.
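One way to check whether a cluster is affected is to compare the unfiltered per-pod sum against the same sum restricted to `{image!=""}`. This is a sketch, assuming cAdvisor exposes a pod-level series with an empty `image` label alongside the per-container series. A ratio close to 2 would suggest that the pod-level total is being counted on top of the containers.

```promql
# Diagnostic sketch (assumption: pod-level cAdvisor series carry image="").
# Ratio of unfiltered to filtered CPU usage per pod; values near 2 suggest
# the pod-level "total" series is being added to the per-container series.
  sum by (cluster, namespace, pod) (rate(container_cpu_usage_seconds_total[1m]))
/
  sum by (cluster, namespace, pod) (rate(container_cpu_usage_seconds_total{image!=""}[1m]))
```

Both sides aggregate to the same `cluster`, `namespace`, `pod` label set, so the division matches series one-to-one.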

The memory rule, by contrast, does use the filter `{image!=""}`, which is the more standard approach, so its calculation is more accurate.

Both the CPU and memory metrics here come from the same job (cAdvisor), so the recording rules should be consistent. Either apply the filter in both calculations, or document that users need to handle this at scrape time.

The current definition is:

```promql
sum by (cluster, namespace, deployment) (
  label_replace(
    label_replace(
      sum by (cluster, namespace, pod)(rate(container_cpu_usage_seconds_total[1m])),
      "deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
    ),
    # The question mark in "(.*?)" is used to make it non-greedy, otherwise it
    # always matches everything and the (optional) zone is not removed.
    "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
  )
)
```
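A minimal sketch of the first option follows. It assumes the fix is simply to add the same `{image!=""}` filter that the memory rule uses to the inner selector. Everything else is unchanged from the current definition.

```promql
sum by (cluster, namespace, deployment) (
  label_replace(
    label_replace(
      # Sketch: {image!=""} excludes the pod-level "total" series (assumed to
      # have an empty image label), matching the filter used by the memory rule.
      sum by (cluster, namespace, pod)(rate(container_cpu_usage_seconds_total{image!=""}[1m])),
      "deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
    ),
    # Non-greedy "(.*?)" so the optional zone suffix is stripped.
    "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
  )
)
```

For setups that already drop the pod-level series at scrape time, the extra filter should match nothing additional, so the change ought to be safe for both kinds of setup.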