[receiver/k8scluster] Consider adding metrics to get effective pod requests/limits #29860

jinja2 · 2023-12-13T17:53:45Z

Component(s)

receiver/k8scluster

Is your feature request related to a problem? Please describe.

The k8scluster receiver currently provides metrics for container resource requests and limits. In most cases the effective pod resource requirement equals the sum of the requests/limits of all main containers in the pod. However, k8s components such as the scheduler and the kubelet use a more involved calculation to determine the effective resource requirement for running a pod. For k8s versions without the sidecar and in-place resize features, the effective request/limit for a resource is calculated as max(max(init containers), sum(containers)) + pod_overhead. The full algorithm, which also accounts for the nuances of the additional features in the latest k8s versions, can be found here.

For example, the pod in the screenshot has an initContainer whose cpu request (150m) is greater than the cpu request of the main container (100m). The kubectl describe node output shows that the cpu reserved on the node for the pod is 150m.
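The simplified formula above, applied to the example pod, can be sketched as follows. This is a minimal illustration and not the scheduler's implementation: it assumes no sidecar containers, no in-place resize, and values already converted to a single integer unit (millicores here). `effective_pod_request` is a hypothetical helper name, and a container without a request is assumed to contribute 0.

```python
from typing import Iterable


def effective_pod_request(
    init_containers: Iterable[int],
    containers: Iterable[int],
    overhead: int = 0,
) -> int:
    """Return max(max(init containers), sum(containers)) + pod_overhead.

    All values must use the same unit (e.g. CPU millicores or memory bytes).
    Assumes no sidecar-type init containers and no in-place resize.
    """
    # Init containers run one at a time, so only the largest one matters.
    largest_init = max(init_containers, default=0)
    # Main containers run concurrently, so their requests add up.
    total_main = sum(containers)
    return max(largest_init, total_main) + overhead


# The pod from the example: init container requests 150m, main container 100m.
print(effective_pod_request(init_containers=[150], containers=[100]))  # 150
```

The same function applies to memory, provided every input uses the same unit. For limits, a container without a limit would need separate handling, which this sketch does not attempt.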

An admin might want to monitor patterns like this, where pods end up reserving resources for initialization that are not used during the rest of the pod's life. Being able to track the effective pod request/limit is useful when trying to track the capacity of the node as seen by the scheduler.

Describe the solution you'd like

The easiest and most accurate way to get the effective pod request/limit is to scrape the metrics kube_pod_resource_request and kube_pod_resource_limit from kube-scheduler, but this might not be an option for users with managed clusters.

The receiver should have the option to collect request/limit for initContainers and the pod overhead.

We could additionally discuss the feasibility of computing the effective pod request/limit in the receiver the same way the scheduler does. This might be difficult to implement and maintain, since the receiver won't have access to the enabled k8s feature gates the way the scheduler does, and we'd need to keep the computations in the receiver in sync with changes to k8s.

Proposed new metrics for pod overhead:

k8s.pod.cpu_overhead, with additional attr k8s.pod.runtimeclass
k8s.pod.memory_overhead, with additional attr k8s.pod.runtimeclass

For the requests/limits of init containers, I think it makes sense to differentiate these metrics from those for main containers, since users might want to filter out init containers. We could either change the metric name to reflect the different types of containers in a pod, e.g. k8s.initcontainer.*, or add an attr like k8s.container.type. Having separate metric names seems better for users because they can be enabled/disabled easily in the receiver config.
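To make the trade-off between the two naming options concrete, here is a rough sketch of how a consumer would filter out init containers under each one. The `DataPoint` class, the metric name `k8s.initcontainer.cpu_request`, and the attribute values `"init"` and `"main"` are all assumptions made for illustration; none of them exist in the receiver today. Values are shown in millicores for readability.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """Hypothetical, simplified stand-in for a metric data point."""
    name: str
    value: int
    attributes: dict = field(default_factory=dict)


# Option A: init containers get their own metric name (k8s.initcontainer.*).
option_a = [
    DataPoint("k8s.container.cpu_request", 100, {"k8s.container.name": "app"}),
    DataPoint("k8s.initcontainer.cpu_request", 150, {"k8s.container.name": "init"}),
]

# Option B: same metric name, distinguished by a k8s.container.type attribute.
option_b = [
    DataPoint("k8s.container.cpu_request", 100,
              {"k8s.container.name": "app", "k8s.container.type": "main"}),
    DataPoint("k8s.container.cpu_request", 150,
              {"k8s.container.name": "init", "k8s.container.type": "init"}),
]


def main_only_by_name(points):
    # Option A: selecting by metric name is enough.
    return [p for p in points if p.name == "k8s.container.cpu_request"]


def main_only_by_attr(points):
    # Option B: every consumer has to filter on the attribute value.
    return [p for p in points if p.attributes.get("k8s.container.type") != "init"]


print([p.value for p in main_only_by_name(option_a)])  # [100]
print([p.value for p in main_only_by_attr(option_b)])  # [100]
```

With option A the init container series could also be switched off entirely at the receiver, whereas option B always emits them and leaves filtering to downstream processing.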

An additional consideration when naming the metrics is the new sidecar-type initContainer metrics being discussed in this issue.

Describe alternatives you've considered

No response

Additional context

No response


github-actions · 2023-12-13T17:54:01Z

Pinging code owners:

receiver/k8scluster: @dmitryax @TylerHelmuth @povilasv

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions · 2024-02-12T03:31:04Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

receiver/k8scluster: @dmitryax @TylerHelmuth @povilasv

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions · 2024-04-12T05:20:17Z

This issue has been closed as inactive because it has been stale for 120 days with no activity.

jinja2 added the enhancement and needs triage labels Dec 13, 2023

github-actions bot added the receiver/k8scluster label Dec 13, 2023

TylerHelmuth added the priority:p2 label and removed the needs triage label Dec 13, 2023

github-actions bot mentioned this issue Dec 19, 2023

Weekly Report: 2023-12-12 - 2023-12-19 #30067

Closed

github-actions bot added the Stale label Feb 12, 2024

github-actions bot added the closed as inactive label Apr 12, 2024

github-actions bot closed this as not planned Apr 12, 2024
