Add container metrics fields from ECS #87

Closed
wants to merge 2 commits

Conversation

@ChrsMark (Member) commented Jun 6, 2023

This PR adds container-related metrics fields as part of #72.

cc: @AlexanderWert @kaiyan-sheng @mlunadia

@ChrsMark ChrsMark changed the title Add container metrics fields from ECS [WIP] Add container metrics fields from ECS Jun 6, 2023
@ChrsMark ChrsMark changed the title [WIP] Add container metrics fields from ECS Add container metrics fields from ECS Jun 19, 2023
@ChrsMark ChrsMark force-pushed the add_container_metrics branch 3 times, most recently from 995ab45 to 1076e36 on June 19, 2023 09:33
@ChrsMark ChrsMark marked this pull request as ready for review June 19, 2023 09:34
@ChrsMark ChrsMark requested review from a team June 19, 2023 09:34
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>

## HTTP Server

### Metric: `metric.container.cpu.usage`
Contributor

The name should just be `container.cpu.usage`.

The `metric.` classifier is for our YAML database, not the metric name. (Sorry for the confusion.)

Same comment for all the other metrics.

groups:
  - id: metric.container.cpu.usage
    type: metric
    metric_name: container.cpu.usage
@trask (Member) commented Aug 2, 2023

is this needed separately from `system.cpu.*`?

(there was some related discussion in open-telemetry/opentelemetry-specification#2388)

@ChrsMark (Member, Author):

Thanks @trask! That's an interesting one, and it's quite similar to open-telemetry/opentelemetry-specification#2388 (comment).

I'll try to share my view on this :).
From an infrastructure observability point of view, container metrics are best collected from outside the containers themselves, for example through the Docker runtime's APIs or the Kubelet's APIs.

So the point here is that it does not make much sense to treat a container as a host and collect its metrics by running a collector inside the container.

For metrics collected through the runtime/orchestrator APIs I think we need to be specific and use a resource-specific namespace like `container.cpu`, `pod.cpu`, etc.

In many cases the CPU/memory resources are limited as well.

In very specific use cases, if someone wants to treat a container as a host (kind uses containers as hosts), then the collector should run on the container/host directly and report metrics under the `system.*` namespace. In that case the resource is a "host", not a container, from the observation/collection point of view.

Also related to #226 (comment).
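
To make the distinction above concrete, here is a minimal sketch in the semantic-conventions YAML format; the `system.*` entry and the units are illustrative assumptions added for contrast, not definitions taken from this PR:

```yaml
groups:
  # Proposed here: collected from outside the container, e.g. via the
  # Docker runtime or Kubelet APIs.
  - id: metric.container.cpu.usage
    type: metric
    metric_name: container.cpu.usage
    instrument: gauge
    unit: "1"
  # Hypothetical contrast: reported by a collector running on the host
  # (or a container treated as a host), hence the system.* namespace.
  - id: metric.system.cpu.utilization
    type: metric
    metric_name: system.cpu.utilization
    instrument: gauge
    unit: "1"
```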

on all network interfaces
by the container since
the last metric collection.
instrument: gauge
Contributor

should it be a counter?
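
For reference, a sketch of what the suggested change could look like in the YAML; the id and metric name below are placeholders (the quoted excerpt does not show them), and `By` is assumed as the unit for byte totals:

```yaml
# Cumulative byte totals only ever increase, so a monotonic counter
# fits better than a gauge.
- id: metric.container.network.io      # placeholder id
  type: metric
  metric_name: container.network.io    # placeholder name
  instrument: counter
  unit: "By"
```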

The total number of bytes written
successfully (aggregated from all disks)
since the last metric collection
instrument: gauge

The total number of bytes read
successfully (aggregated from all disks)
since the last metric collection.
instrument: gauge
Contributor

and here: it's a counter
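
The same adjustment, sketched for the disk byte totals quoted above (again with placeholder names):

```yaml
- id: metric.container.disk.io        # placeholder id
  type: metric
  metric_name: container.disk.io      # placeholder name
  instrument: counter                 # counter instead of gauge, per the review comment
  unit: "By"
```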

@ChrsMark (Member, Author) commented Aug 24, 2023

Closing in favor of #282. After the project restructuring it was hard to rebase and recover this branch due to some odd permission errors; starting fresh was faster.
