Standard system metrics and semantic conventions #119
LGTM. It may also be worth defining the data type (Int64 or Double).
Julien from New Relic: I work on our infrastructure product and I have a couple of comments/questions. Sorry if they are obvious; I'm still getting up to speed with OTel!
This looks great to me. I especially like "usage" and "utilization" as standard names.
Definitely good enough to be approved as an OTEP and to move on to the spec itself.
👍
Update kubeletstats receiver metrics according to open-telemetry/oteps#119:

- Remove "k8s." from container metrics
- Change "/" metric delimiter to "."
- Change ...network -> ...network.io
- Avoid short forms in metrics: "mem" -> "memory", "fs" -> "filesystem"
- Use "cpu.usage" on a [0,1] scale instead of "cpu.utilization" in percent
- Use "cpu.time" in seconds instead of "cpu/cumulative" in nanoseconds

Also introduce additional metrics: "network.errors" and "filesystem.usage".
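A minimal sketch of the unit and naming changes listed above. The helper names (`cpu_time_seconds`, `cpu_usage_fraction`, `normalize_name`) and the sample metric names are hypothetical and not taken from the kubeletstats receiver; the sketch only assumes a cumulative CPU counter in nanoseconds, a utilization value in percent, and "/"-delimited metric names. It does not cover the "k8s." prefix removal or the network.io rename.

```python
NANOS_PER_SECOND = 1_000_000_000

# Hypothetical mapping of short forms to the full words required by OTEP 119.
SHORT_FORMS = {"mem": "memory", "fs": "filesystem"}


def cpu_time_seconds(cumulative_nanos: int) -> float:
    """Convert a cumulative CPU counter in nanoseconds to seconds ("cpu.time")."""
    return cumulative_nanos / NANOS_PER_SECOND


def cpu_usage_fraction(utilization_percent: float) -> float:
    """Rescale a utilization given in percent (0-100) to the [0, 1] scale."""
    return utilization_percent / 100.0


def normalize_name(old_name: str) -> str:
    """Switch the "/" delimiter to "." and expand known short forms."""
    parts = old_name.split("/")
    return ".".join(SHORT_FORMS.get(part, part) for part in parts)


if __name__ == "__main__":
    print(normalize_name("container/mem/usage"))  # container.memory.usage
    print(cpu_time_seconds(1_500_000_000))        # 1.5
```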
@open-telemetry/specs-metrics-approvers please review.
I believe this PR is ready to be merged, but when writing this up for the specs repo, it would be good to add a convention for process counts (with "state" = running / inactive).
| Name                   | Units   | Instrument        | Value Type | Label Key | Label Values                        |
|------------------------|---------|-------------------|------------|-----------|-------------------------------------|
| system.cpu.time        | seconds | SumObserver       | Double     | state     | idle, user, system, interrupt, etc. |
|                        |         |                   |            | cpu       | 1 - #cores                          |
| system.cpu.utilization | 1       | UpDownSumObserver | Double     | state     | idle, user, system, interrupt, etc. |
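One way `system.cpu.utilization` could be derived from two `system.cpu.time` samples, sketched under the assumption that each sample maps a `state` label value to cumulative seconds for a single `cpu` label value. The function name is hypothetical and this is not presented as the Collector's implementation: each state's share is its time delta divided by the total delta across all states.

```python
from typing import Dict


def cpu_utilization(prev: Dict[str, float], curr: Dict[str, float]) -> Dict[str, float]:
    """Per-state utilization in [0, 1] for one CPU from two cumulative samples.

    prev and curr map a "state" label value (idle, user, system, ...) to
    cumulative CPU time in seconds, as reported by system.cpu.time.
    """
    # Time spent in each state between the two samples.
    deltas = {state: curr[state] - prev.get(state, 0.0) for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed (or counters reset): report zero rather than divide by zero.
        return {state: 0.0 for state in curr}
    return {state: delta / total for state, delta in deltas.items()}


if __name__ == "__main__":
    before = {"idle": 100.0, "user": 20.0, "system": 5.0}
    after = {"idle": 103.0, "user": 20.5, "system": 5.5}
    print(cpu_utilization(before, after))  # {'idle': 0.75, 'user': 0.125, 'system': 0.125}
```

Because the per-state values for one CPU sum to 1, they describe a point-in-time ratio rather than an additive running total.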
s/UpDownSumObserver/ValueObserver
Conventions from [OTEP 119](open-telemetry/oteps#119)
* System metrics semantic conventions: conventions from [OTEP 119](open-telemetry/oteps#119)
* change process count to UpDownSumObserver
* fix system.cpu.utilization, use better example
* first several comments
* add description columns, update units to UCUM
* markdown-toc
* clarify OS process level metrics
* clarify load average example
* move general conventions + OTEP 108 into README.md
* renamed swap -> paging
* add additional fs labels
* fix links
* fix link
* Update specification/metrics/semantic_conventions/README.md
  Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
* Update specification/metrics/semantic_conventions/README.md
  Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
* fix tigran comments
* add disk io_time and operation_time
* add descriptions/footnotes for dropped packets and net errors
* lint, more info for net dropped packets/errors
* "dropped_packets" -> "dropped"
* Apply suggestions from James' code review
  Co-authored-by: James Bebbington <jbebbington@google.com>
* comments from James' code review
* clarify windows perf counter
* Update specification/metrics/semantic_conventions/README.md
  Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
* reflow text

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: James Bebbington <jbebbington@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
See open-telemetry/opentelemetry-specification#651. This OTEP proposes some standard system metric names as well as semantic conventions for naming system/runtime metrics. It mostly follows the work done in #108 and in the Collector. I left a few TODOs and open questions; the biggest are standard runtime metrics and process metrics.