Add helper functions for metric conversion [awsecscontainermetricsreceiver] #1089
Changes from 4 commits
ba6ee8b
8d991e3
88cd8ca
d6de8ca
0e8cba2
91dac3d
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
// Copyright 2020, OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package awsecscontainermetrics | ||
|
||
import ( | ||
"time" | ||
|
||
metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1" | ||
resourcepb "github.com/census-instrumentation/opencensus-proto/gen-go/resource/v1" | ||
"go.opentelemetry.io/collector/consumer/consumerdata" | ||
) | ||
|
||
// metricDataAccumulator defines the accumulator | ||
type metricDataAccumulator struct { | ||
md []*consumerdata.MetricsData | ||
} | ||
|
||
// getMetricsData generates OT Metrics data from task metadata and docker stats | ||
func (acc *metricDataAccumulator) getMetricsData(containerStatsMap map[string]ContainerStats, metadata TaskMetadata) { | ||
|
||
taskMetrics := ECSMetrics{} | ||
timestamp := timestampProto(time.Now()) | ||
taskResources := taskResources(metadata) | ||
Review comment: nit:
Reply: Updated. |
||
|
||
for _, containerMetadata := range metadata.Containers { | ||
stats := containerStatsMap[containerMetadata.DockerID] | ||
containerMetrics := getContainerMetrics(stats) | ||
containerMetrics.MemoryReserved = *containerMetadata.Limits.Memory | ||
containerMetrics.CPUReserved = *containerMetadata.Limits.CPU | ||
|
||
containerResources := containerResources(containerMetadata) | ||
for k, v := range taskResources.Labels { | ||
containerResources.Labels[k] = v | ||
} | ||
|
||
acc.accumulate( | ||
containerResources, | ||
convertToOCMetrics(ContainerPrefix, containerMetrics, nil, nil, timestamp), | ||
) | ||
|
||
aggregateTaskMetrics(&taskMetrics, containerMetrics) | ||
} | ||
|
||
// Overwrite Memory limit with task level limit | ||
if metadata.Limits.Memory != nil { | ||
taskMetrics.MemoryReserved = *metadata.Limits.Memory | ||
} | ||
|
||
taskMetrics.CPUReserved = taskMetrics.CPUReserved / CPUsInVCpu | ||
|
||
// Overwrite CPU limit with task level limit | ||
if metadata.Limits.CPU != nil { | ||
taskMetrics.CPUReserved = *metadata.Limits.CPU | ||
} | ||
|
||
acc.accumulate( | ||
taskResources, | ||
convertToOCMetrics(TaskPrefix, taskMetrics, nil, nil, timestamp), | ||
Review comment: If the 3rd and 4th parameters to this method are always nil, I would remove those parameters.
Reply: I also thought about it while writing this piece of code. Here, I kept the skeleton ready and the same method can be utilized to set metric labels. In our next PRs, we can just pass the
Reply: 👍 SGTM |
||
) | ||
} | ||
|
||
func (acc *metricDataAccumulator) accumulate( | ||
r *resourcepb.Resource, | ||
m ...[]*metricspb.Metric, | ||
) { | ||
var resourceMetrics []*metricspb.Metric | ||
for _, metrics := range m { | ||
for _, metric := range metrics { | ||
if metric != nil { | ||
resourceMetrics = append(resourceMetrics, metric) | ||
} | ||
} | ||
} | ||
|
||
r.Labels[ResourceAttributeServiceNameKey] = ResourceAttributeServiceNameValue | ||
|
||
acc.md = append(acc.md, &consumerdata.MetricsData{ | ||
Metrics: resourceMetrics, | ||
Resource: r, | ||
}) | ||
} |
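In the diff above, the per-container limits are dereferenced directly, while the task-level limits are nil-checked before use. A minimal sketch of the same aggregation with nil checks on both levels follows. It assumes that `aggregateTaskMetrics` sums the per-container reservations (its body is not shown in this diff); the `reserved` type and the `taskReserved` function are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Limit mirrors the shape used in the diff: both fields are optional pointers.
type Limit struct {
	CPU    *float64
	Memory *uint64
}

// cpusInVCpu matches CPUsInVCpu from the constants file (CPU units per vCPU).
const cpusInVCpu = 1024

// reserved is a hypothetical holder for the two reservation values.
type reserved struct {
	CPU    float64
	Memory uint64
}

// taskReserved sums per-container limits (skipping missing ones), converts
// the CPU sum from CPU units to vCPU, then lets task-level limits override
// the aggregated values when they are present.
func taskReserved(containers []Limit, task Limit) reserved {
	var sum reserved
	for _, c := range containers {
		if c.CPU != nil {
			sum.CPU += *c.CPU
		}
		if c.Memory != nil {
			sum.Memory += *c.Memory
		}
	}
	sum.CPU /= cpusInVCpu

	if task.Memory != nil {
		sum.Memory = *task.Memory
	}
	if task.CPU != nil {
		sum.CPU = *task.CPU
	}
	return sum
}

func main() {
	cpu, mem := 512.0, uint64(256)
	// Two containers declare CPU, one declares memory, one declares nothing.
	r := taskReserved([]Limit{{CPU: &cpu, Memory: &mem}, {CPU: &cpu}, {}}, Limit{})
	fmt.Printf("%.1f vCPU, %d memory units\n", r.CPU, r.Memory)
}
```

Whether a missing container limit should count as zero or be reported as an error is a design choice this sketch does not settle; it simply skips the value.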
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
// Copyright 2020, OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package awsecscontainermetrics | ||
|
||
import ( | ||
"testing" | ||
|
||
"github.com/stretchr/testify/require" | ||
"go.opentelemetry.io/collector/consumer/consumerdata" | ||
) | ||
|
||
func TestGetMetricsData(t *testing.T) { | ||
v := uint64(1) | ||
f := float64(1.0) | ||
|
||
memStats := make(map[string]uint64) | ||
memStats["cache"] = v | ||
|
||
mem := MemoryStats{ | ||
Usage: &v, | ||
MaxUsage: &v, | ||
Limit: &v, | ||
MemoryReserved: &v, | ||
MemoryUtilized: &v, | ||
Stats: memStats, | ||
} | ||
|
||
disk := DiskStats{ | ||
IoServiceBytesRecursives: []IoServiceBytesRecursive{ | ||
{Op: "Read", Value: &v}, | ||
{Op: "Write", Value: &v}, | ||
{Op: "Total", Value: &v}, | ||
}, | ||
} | ||
|
||
net := make(map[string]NetworkStats) | ||
net["eth0"] = NetworkStats{ | ||
RxBytes: &v, | ||
RxPackets: &v, | ||
RxErrors: &v, | ||
RxDropped: &v, | ||
TxBytes: &v, | ||
TxPackets: &v, | ||
TxErrors: &v, | ||
TxDropped: &v, | ||
} | ||
|
||
netRate := NetworkRateStats{ | ||
RxBytesPerSecond: &f, | ||
TxBytesPerSecond: &f, | ||
} | ||
|
||
percpu := []*uint64{&v, &v} | ||
cpuUsage := CPUUsage{ | ||
TotalUsage: &v, | ||
UsageInKernelmode: &v, | ||
UsageInUserMode: &v, | ||
PerCPUUsage: percpu, | ||
} | ||
|
||
cpuStats := CPUStats{ | ||
CPUUsage: cpuUsage, | ||
OnlineCpus: &v, | ||
SystemCPUUsage: &v, | ||
CPUUtilized: &v, | ||
CPUReserved: &v, | ||
} | ||
containerStats := ContainerStats{ | ||
Name: "test", | ||
ID: "001", | ||
Memory: mem, | ||
Disk: disk, | ||
Network: net, | ||
NetworkRate: netRate, | ||
CPU: cpuStats, | ||
} | ||
|
||
tm := TaskMetadata{ | ||
Cluster: "cluster-1", | ||
TaskARN: "arn:aws:some-value/001", | ||
Family: "task-def-family-1", | ||
Revision: "task-def-version", | ||
Containers: []ContainerMetadata{ | ||
{ContainerName: "container-1", DockerID: "001", DockerName: "docker-container-1", Limits: Limit{CPU: &f, Memory: &v}}, | ||
}, | ||
Limits: Limit{CPU: &f, Memory: &v}, | ||
} | ||
|
||
cstats := make(map[string]ContainerStats) | ||
cstats["001"] = containerStats | ||
|
||
var mds []*consumerdata.MetricsData | ||
acc := metricDataAccumulator{ | ||
md: mds, | ||
} | ||
|
||
acc.getMetricsData(cstats, tm) | ||
require.Less(t, 0, len(acc.md)) | ||
} |
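The test above only asserts that at least one `MetricsData` entry was produced. Since `getMetricsData` calls `accumulate` once per container and once more for the task, a stricter check could assert exactly `len(containers) + 1` entries. The sketch below illustrates that count with simplified stand-in types: `metric`, `metricsData`, `accumulator`, and `collect` are hypothetical and replace the OpenCensus and `consumerdata` types used in the PR, and the `container.name` label key is assumed for illustration.

```go
package main

import "fmt"

// metric and metricsData are simplified stand-ins (assumptions) for
// metricspb.Metric and consumerdata.MetricsData.
type metric struct{ Name string }

type metricsData struct {
	Labels  map[string]string
	Metrics []*metric
}

type accumulator struct{ md []*metricsData }

// accumulate flattens the metric slices, drops nil entries, and appends
// exactly one metricsData entry per call, like the method in the diff.
func (acc *accumulator) accumulate(labels map[string]string, m ...[]*metric) {
	var out []*metric
	for _, ms := range m {
		for _, x := range ms {
			if x != nil {
				out = append(out, x)
			}
		}
	}
	acc.md = append(acc.md, &metricsData{Labels: labels, Metrics: out})
}

// collect calls accumulate once per container and once for the task.
func collect(containers []string) *accumulator {
	acc := &accumulator{}
	for _, name := range containers {
		acc.accumulate(
			map[string]string{"container.name": name}, // hypothetical label key
			[]*metric{{Name: "container.memory.usage"}, nil},
		)
	}
	acc.accumulate(map[string]string{}, []*metric{{Name: "ecs.task.memory.usage"}})
	return acc
}

func main() {
	acc := collect([]string{"container-1", "container-2"})
	// Two containers plus one task-level entry.
	fmt.Println(len(acc.md))
}
```

In the real test, the equivalent assertion would presumably be `require.Len(t, acc.md, len(tm.Containers)+1)`, assuming the accumulator behaves as shown in the diff.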
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,49 @@ const ( | |
AttributeECSTaskRevesion = "ecs.task-definition-version" | ||
AttributeECSServiceName = "ecs.service" | ||
|
||
ContainerMetricsLabelLen = 3 | ||
TaskMetricsLabelLen = 6 | ||
CPUsInVCpu = 1024 | ||
BytesInMiB = 1024 * 1024 | ||
|
||
TaskPrefix = "ecs.task." | ||
ContainerPrefix = "container." | ||
Review comment: Why do we need to prefix the metrics with container? If they have container label, they're container metrics right?
Reply: I'm not sure if this is part of the OTel convention, but given that other receivers follow this approach, I think we should do the same here for consistency.
Reply: Yea, I found some other receivers are doing the same like |
||
ResourceAttributeServiceNameKey = "service.name" | ||
Review comment: How about this? |
||
ResourceAttributeServiceNameValue = "awsecscontainermetrics" | ||
Review comment: The service name corresponds with an application, not a backend, so for example
Reply: Yea, got it. But, for all of our AWS receivers, we are using the receiver name as
Reply: What do you mean by AWS receivers? I think the only one we have is xray, which doesn't do this, and definitely shouldn't since we need to make sure the app's service name is used. I'm not sure what you mean exactly by the rules, but anyways we can't just fill a semantic convention attribute with something that doesn't follow the spec. If anything, the telemetry.sdk matches closer to what this sort of receiver is doing. @bogdandrutu @tigrannajaryan any suggestion on that? Also @hossain-rayhan it's important to take a step back and remember what this receiver is here for - it's to translate the container metrics data into the OpenTelemetry format / specification. This is because this data seems useful to users regardless of if they use cloudwatch or not. While we may need some, but hopefully not much, consideration for specific vendors like cloudwatch, that's not the intent here. If you haven't yet, you should go through in detail at least the Resource and Metrics semantics conventions of OTel spec before proceeding and make sure you are aligned with it https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource. That doesn't mean we want to block data that's needed, but it's important to follow the spec as much as possible.
Reply: This receiver generates ECS container Metrics itself but not receiving any metrics from outside of OTel Collector. For the metrics generated inside the receiver, the idea is to put receiver name in
Reply: +1. We should not use "service.name" for receiver name. That is not the purpose of "service.name". "service.name" is supposed to describe the source that emits the metrics. Collector is just collecting the metric, it is an intermediary, it is not the source. Nor is "telemetry.sdk" intended for that. The source that emits the metrics is the container here. If we know the name of the service that runs in the container we should set that. If we don't know we should not record it at all. I do not know why we want to record the receiver name, perhaps you can clarify the use case. This can then be added as a semantic convention for OpenTelemetry as a whole or just for the Collector and will possibly end up in the "otel" namespace.
Reply: @hossain-rayhan If you can give some more detail about this usage that would be great. I think filling in wrong information is blocking this PR, so the easiest way to proceed would be to just remove setting the service name for now and we can figure out a way to handle what you need in a separate PR.
Reply: Thanks @anuraaga. I am removing it for now as this should not block the receiver. This is more related to the exporter logic as it's being utilized for supporting special customer use cases. If needed I can support it in a separate PR after further discussion. |
||
MetricResourceType = "aoc.ecs" | ||
Review comment: Just curious, what does
Reply: AWS Observability Collector-> Amazon distribution of OpenTelemetry.
Reply: FWIW, resource type doesn't exist in OTel protocol but is there right now for metrics since it still seems to use opencensus. So this value will generally be dropped
Reply: Noted. Even if it gets utilized we want to use "aoc.ecs" to differentiate our OT metrics from ECS backend metrics.
Reply: I point it out because it will go away - so if you do have an expectation of having it, it won't be there :P But I think if we put the receiver name in something like |
||
|
||
AttributeMemoryUsage = "memory.usage" | ||
AttributeMemoryMaxUsage = "memory.usage.max" | ||
AttributeMemoryLimit = "memory.usage.limit" | ||
AttributeMemoryReserved = "memory.reserved" | ||
AttributeMemoryUtilized = "memory.utilized" | ||
|
||
AttributeCPUTotalUsage = "cpu.usage.total" | ||
AttributeCPUKernelModeUsage = "cpu.usage.kernelmode" | ||
AttributeCPUUserModeUsage = "cpu.usage.usermode" | ||
AttributeCPUSystemUsage = "cpu.usage.system" | ||
AttributeCPUCores = "cpu.cores" | ||
AttributeCPUOnlines = "cpu.onlines" | ||
AttributeCPUReserved = "cpu.reserved" | ||
AttributeCPUUtilized = "cpu.utilized" | ||
|
||
AttributeNetworkRateRx = "network.rate.rx" | ||
AttributeNetworkRateTx = "network.rate.tx" | ||
|
||
AttributeNetworkRxBytes = "network.io.usage.rx_bytes" | ||
AttributeNetworkRxPackets = "network.io.usage.rx_packets" | ||
AttributeNetworkRxErrors = "network.io.usage.rx_errors" | ||
AttributeNetworkRxDropped = "network.io.usage.rx_dropped" | ||
AttributeNetworkTxBytes = "network.io.usage.tx_bytes" | ||
AttributeNetworkTxPackets = "network.io.usage.tx_packets" | ||
AttributeNetworkTxErrors = "network.io.usage.tx_errors" | ||
AttributeNetworkTxDropped = "network.io.usage.tx_dropped" | ||
|
||
AttributeStorageRead = "storage.read_bytes" | ||
AttributeStorageWrite = "storage.write_bytes" | ||
|
||
UnitBytes = "Bytes" | ||
UnitMegaBytes = "MB" | ||
UnitNanoSecond = "NS" | ||
UnitBytesPerSec = "Bytes/Sec" | ||
UnitCount = "Count" | ||
UnitVCpu = "vCPU" | ||
) |
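As a rough illustration of how the prefix and unit constants above might be combined, the sketch below builds full metric names and converts a byte count to MiB. The helper names `metricName` and `bytesToMiB` are hypothetical; the diff does not show how `convertToOCMetrics` actually uses these constants, so this is only one plausible reading.

```go
package main

import "fmt"

// Values copied from the constants in the diff; lower-cased here so the
// sketch stays self-contained.
const (
	taskPrefix      = "ecs.task."
	containerPrefix = "container."
	bytesInMiB      = 1024 * 1024
)

// metricName joins a scope prefix with a metric attribute name.
func metricName(prefix, attribute string) string {
	return prefix + attribute
}

// bytesToMiB converts a raw byte count to mebibytes.
func bytesToMiB(b uint64) float64 {
	return float64(b) / bytesInMiB
}

func main() {
	fmt.Println(metricName(taskPrefix, "memory.reserved"))
	fmt.Println(metricName(containerPrefix, "cpu.utilized"))
	fmt.Println(bytesToMiB(512 * 1024 * 1024))
}
```

With this scheme the same attribute name yields distinct task-level and container-level metrics, which is the consistency argument made in the review thread about `ContainerPrefix`.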
This file was deleted.
Review comment: @asuresh4 @bogdandrutu Are metrics receivers still using opencensus proto?
Reply: No, we changed most of the core to use the otlp and internal structs. Completely recommend for new components to avoid oc.
Reply: @hossain-rayhan you need to start using `pdata.Metrics`
Reply: Hi @bogdandrutu, before sending the data to the next consumer I am using `internaldata.OCToMetrics(md)` to convert our metrics to `pdata.Metrics`. Wondering, isn't that enough, like other receivers in the repo, or should we strictly get rid of it now?
Reply: That is a temporary solution to make progress and not have to change all components at once. We decided to use that for some old components that we did not have time to change.
Reply: @hossain-rayhan Yeah so basically you should only be converting at the last moment when passing down, but here in this sort of receiver-specific logic we want to be using `pdata`, the OTel format. Or we just have to rewrite it right away. We're also having data-model issues because of using the old format (Resource type for example) and we want to make sure the model is right.
Reply: Thanks @bogdandrutu and @anuraaga. I understand we need to use `pdata` to convert everything to OTel format eventually. I was planning to move forward with this to meet our internal deadline (9/30/2020). We can send a different PR after October 15th I guess. How do you guys feel about it?
Reply: As long as you create an issue and assign to you and @anuraaga I am fine. I trust that you will fix it. I will let @anuraaga make the final call here.
Reply: Issue created: #1122