Metric data exported repeatedly #832
Comments
Where do the print statements come from? Histograms in cloud monitoring are cumulative, so exporting the current state of the histogram each export interval is expected. I'm not sure why the older version was not doing this.
This sounds like a bug. We should not be accumulating increasing amounts of metric data over time.
I've vendored the modules for each example and manually added some logging statements to the vendored code. See and
v0.31.0 of the metrics SDK was before it was completely re-written to comply with the specification (see https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md#0320-revised-metric-sdk-alpha---2022-09-18). I suspect the behavior changed with the re-write in open-telemetry/opentelemetry-go#3175.

The exporter (new and old) should be stateless: it should simply translate from the OpenTelemetry types to the Google Cloud Monitoring types. You can verify that by using the stdout exporter, or by adding a print statement inside the ForEach loop of the older exporter: https://github.com/bvwells/opentelemetry-issue/blob/3dea93c18e76f50197296c90b440b958579db127/legacy/vendor/github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric/metric.go#L264. I suspect the pre-rewrite metrics SDK did not properly implement cumulative histogram export, and that it was fixed in the stable version of the SDK.

What makes you think it is leaking memory?
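To make the stateless-exporter point concrete, here is a rough sketch (not from the original repo) of checking with the stdout exporter that the SDK itself is what re-emits the cumulative histogram on every collection cycle; the meter and instrument names and the intervals are placeholders:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	ctx := context.Background()

	// The stdout exporter prints exactly what the SDK hands to it, so any
	// repeated data points are coming from the SDK, not the exporter.
	exporter, err := stdoutmetric.New()
	if err != nil {
		log.Fatalf("failed to create stdout exporter: %v", err)
	}

	reader := sdkmetric.NewPeriodicReader(exporter, sdkmetric.WithInterval(10*time.Second))
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader))
	defer provider.Shutdown(ctx)

	hist, err := provider.Meter("check").Float64Histogram("demo.histogram")
	if err != nil {
		log.Fatalf("failed to create histogram: %v", err)
	}
	hist.Record(ctx, 1.0)

	// Record nothing further: with cumulative temporality the same data
	// point is still printed on every interval.
	time.Sleep(35 * time.Second)
}
```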
Thanks for looking into the issue. The increase in memory is caused by us recording some metrics with quite high cardinality. Previously the data was only kept in memory for an export cycle, whereas now all data is kept in memory for all time. I don't think it is a memory leak. It has now become unfeasible to record this type of metric data in Google Cloud Operations, as the ingested bytes have exploded in magnitude and with them the cost (the pricing model is based on ingested bytes). We had been able to ingest this type of metric into Google Cloud Operations using OpenCensus and previous versions of OpenTelemetry. The cost for lower-cardinality metrics has also increased significantly. Technically it feels a little excessive that all time-series data is pushed at every export cycle rather than only the time series that have changed, but it sounds like that would be a large design change.
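To make "high cardinality" concrete, here is a hypothetical sketch of the kind of recording being described (the attribute name is made up, not taken from the real application): each distinct attribute value becomes its own time series, and with cumulative temporality every one of those series is re-exported on every cycle.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	otelmetric "go.opentelemetry.io/otel/metric"
)

// recordRequest records one histogram point per request. Every distinct
// session ID creates a separate time series; under cumulative temporality
// all of those series are re-sent at every export interval, even if they
// are never recorded to again.
func recordRequest(ctx context.Context, hist otelmetric.Float64Histogram, sessionID string, latencySeconds float64) {
	hist.Record(ctx, latencySeconds,
		otelmetric.WithAttributes(attribute.String("session.id", sessionID)))
}
```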
Got it. It definitely isn't recommended to use deltas with GCM (IIRC, it negatively impacts query performance), but if you've been happy with the existing experience, I think it is possible to work around this and get the SDK to send deltas. GCM doesn't really support deltas, but we can send cumulatives that reset every interval.

You currently have:

```go
exporter, err := metric.New(
	metric.WithProjectID(*projectID),
	metric.WithMetricDescriptorTypeFormatter(func(m metricdata.Metrics) string {
		return fmt.Sprintf("custom.googleapis.com/opentelemetry/%s", m.Name)
	}))
if err != nil {
	log.Printf("failed to create the exporter: %v", err)
	os.Exit(-1)
}
```

You would add something like:

```go
// deltaExporter overrides the Temporality of the wrapped Exporter with delta.
type deltaExporter struct {
	otelsdkmetric.Exporter
}

// Temporality overrides the embedded exporter's temporality function so that
// delta temporality is used for all instruments.
func (d *deltaExporter) Temporality(ik otelsdkmetric.InstrumentKind) metricdata.Temporality {
	return metricdata.DeltaTemporality
}
```

Then, after your setup code above:

```go
// Current setup code ...

// Wrap the exporter so the SDK gives it deltas instead.
exporter = &deltaExporter{Exporter: exporter}
```

I'm not 100% sure it will work properly, but it sounds like this was the behavior in the older version of the OTel SDK.
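For context, here is a rough sketch of how the wrapped exporter might be wired into the SDK's periodic reader; it reuses the deltaExporter type above, and the one-minute interval and omitted exporter options are assumptions rather than anything from the original examples:

```go
package main

import (
	"context"
	"log"
	"time"

	mexporter "github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric"
	otelsdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// Create the Google Cloud Monitoring exporter as before (add options
	// such as WithProjectID / WithMetricDescriptorTypeFormatter as needed).
	exporter, err := mexporter.New()
	if err != nil {
		log.Fatalf("failed to create the exporter: %v", err)
	}

	// Wrap it so the SDK is asked for delta temporality, i.e. the
	// aggregation state is reset after every export instead of
	// accumulating forever.
	wrapped := &deltaExporter{Exporter: exporter}

	reader := otelsdkmetric.NewPeriodicReader(wrapped, otelsdkmetric.WithInterval(time.Minute))
	provider := otelsdkmetric.NewMeterProvider(otelsdkmetric.WithReader(reader))
	defer provider.Shutdown(context.Background())

	// ... create meters and instruments from provider and run the app.
}
```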
The only other thing I'll add is that if cost is a concern, it may be worth looking into Google Managed Service for Prometheus (GMP). It is much, much less expensive. You can use the Prometheus exporter from OTel-Go with GMP's managed collection, or use the googlemanagedprometheus exporter on the collector, but we don't have one that exports directly from the application.
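For reference, here is a rough sketch of exposing OTel-Go metrics via the Prometheus exporter so that GMP's managed collection (or any Prometheus scraper) can pick them up; the port and endpoint are arbitrary assumptions:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	otelprom "go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// The Prometheus exporter is pull-based: data is only serialized when
	// the /metrics endpoint is scraped, so nothing is pushed to GCM.
	registry := prometheus.NewRegistry()
	exporter, err := otelprom.New(otelprom.WithRegisterer(registry))
	if err != nil {
		log.Fatalf("failed to create the Prometheus exporter: %v", err)
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	_ = provider // create meters and instruments from provider as usual

	http.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":2223", nil))
}
```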
Thanks for the suggestions and for your analysis. I'll do some playing! :-) We already use Prometheus in some of the cloud environments we deploy our application to. I suspect that this may push us to fully adopt it in all environments.
I'm going to close this, but feel free to reopen if you have any other questions.
We recently performed OpenTelemetry and opentelemetry-operations-go module upgrades and found an issue with the export of metric data (it may be that we have done something wrong with the upgrade so apologies if this is the case).
I've tried to illustrate the issue here:
https://github.com/bvwells/opentelemetry-issue
In this git repository I have two examples that generate a single histogram metric with different versions of the OpenTelemetry modules. The legacy folder contains an application running:
The latest folder contains an application running:
There are more details in the README here https://github.com/bvwells/opentelemetry-issue/blob/main/README.md.
The behaviour when exporting to Google Cloud Operations is different between the versions. The older version of the exporter only exports time-series data to the Google Cloud Operations API once, whereas the new version exports the same time-series data every time the exporter runs (every minute). This means that the ingested bytes into Google Cloud Operations increase over time irrespective of whether any new metric data has been recorded.
When running our application we see its memory footprint increase over time as metric data is kept in memory, and the ingested bytes into Google Cloud Operations increase drastically.
Please let me know if you require any further information on this issue.