Metric data exported repeatedly #832

bvwells · 2024-04-15T15:38:45Z

We recently performed OpenTelemetry and opentelemetry-operations-go module upgrades and found an issue with the export of metric data (it may be that we have done something wrong with the upgrade so apologies if this is the case).

I've tried to illustrate the issue here:

https://github.com/bvwells/opentelemetry-issue

In this git repository I have two examples which generate a singe histogram metric with varying versions of OpenTelemetry modules. The legacy folder contains an application running:

github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.32.7
go.opentelemetry.io/otel v1.15.1
go.opentelemetry.io/otel/metric v0.34.0
go.opentelemetry.io/otel/sdk v1.15.1
go.opentelemetry.io/otel/sdk/metric v0.31.0

The latest folder contains an application running:

github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.46.0
go.opentelemetry.io/otel v1.25.0
go.opentelemetry.io/otel/metric v1.25.0
go.opentelemetry.io/otel/sdk/metric v1.25.0

There are more details in the README here https://github.com/bvwells/opentelemetry-issue/blob/main/README.md.

The behaviour when exporting to Google Cloud operations is different between the versions. The older version of the exporter only exports time-series data to the Google Cloud operations API once, whereas the new version of the exporter is exporting the same time-series data every time the exporter runs (every minute). This means that the ingested bytes into Google Cloud operations increases over time irrespective of whether any new metric data has been recorded.

When running in our application we see its memory footprint increase over time as metric data is kept in memory and the ingested bytes into Google Cloud operations increases drastically.

Please let me know if you require any further information on this issue.

The text was updated successfully, but these errors were encountered:

dashpole · 2024-04-15T15:58:24Z

Where do the print statements come from?

Histograms in cloud monitoring are cumulative, so exporting the current state of the histogram each export interval is expected. I'm not sure why the older version was not doing this.

When running in our application we see its memory footprint increase over time as metric data is kept in memory and the ingested bytes into Google Cloud operations increases drastically.

This sounds like a bug. We should not be accumulating increasing amounts of metric data over time.

bvwells · 2024-04-15T16:06:10Z

I've vendored the modules for each example and manually added some logging statements to the vendored code. See

https://github.com/bvwells/opentelemetry-issue/blob/3dea93c18e76f50197296c90b440b958579db127/latest/vendor/github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric/metric.go#L232

and

https://github.com/bvwells/opentelemetry-issue/blob/3dea93c18e76f50197296c90b440b958579db127/legacy/vendor/github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric/metric.go#L260

dashpole · 2024-04-15T16:22:11Z

v0.31.0 of the metrics SDK was before it was completely re-written to comply with the specification (see https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md#0320-revised-metric-sdk-alpha---2022-09-18). I suspect the behavior changed with the re-write in open-telemetry/opentelemetry-go#3175.

The exporter (new and old) should be stateless. It should simply translate from the OpenTelemetry types to the Google Cloud Monitoring types. You can verify that by using the stdout exporter, or by adding a print statement inside the ForEach loop of the older exporter: https://github.com/bvwells/opentelemetry-issue/blob/3dea93c18e76f50197296c90b440b958579db127/legacy/vendor/github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric/metric.go#L264.

I suspect the pre-rewrite metrics SDK did not properly implement cumulative histogram export, and that it was fixed in the stable version of the SDK.

What makes you think it is leaking memory?

bvwells · 2024-04-15T17:17:18Z

Thanks for looking into the issue.

The increase in memory is caused due to us recording some metrics with quite high cardinality. Previously the data was only kept in memory for an export cycle, however now it keeps all data in memory for all time. I don't think it is a memory leak.

It has now become unfeasible to record this type of metric data in Google Cloud Operations due to this as the ingested bytes has exploded in magnitude and with it the cost (as the pricing model is based on ingested bytes). We had been able to ingest this type of metric into Google Cloud Operations using OpenCensus and previously versions OpenTelemetry. The cost for lower cardinality metrics has also increased significantly.

Technically it feels a little excessive that all time-series data is pushed at every export cycle rather than only time-series which has changed, but it sounds like this would be a large design change....

dashpole · 2024-04-15T17:38:48Z

Got it. It definitely isn't recommended to use deltas with GCM (IIRC, it negatively impacts query performance), but if you've been happy with the existing experience, I think it is possible to work around this to get the SDK to send deltas. GCM doesn't really support Deltas, but we can send cumulatives that reset every interval.

You currently have:

	exporter, err := metric.New(
		metric.WithProjectID(*projectID),
		metric.WithMetricDescriptorTypeFormatter(func(m metricdata.Metrics) string {
			return fmt.Sprintf("custom.googleapis.com/opentelemetry/%s", m.Name)
		}))
	if err != nil {
		log.Printf("failed to create the exporter: %v", err)
		os.Exit(-1)
	}

You would add something like:

// deltaExporter overrides the Temporality of the Exporter with delta.
type deltaExporter struct {
        otelsdkmetric.Exporter
}

// Override the exporter's temporality function
func (d *deltaExporter) Temporality(ik metric.InstrumentKind) metricdata.Temporality {
        // Use delta temporality for all instruments
        return metricdata.DeltaTemporality
}

Then, after your setup code above

        // Current setup code ...

        // Wrap the exporter to make the SDK give it deltas instead
        exporter = &deltaExporter{Exporter: exporter}

I'm not 100% sure it will work properly, but it sounds like this was the behavior in the older version of the OTel SDK.

dashpole · 2024-04-15T17:44:34Z

The only other thing i'll add is that if cost is a concern, it may be worth looking into Google Managed service for Prometheus (GMP). It is much, much less expensive. You can use the Prometheus exporter from OTel-go with managed GMP collection, or use the googlemanagedprometheus exporter on the collector, but we don't have one that exports directly from the application.

bvwells · 2024-04-15T17:57:21Z

Thanks for the suggestions and for your analysis. I'll do some playing! :-)

We already use Prometheus in some of the cloud environments we deploy our application to. I suspect that this may push us to fully adopt it in all environments.

dashpole · 2024-04-15T18:03:43Z

I'm going to close this, but feel free to reopen if you have any other questions

dashpole self-assigned this Apr 15, 2024

dashpole added bug Something isn't working priority: p1 labels Apr 15, 2024

dashpole added question Further information is requested priority: p3 and removed bug Something isn't working priority: p1 labels Apr 15, 2024

dashpole closed this as completed Apr 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metric data exported repeatedly #832

Metric data exported repeatedly #832

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

dashpole commented Apr 15, 2024 •

edited

Loading

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

Metric data exported repeatedly #832

Metric data exported repeatedly #832

Comments

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

dashpole commented Apr 15, 2024 • edited Loading

bvwells commented Apr 15, 2024

dashpole commented Apr 15, 2024

dashpole commented Apr 15, 2024 •

edited

Loading