
Change process_runtime_heap_alloc_bytes metric to include memory ballast size #8342

Closed

mx-psi opened this issue Aug 31, 2023 · 8 comments · Fixed by #10696
Labels
collector-telemetry (healthchecker and other telemetry collection issues), enhancement (New feature or request)

Comments

mx-psi (Member) commented Aug 31, 2023

The process/runtime/heap_alloc_bytes/process_runtime_heap_alloc_bytes metric is defined as

Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')

but, since #45, the actual value is runtime.MemStats.HeapAlloc minus the memory ballast size if the memory_ballast extension is being used:

func (pm *processMetrics) readMemStatsIfNeeded() {
	now := time.Now()
	// If last time we read was less than one second ago just reuse the values
	if now.Sub(pm.lastMsRead) < time.Second {
		return
	}
	pm.lastMsRead = now
	runtime.ReadMemStats(pm.ms)
	if pm.ballastSizeBytes > 0 {
		pm.ms.Alloc -= pm.ballastSizeBytes
		pm.ms.HeapAlloc -= pm.ballastSizeBytes
		// ...
	}
}

This delays the process metrics initialization until extensions are running:

if err = proctelemetry.RegisterProcessMetrics(
	srv.telemetryInitializer.ocRegistry,
	srv.telemetryInitializer.mp,
	obsreportconfig.UseOtelForInternalMetricsfeatureGate.IsEnabled(),
	getBallastSize(srv.host),
); err != nil {

We should remove this special case and change the metric to report the raw value of runtime.MemStats.HeapAlloc, which simplifies both the metric's semantics and the implementation of telemetry initialization. If we keep using the memory ballast extension (see the separate issue #8343 for that discussion, which may conclude after this change), we can add a metric to the memory_ballast extension reporting the ballast size, so consumers can recover the value previously reported by this metric.
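
For illustration, a rough sketch using the OpenTelemetry Go metric API of what the split could look like; the helper name registerHeapAndBallastMetrics and the metric name otelcol_process_memory_ballast_size_bytes are placeholders, not a concrete proposal:

import (
	"context"
	"runtime"

	"go.opentelemetry.io/otel/metric"
)

// registerHeapAndBallastMetrics (placeholder name) registers the heap metric
// reporting runtime.MemStats.HeapAlloc unmodified, plus a separate gauge that
// the memory_ballast extension could expose so consumers can subtract the
// ballast themselves if they want the old behaviour.
func registerHeapAndBallastMetrics(meter metric.Meter, ballastSizeBytes uint64) error {
	_, err := meter.Int64ObservableGauge(
		"process_runtime_heap_alloc_bytes",
		metric.WithDescription("Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')"),
		metric.WithUnit("By"),
		metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
			var ms runtime.MemStats
			runtime.ReadMemStats(&ms)
			// Report the raw value, with no ballast adjustment.
			o.Observe(int64(ms.HeapAlloc))
			return nil
		}),
	)
	if err != nil {
		return err
	}

	// Placeholder gauge for the memory_ballast extension: a constant value
	// equal to the configured ballast size.
	_, err = meter.Int64ObservableGauge(
		"otelcol_process_memory_ballast_size_bytes",
		metric.WithDescription("Configured memory ballast size"),
		metric.WithUnit("By"),
		metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
			o.Observe(int64(ballastSizeBytes))
			return nil
		}),
	)
	return err
}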

If it turns out that reporting the raw runtime.MemStats.HeapAlloc is not desirable, we should instead change the metric description to state that the memory ballast size is subtracted.

mx-psi added the enhancement (New feature or request) and collector-telemetry (healthchecker and other telemetry collection issues) labels on Aug 31, 2023
mx-psi (Member, Author) commented Aug 31, 2023

Note that this was explicitly mentioned in the original #45 PR (emphasis mine):

Updated process telemetry code to account for the ballast and report memory usage statistics after subtracting the ballast size. I'm not sure if this makes sense in every scenario. If not, we can add another flag to disable the behaviour. Another option is to add one more metric that just exports the memory ballast size as a static number and then let metric consumers to adjust the numbers.

codeboten (Contributor) commented

This change would make sense to me. How do you propose introducing the change? I'd expect anyone monitoring with this value today will be surprised to see it change by the ballast size after a release.

mx-psi (Member, Author) commented Aug 31, 2023

> This change would make sense to me. How do you propose introducing the change? I'd expect anyone monitoring with this value today will be surprised to see it change by the ballast size after a release.

We can add the memory_ballast metric without breaking anyone. For process_runtime_heap_alloc_bytes, we can make the change behind a feature gate to allow users to migrate at their own pace.
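
As a rough sketch of how such a gate could be declared with the collector's featuregate package (the gate ID telemetry.rawHeapAllocMetric and the helper applyBallastAdjustment are placeholders):

package proctelemetry // illustrative placement only

import "go.opentelemetry.io/collector/featuregate"

// Placeholder gate: while it is disabled (the alpha default) the metric keeps
// subtracting the ballast; enabling it reports the raw HeapAlloc value.
var rawHeapAllocGate = featuregate.GlobalRegistry().MustRegister(
	"telemetry.rawHeapAllocMetric",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription(
		"When enabled, process_runtime_heap_alloc_bytes reports runtime.MemStats.HeapAlloc "+
			"without subtracting the memory ballast size."),
	featuregate.WithRegisterReferenceURL("https://github.com/open-telemetry/opentelemetry-collector/issues/8342"),
)

// applyBallastAdjustment keeps the old behaviour only while the gate is off.
func applyBallastAdjustment(heapAlloc, ballastSizeBytes uint64) uint64 {
	if !rawHeapAllocGate.IsEnabled() && ballastSizeBytes > 0 {
		return heapAlloc - ballastSizeBytes
	}
	return heapAlloc
}

While the gate is in alpha it is off by default and users can opt in with --feature-gates=telemetry.rawHeapAllocMetric; flipping the default would then follow the usual alpha → beta → stable graduation.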

codeboten (Contributor) commented

Sounds good. Pinging @open-telemetry/collector-approvers for more eyes on this

dmitryax (Member) commented

Why do we need to tackle this before the potential removal of the memory_ballast? I think doing it implicitly for users is pretty tricky.

I'd suggest not touching it until we decide on the removal of the memory_ballast. I hope we will be able to remove the memory ballast; in that case, the transition will be pretty obvious to users:

  1. memory_ballast is deprecated: users explicitly remove it and get process_runtime_heap_alloc_bytes updated.
  2. memory_ballast is removed: users must have removed it from their config by then and get process_runtime_heap_alloc_bytes updated.

codeboten (Contributor) commented Aug 31, 2023

@dmitryax Right, I guess this is currently blocked on the decision out of #8343.

I suppose that if we decide not to remove the ballast, we could still make this change with feature gates, but I agree that it lines up better if we can remove the ballast altogether.

mx-psi (Member, Author) commented Sep 1, 2023

> Why do we need to tackle this before the potential removal of the memory_ballast? I think doing it implicitly for users is pretty tricky.

Let's discuss #8343 first, but my expectation is that removing the memory ballast will be a lengthy process given how widely it is used, and I don't want to block this refactor on the ballast removal because:

  1. this issue is relevant for the latest approach I am trying in Allow overriding telemetry providers #4970;
  2. I would want this change even if we don't remove the memory ballast or don't do Allow overriding telemetry providers #4970; the current metric description is wrong, and we privilege the ballast extension implementation over other future approaches for no reason other than a historical one.

mx-psi (Member, Author) commented Sep 13, 2023

We can wait for open-telemetry/opentelemetry-helm-charts/issues/891; if we can remove the extension 'quickly', I think we can avoid doing this in two steps.
