
cloud-controller-manager metrics port and address are not accessible or documented #912

Closed
1 task done
bgagnon opened this issue Jan 24, 2020 · 10 comments · Fixed by #1178
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@bgagnon

bgagnon commented Jan 24, 2020

Is this a BUG REPORT or FEATURE REQUEST?: Feature

/kind feature

The binaries affected:

  • openstack-cloud-controller-manager

What happened:

I wanted to access the internal metrics of the CCM, just like the in-tree controller-manager which previously served these metrics.

What you expected to happen:

Metrics similar to those of the standard controller-manager (and more).

Anything else we need to know?:

It is not clear how to access the Prometheus metrics for the openstack-cloud-controller-manager binary. There are no documented flags or port number.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 31, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 30, 2020
@mikejoh
Contributor

mikejoh commented Jun 1, 2020

@bgagnon I haven't been running the in-tree CCM and don't know exactly what it exposed in terms of metrics, but there are a couple of metrics you can get out of the OpenStack (external) CCM. I think there's room for improvement here, not only in the docs but also code-wise, by exporting more metrics (if feasible).

The OCCM listens on port 10258; more precisely, the upstream k/k cloud-controller-manager is assigned this port automatically, and the OCCM imports (and vendors) the packages it needs to run and behave as an external cloud provider (CCM). If that makes any sense.

I did some local tests in one of our clusters (v1.17.0 of the OCCM). On one of the masters I could curl the https://localhost:10258/metrics endpoint, which returned the following metrics (names and types only; I'm leaving out the actual values):

# TYPE apiserver_audit_requests_rejected_total counter
# TYPE apiserver_client_certificate_expiration_seconds histogram
# TYPE apiserver_storage_data_key_generation_duration_seconds histogram
# TYPE apiserver_storage_data_key_generation_failures_total counter
# TYPE apiserver_storage_data_key_generation_latencies_microseconds histogram
# TYPE apiserver_storage_envelope_transformation_cache_misses_total counter
# TYPE authenticated_user_requests counter
# TYPE authentication_attempts counter
# TYPE authentication_duration_seconds histogram
# TYPE go_gc_duration_seconds summary
# TYPE go_goroutines gauge
# TYPE go_info gauge
# TYPE go_memstats_alloc_bytes gauge
# TYPE go_memstats_alloc_bytes_total counter
# TYPE go_memstats_buck_hash_sys_bytes gauge
# TYPE go_memstats_frees_total counter
# TYPE go_memstats_gc_cpu_fraction gauge
# TYPE go_memstats_gc_sys_bytes gauge
# TYPE go_memstats_heap_alloc_bytes gauge
# TYPE go_memstats_heap_idle_bytes gauge
# TYPE go_memstats_heap_inuse_bytes gauge
# TYPE go_memstats_heap_objects gauge
# TYPE go_memstats_heap_released_bytes gauge
# TYPE go_memstats_heap_sys_bytes gauge
# TYPE go_memstats_last_gc_time_seconds gauge
# TYPE go_memstats_lookups_total counter
# TYPE go_memstats_mallocs_total counter
# TYPE go_memstats_mcache_inuse_bytes gauge
# TYPE go_memstats_mcache_sys_bytes gauge
# TYPE go_memstats_mspan_inuse_bytes gauge
# TYPE go_memstats_mspan_sys_bytes gauge
# TYPE go_memstats_next_gc_bytes gauge
# TYPE go_memstats_other_sys_bytes gauge
# TYPE go_memstats_stack_inuse_bytes gauge
# TYPE go_memstats_stack_sys_bytes gauge
# TYPE go_memstats_sys_bytes gauge
# TYPE go_threads gauge
# TYPE kubernetes_build_info gauge
# TYPE process_cpu_seconds_total counter
# TYPE process_max_fds gauge
# TYPE process_open_fds gauge
# TYPE process_resident_memory_bytes gauge
# TYPE process_start_time_seconds gauge
# TYPE process_virtual_memory_bytes gauge
# TYPE process_virtual_memory_max_bytes gauge
# TYPE rest_client_request_duration_seconds histogram
# TYPE rest_client_request_latency_seconds histogram
# TYPE rest_client_requests_total counter

There should actually be two other metrics registered with the OCCM:

  • cloudprovider_openstack_api_requests_duration_seconds
  • cloudprovider_openstack_api_requests_errors

They're not in the list above. I can see that this function:

should register these metrics, and the init() here:

should have triggered a call to the RegisterMetrics() function. I'm not sure what's going on there; I'll have to test some more in our environments, unless someone else has an idea. I can definitely try to expand the docs with information on how to monitor the OCCM.
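One possible explanation (an assumption, not verified against the OCCM source): Go only runs a package's init() functions if that package is actually imported somewhere in the binary, so a missing (blank) import of the metrics package would silently skip registration. A stdlib-only sketch of the once-guarded pattern, with a plain slice standing in for the real Prometheus registry:

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the real Prometheus registry.
var registry []string

var registerMetrics sync.Once

// RegisterMetrics registers the OpenStack metrics exactly once, no matter
// how many callers invoke it.
func RegisterMetrics() {
	registerMetrics.Do(func() {
		registry = append(registry,
			"cloudprovider_openstack_api_requests_duration_seconds",
			"cloudprovider_openstack_api_requests_errors",
		)
	})
}

// init runs only if this package is linked into the binary via an import
// (a blank `import _ "..."` is enough); with no import at all, the
// metrics are never registered.
func init() {
	RegisterMetrics()
}

func main() {
	RegisterMetrics() // safe to call again; sync.Once makes it a no-op
	fmt.Println(len(registry)) // 2
}
```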

Hopefully that clears things up a bit, and thanks for noticing!

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seanschneeweiss
Contributor

seanschneeweiss commented Aug 12, 2020

/reopen
/remove-lifecycle rotten

We can see that the metrics code was added in PR kubernetes/kubernetes#46008.

Observing durations and incrementing counters was implemented here:
https://github.com/NickrenREN/kubernetes/blob/18852c58c17a6dcd75912057c0e113b469bb2ced/pkg/cloudprovider/providers/openstack/openstack_volumes.go#L491-L498

The time measurement of the API call was implemented here:
https://github.com/NickrenREN/kubernetes/blob/18852c58c17a6dcd75912057c0e113b469bb2ced/pkg/cloudprovider/providers/openstack/openstack_volumes.go#L86-L100

Because the volume-related code was removed in PR #1036, these metrics are no longer present. Metrics only get exposed if a value exists.

Now it would be desirable to create new metrics similar to the implementation that already existed.

We propose changing openstack_metrics.go to something like the following (using legacyregistry, as most other Kubernetes components do):

package openstack

import (
	"sync"

	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

const (
	openstackSubsystem         = "openstack"
	openstackOperationKey      = "cloudprovider_openstack_api_request_duration_seconds"
	openstackOperationErrorKey = "cloudprovider_openstack_api_request_errors"
)

var (
	openstackOperationsLatency = metrics.NewHistogramVec(
		&metrics.HistogramOpts{
			Subsystem: openstackSubsystem,
			Name:      openstackOperationKey,
			Help:      "Latency of openstack api call",
		},
		[]string{"request"},
	)

	openstackAPIRequestErrors = metrics.NewCounterVec(
		&metrics.CounterOpts{
			Subsystem: openstackSubsystem,
			Name:      openstackOperationErrorKey,
			Help:      "Cumulative number of OpenStack API call errors",
		},
		[]string{"request"},
	)
)

var registerMetrics sync.Once

// RegisterMetrics registers the OpenStack metrics.
func RegisterMetrics() {
	registerMetrics.Do(func() {
		legacyregistry.MustRegister(openstackOperationsLatency)
		legacyregistry.MustRegister(openstackAPIRequestErrors)
	})
}

A PR might follow for metrics related to openstack_loadbalancer.go.

Sean Schneeweiss sean.schneeweiss@daimler.com, Daimler TSS GmbH, legal info/Impressum

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 12, 2020
@seanschneeweiss
Contributor

/reopen

@k8s-ci-robot
Contributor

@seanschneeweiss: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen


@lingxiankong
Contributor

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 19, 2020
@k8s-ci-robot
Contributor

@lingxiankong: Reopened this issue.

In response to this:

/reopen

