
Misleading error message for invalid "up" metric value in Prometheus receiver #1825

Closed
nilebox opened this issue Sep 22, 2020 · 1 comment · Fixed by #1826
Assignees: nilebox
Labels: bug (Something isn't working)

nilebox (Member) commented Sep 22, 2020

Describe the bug
The up metric is considered special by the Prometheus receiver and must always contain a constant value of 1.0.
This concept is documented in https://www.prometheus.io/docs/concepts/jobs_instances/:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
...
The up time series is useful for instance availability monitoring.

The issue is that when the value differs from 1.0, the Prometheus receiver logs a generic warning, "http client error":

b.logger.Warn("http client error", zap.Int64("timestamp", t), zap.Float64("value", v), zap.String("labels", fmt.Sprintf("%v", lm)))

e.g.

{
  "level":"warn",
  "ts":1600755340.4938133,
  "caller":"internal/metricsbuilder.go:115",
  "msg":"http client error",
  "component_kind":"receiver",
  "component_type":"prometheus",
  "component_name":"prometheus",
  "timestamp":1600755339676,
  "value":0,
  "labels":"map[instance:<ip>:<port> job:k8sapps kubernetes_pod_name:podname]"
}

i.e. it doesn't mention anything about the up metric.

Steps to reproduce
Configure Prometheus to scrape a non-existent endpoint. This should produce the up metric with a value of 0.

What did you expect to see?
A more informative message, e.g. "Scraping failed: the 'up' metric had a value 0".
We should also check specifically for 0 and, if the value is anything other than 0 or 1, return a validation error instead.
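
For illustration, a minimal sketch of what such a check could look like. validateUpValue is a hypothetical helper written for this issue, not code from the receiver or from the eventual fix; it only assumes the go.uber.org/zap logger that the collector already uses:

package main

import (
    "fmt"

    "go.uber.org/zap"
)

// validateUpValue is a hypothetical helper (not the actual fix): it logs a
// scrape failure when "up" is 0 and returns a validation error for any value
// other than 0 or 1, as suggested above.
func validateUpValue(logger *zap.Logger, value float64, labels map[string]string) error {
    switch value {
    case 1:
        // Target is healthy; nothing to report.
        return nil
    case 0:
        // 0 is a legitimate value and means the scrape of the target failed.
        logger.Warn("Scraping failed: the 'up' metric had a value 0",
            zap.Float64("value", value),
            zap.String("labels", fmt.Sprintf("%v", labels)))
        return nil
    default:
        // Anything else is not a valid 'up' value.
        return fmt.Errorf("invalid 'up' metric value %v, expected 0 or 1", value)
    }
}

func main() {
    logger, _ := zap.NewProduction()
    defer logger.Sync()

    // Failed scrape: logs a warning that explicitly mentions the 'up' metric.
    _ = validateUpValue(logger, 0, map[string]string{"job": "k8sapps", "instance": "10.0.0.1:9090"})

    // Invalid value: surfaces as a validation error instead of a warning.
    if err := validateUpValue(logger, 0.5, nil); err != nil {
        logger.Error("metric validation failed", zap.Error(err))
    }
}

The point is that the failed-scrape case and the invalid-value case are reported differently, and both messages name the up metric.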

What did you see instead?
"msg":"http client error"

What version did you use?
Version: v0.10.0 / master

nilebox added the bug (Something isn't working) label on Sep 22, 2020
nilebox self-assigned this on Sep 22, 2020
nilebox (Member, Author) commented Sep 22, 2020

Update: after re-reading https://www.prometheus.io/docs/concepts/jobs_instances/, the value 0 is also valid, but it means that the scrape has failed, so logging a warning is justified; the message still needs to be improved from the generic "http client error", though.

1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
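
For illustration only (the exact wording is up to the fix), with a message along the lines suggested above, the warning from the example earlier in this issue could then read:

{
  "level":"warn",
  "msg":"Scraping failed: the 'up' metric had a value 0",
  "component_kind":"receiver",
  "component_type":"prometheus",
  "component_name":"prometheus",
  "timestamp":1600755339676,
  "value":0,
  "labels":"map[instance:<ip>:<port> job:k8sapps kubernetes_pod_name:podname]"
}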
