
Misleading error message for invalid "up" metric value in Prometheus receiver #1825

Closed
nilebox opened this issue Sep 22, 2020 · 1 comment · Fixed by #1826
Assignees: nilebox
Labels: bug (Something isn't working)

nilebox (Member) commented Sep 22, 2020

Describe the bug
The up metric is considered special by the Prometheus receiver and must always contain a constant value of 1.0.
This concept is documented in https://www.prometheus.io/docs/concepts/jobs_instances/:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
...
The up time series is useful for instance availability monitoring.

The issue is that when the value differs from 1.0, the Prometheus receiver logs a generic warning, "http client error":

b.logger.Warn("http client error", zap.Int64("timestamp", t), zap.Float64("value", v), zap.String("labels", fmt.Sprintf("%v", lm)))

e.g.

{
  "level":"warn",
  "ts":1600755340.4938133,
  "caller":"internal/metricsbuilder.go:115",
  "msg":"http client error",
  "component_kind":"receiver",
  "component_type":"prometheus",
  "component_name":"prometheus",
  "timestamp":1600755339676,
  "value":0,
  "labels":"map[instance:<ip>:<port> job:k8sapps kubernetes_pod_name:podname]"
}

i.e. it doesn't mention anything about the up metric.

Steps to reproduce
Configure Prometheus to scrape a non-existent endpoint. This should produce the up metric with a value of 0.

What did you expect to see?
A more informative message, e.g. "Scraping failed: the 'up' metric had a value 0".
We should also check specifically for 0 and, if the value is anything other than 0 or 1, return a validation error instead.
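
For illustration, a minimal sketch of what such a check could look like. validateUpValue is a hypothetical helper written for this issue, not code from the receiver or from the eventual fix; it only assumes the go.uber.org/zap logger that the collector already uses:

package main

import (
    "fmt"

    "go.uber.org/zap"
)

// validateUpValue is a hypothetical helper (not the actual fix): it logs a
// scrape failure when "up" is 0 and returns a validation error for any value
// other than 0 or 1, as suggested above.
func validateUpValue(logger *zap.Logger, value float64, labels map[string]string) error {
    switch value {
    case 1:
        // Target is healthy; nothing to report.
        return nil
    case 0:
        // 0 is a legitimate value and means the scrape of the target failed.
        logger.Warn("Scraping failed: the 'up' metric had a value 0",
            zap.Float64("value", value),
            zap.String("labels", fmt.Sprintf("%v", labels)))
        return nil
    default:
        // Anything else is not a valid 'up' value.
        return fmt.Errorf("invalid 'up' metric value %v, expected 0 or 1", value)
    }
}

func main() {
    logger, _ := zap.NewProduction()
    defer logger.Sync()

    // Failed scrape: logs a warning that explicitly mentions the 'up' metric.
    _ = validateUpValue(logger, 0, map[string]string{"job": "k8sapps", "instance": "10.0.0.1:9090"})

    // Invalid value: surfaces as a validation error instead of a warning.
    if err := validateUpValue(logger, 0.5, nil); err != nil {
        logger.Error("metric validation failed", zap.Error(err))
    }
}

The point is that the failed-scrape case and the invalid-value case are reported differently, and both messages name the up metric.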

What did you see instead?
"msg":"http client error"

What version did you use?
Version: v0.10.0 / master

nilebox added the bug (Something isn't working) label on Sep 22, 2020
nilebox self-assigned this on Sep 22, 2020
nilebox (Member, Author) commented Sep 22, 2020

Update: after re-reading https://www.prometheus.io/docs/concepts/jobs_instances/, the value 0 is also valid, but it means that the scrape has failed, so logging a warning is justified; the message still needs to be improved from the generic "http client error", though.

1 if the instance is healthy, i.e. reachable, or 0 if the scrape failed.
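
For illustration only (the exact wording is up to the fix), with a message along the lines suggested above, the warning from the example earlier in this issue could then read:

{
  "level":"warn",
  "msg":"Scraping failed: the 'up' metric had a value 0",
  "component_kind":"receiver",
  "component_type":"prometheus",
  "component_name":"prometheus",
  "timestamp":1600755339676,
  "value":0,
  "labels":"map[instance:<ip>:<port> job:k8sapps kubernetes_pod_name:podname]"
}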
