Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update prometheus client to 0.6.0, handle counter metric name change #3700

Closed
wants to merge 3 commits into from

Conversation

ahmed-mez
Copy link
Contributor

@ahmed-mez ahmed-mez commented Apr 30, 2019

follow up on #3443

What does this PR do?

Like summaries and histogram types have their _sum / _bucket / _count timeseries aggregated in a single metric using the trimmed name, versions 0.4.0+ of the python prometheus/openmetrics client trim the _total suffix out of counter metric names.

This is a breaking change for us, as integration and custom checks should change their configuration to refer to counter metrics with the new shorter metric name.

This PR introduces a compatibility layer to make this transition transparent. When looking up a metric name in one of the scraper_config's map/list, the following happens:

the lookup is attempted with the current (possibly trimmed) name
if the lookup fails and the metric is a counter, a second lookup is attempted, adding the _total suffix to the metric name.
If we go this route, the prometheus mixin should also be updated, as it uses the same client lib.

Other changes:

kubelet/tests/test_kubelet.py : update a test that is accessing internals of the Sample object

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Review checklist (to be filled by reviewers)

  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached
  • Feature or bugfix must have tests
  • Git history must be clean
  • If PR adds a configuration option, it must be added to the configuration file.

@ahmed-mez ahmed-mez requested review from a team as code owners April 30, 2019 14:42
@codecov
Copy link

codecov bot commented Apr 30, 2019

Codecov Report

Merging #3700 into master will decrease coverage by 1.08%.
The diff coverage is 74.28%.

@@            Coverage Diff             @@
##           master    #3700      +/-   ##
==========================================
- Coverage    86.2%   85.12%   -1.09%     
==========================================
  Files         729       72     -657     
  Lines       37497     5977   -31520     
  Branches     4494      759    -3735     
==========================================
- Hits        32325     5088   -27237     
+ Misses       3947      723    -3224     
+ Partials     1225      166    -1059

@ahmed-mez ahmed-mez requested a review from ofek May 2, 2019 08:45
@@ -244,6 +245,26 @@ def assert_metric_has_tag_prefix(self, metric_name, tag_prefix, count=None, at_l
else:
assert len(candidates) >= at_least

def _match_metric_name_in_mapping(self, name, at_least):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we solve this instead by changing our fixtures?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofek our current fixtures are legit prometheus payloads, I don't think we should change them otherwise we could break submitted metric names.

This change in the library is actually breaking a few projects because it enforces counter to end up with _total see matrix-org/synapse#4001 or korfuri/django-prometheus#84

breaking change: prometheus/client_python@a4dd93b#diff-0adf47ea7f99c66d4866ccb4e557a865L158

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ofek for the review, I admit this solution is a bit hacky but as Pierre said, I'm not sure if changing our fixtures would be the best option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Instead, can we please change what we assert in the affected integrations?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late to the party but I feel responsible for this 😛

I agree with @ofek we should keep stubs as dumb as possible, the more logic we add, the higher the risk of hiding bugs and introduce tech debt. The problem here is that assertion logic shouldn't be here in the first place but I needed something quickly to make tests work again during the epic Wheels Migration™️ and the Aggregator stub seemed like "if it fits, I sits".

So the quick-and-dirty solution if we don't want to touch fixtures would be moving special assertion logic elsewhere, better if we can keep it close to where it's actually needed like Ofek suggested. Even better (out of the scope of the PR) would be putting some work on the assertion logic, make it smarter (right now can be close to useless at times) and move it out of the stubs into a dedicated package with a clear API.

@ofek
Copy link
Contributor

ofek commented May 8, 2019

Any progress?

@ahmed-mez
Copy link
Contributor Author

A PR to optimize the openmetrics parser prometheus/client_python#402

@stale
Copy link

stale bot commented Jun 8, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

@stale stale bot added the stale label Jun 8, 2019
@ofek
Copy link
Contributor

ofek commented Jun 12, 2019

@stale stale bot removed the stale label Jun 12, 2019
@stale
Copy link

stale bot commented Jul 12, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

@hithwen
Copy link
Contributor

hithwen commented Oct 22, 2019

Could we vendor 0.3 in existing integrations and then upgrade only for new integrations? That way we do not break existing integrations but we dont need to implement a hack. We would still break custom checks but if there is a plan to have an agent 8 that could be a chance to do this

@stale stale bot removed the stale label Oct 22, 2019
@stale
Copy link

stale bot commented Nov 21, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. Note that the issue will not be automatically closed, but this notification will remind us to investigate why there's been inactivity. Thank you for participating in the Datadog open source community.

@ahmed-mez
Copy link
Contributor Author

ahmed-mez commented Apr 1, 2020

closing this in favour of #6200

@ahmed-mez ahmed-mez closed this Apr 1, 2020
@ahmed-mez ahmed-mez deleted the ahmed-mez/openmetrics-060 branch April 1, 2020 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants