Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle inconsistencies with *_over_time() in VictoriaMetrics vs. Prometheus #596

Open
nemobis opened this issue May 29, 2023 · 4 comments
Open

Comments

@nemobis
Copy link

nemobis commented May 29, 2023

By design decision, VictoriaMetrics 1.44+ (and before 1.39.4) doesn't remove the __name__ label from the metrics resulting from an avg_over_time() and other *_over_time() functions: VictoriaMetrics/VictoriaMetrics#674 (comment)

As a result, if you have two server groups, one with Prometheus and the other with VictoriaMetrics hosts, which provide a non-empty answer to the same query with the same labels, Promxy will return each response in duplicate, with and without the __name__ label.

As a feature request, it would be nice for Promxy to be able to deduplicate the response in such a case.

@jacksontj
Copy link
Owner

So is the ask that promxy will reomve the __name__ label for these series so they can merge with prometheus series?

I think the base-case would work (e.g. query for something like avg_over_time(X)) but if it is some more complex query (Subqueries, Compound expressions, etc.) then this may prove quite difficult to achieve (since basically all we could do from promxy is "guess" when the name should be removed, and remove it).

I would actually highly recommend that VictoriaMetrics has a config option for this; as they are consciously choosing to break compatibility -- which is fine until you want interoperability.

That being said; it seems that the logic behind not removing the names may be incorrect (VictoriaMetrics/VictoriaMetrics#674 (comment))

@nemobis
Copy link
Author

nemobis commented Jun 3, 2023

So is the ask that promxy will reomve the __name__ label for these series so they can merge with prometheus series?

Yes.

I think the base-case would work (e.g. query for something like avg_over_time(X)) but if it is some more complex query (Subqueries, Compound expressions, etc.) then this may prove quite difficult to achieve (since basically all we could do from promxy is "guess" when the name should be removed, and remove it).

Indeed. I've not recently looked at the Promxy code, but I was thinking of a deduplication similar to what happens within server groups. If a query gets responses which are identical save for the __name__ (same values, same labels except __name__), discard the response which has a __name__.

I would actually highly recommend that VictoriaMetrics has a config option for this; as they are consciously choosing to break compatibility -- which is fine until you want interoperability.

I agree, but it sounds like they discarded this option. So I thought I'd first check how difficult this is to solve downstream. I could try and ask again but first I need to check whether this was discussed more recently.

@nemobis
Copy link
Author

nemobis commented Oct 14, 2023

I was looking for a reference for the Prometheus behaviour. I found one (probably not the best possible) in prometheus/prometheus#2441 (comment):

When you apply a function to a metric, it's no longer that metric. Thus the name is removed.

@jacksontj
Copy link
Owner

@nemobis thanks for the bump on this!

As I'm thinking on this a bit more it sounds like the ask here is basically to workaround VictoriaMetrics returning an "incorrect" response (as it shouldn't include the name and didn't for a while).

I was thinking of a deduplication similar to what happens within server groups. If a query gets responses which are identical save for the name (same values, same labels except name), discard the response which has a name.

This approach seems like it'll be complicated to say the least. As we'd need to have context about the data (e.g. is this part of some types of aggregations where name doesn't matter) when doing a merge.

So the best solution here seems to be that VictoriaMetrics fixes this on their side. This should be able to be worked-around client side by doing an additional sum and the without keyword. So instead of sum_over_time(foo[1m]) it would be sum(sum_over_time(foo[1m])) without (__name__). If this approach always works promxy could optionally do that transparently (thereby making the "fix" in the NodeReplacer instead of the data merging) -- otherwise I don't think there is a great solution in promxy for this particular case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants