New time metric aggregation layer and refactor of prometheus metrics #267

Merged


adam-cattermole
Member

@adam-cattermole adam-cattermole commented Mar 12, 2024

Adds a new metrics layer to the tracing configuration which times accesses to different spans.

gather accepts:

  • group: the parent span to combine for
  • consumer: a function to perform some kind of operation on the Timings collected for this group
  • records: a list of different sub-spans of group that should be summed together

It is currently set up to sum the time spent in all datastore span accesses within a should_rate_limit call.
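To make the group / records / consumer relationship concrete, here is a minimal std-only sketch of the aggregation idea. It is not the code in this PR: `Group`, `GroupTimings`, `on_enter` and `on_close` are hypothetical names, span nesting is flattened to plain name matching, and the caller passes in durations instead of the layer measuring them from tracing spans.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical stand-in for the timings collected for one group span.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct GroupTimings {
    /// Sum of the time spent in the recorded sub-spans.
    pub busy: Duration,
}

/// One aggregation rule: a parent span name, the sub-span names to sum,
/// and a consumer that receives the total when the parent span closes.
pub struct Group<F: FnMut(GroupTimings)> {
    group: String,
    records: HashSet<String>,
    consumer: F,
    /// Timings for the group span currently open, if any.
    current: Option<GroupTimings>,
}

impl<F: FnMut(GroupTimings)> Group<F> {
    pub fn new(group: &str, records: &[&str], consumer: F) -> Self {
        Group {
            group: group.to_string(),
            records: records.iter().map(|r| r.to_string()).collect(),
            consumer,
            current: None,
        }
    }

    /// Entering the group span starts a fresh accumulation.
    pub fn on_enter(&mut self, span: &str) {
        if span == self.group {
            self.current = Some(GroupTimings::default());
        }
    }

    /// Closing a recorded sub-span adds its duration to the open group;
    /// closing the group span hands the total to the consumer.
    pub fn on_close(&mut self, span: &str, elapsed: Duration) {
        if span == self.group {
            if let Some(timings) = self.current.take() {
                (self.consumer)(timings);
            }
        } else if self.records.contains(span) {
            if let Some(timings) = self.current.as_mut() {
                timings.busy += elapsed;
            }
        }
    }
}
```

With `group` set to should_rate_limit and `records` containing a datastore span name, the consumer sees only the summed datastore time for each call; spans that are not listed in `records` are ignored.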

Refactor

  • Moved prometheus_metrics out of the library; it is now defined at the server level

@adam-cattermole adam-cattermole self-assigned this Mar 12, 2024
@adam-cattermole adam-cattermole force-pushed the tracing-metrics-layer branch 3 times, most recently from 4c8b4d9 to 975b701 Compare March 19, 2024 14:55
@adam-cattermole adam-cattermole marked this pull request as ready for review March 26, 2024 16:57
@eguzki
Contributor

eguzki commented Mar 27, 2024

I am trying to understand what this is doing.
I tried this branch and compared it with main. I see the same data at /metrics and the same information in the OpenTelemetry spans in the Jaeger UI.

@adam-cattermole
Member Author

adam-cattermole commented Apr 2, 2024

@eguzki The primary difference is how the metrics exposed at /metrics are collected. Previously, the calculation was done inline in the library, with time instants wrapping the sections of code that access the datastore. It has now been abstracted out of the library, and all latency metrics are calculated by a new metrics layer that instruments the spans. This results in two primary changes:

  1. The library no longer implements or knows about metrics; ownership of the metrics moves to limitador-server
  2. A new tracing metrics layer times and sums spans based on configuration, and aggregates them using the provided consumer function
    • For the time being, the layer is configured to sum all datastore accesses for each should_rate_limit call, and the consumer function registers the total with the prometheus counter_latency metric
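As a rough illustration of the two changes above, the sketch below contrasts the old inline timing with a server-owned consumer. Everything in it is an assumption made for illustration: `LatencyCounter` is a std-only stand-in for the real prometheus metric, and `inline_timed` and `make_consumer` are hypothetical helpers, not functions from limitador.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a latency metric owned by the server.
#[derive(Default)]
pub struct LatencyCounter {
    nanos: AtomicU64,
}

impl LatencyCounter {
    /// Adds a duration to the running total (nanosecond resolution).
    pub fn add(&self, d: Duration) {
        self.nanos.fetch_add(d.as_nanos() as u64, Ordering::Relaxed);
    }

    /// Returns the accumulated total.
    pub fn total(&self) -> Duration {
        Duration::from_nanos(self.nanos.load(Ordering::Relaxed))
    }
}

/// Before: the code that touches the datastore wraps itself in time
/// instants and reports to the metric directly.
pub fn inline_timed<T>(counter: &LatencyCounter, datastore_access: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = datastore_access();
    counter.add(start.elapsed());
    out
}

/// After: the library knows nothing about metrics. The server builds a
/// consumer that receives the summed span time and registers it.
pub fn make_consumer(counter: &LatencyCounter) -> impl Fn(Duration) + '_ {
    move |total: Duration| counter.add(total)
}
```

The point of the second shape is that the timing concern lives entirely outside the library: swapping the metric backend or changing what gets summed only touches the server-side consumer and configuration.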

Member

@alexsnaps alexsnaps left a comment

Minor stuff... but otherwise LGTM

limitador-server/src/metrics.rs (4 review threads, outdated and resolved)
@adam-cattermole adam-cattermole merged commit 22f65b0 into Kuadrant:main Apr 4, 2024
9 checks passed
@adam-cattermole adam-cattermole deleted the tracing-metrics-layer branch April 4, 2024 09:45