Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore tempo_request_duration_seconds metrics for querier_api_* requests #3403

Merged
merged 2 commits into from
Feb 16, 2024

Conversation

kvrhdn
Copy link
Member

@kvrhdn kvrhdn commented Feb 16, 2024

What this PR does:

Recent changes (I suspect #3300) broke middleware on our querier requests. We have HTTP middleware that provides tracing and metrics. Requests sent to the querier are routed slightly differently and due to this change we didn't get metrics anymore for the HTTP routes.

Issue
So tempo_request_duration_seconds does not show up for routes like querier_api_search_tags and querier_api_traces_traceid. GRPC routes were still instrumented correctly (/tempopb.Querier/FindTraceByID and /tempopb.Querier/SearchTags).

How it works
Normal middleware flow:

http.Handler.Serve --> middleware (tracing, metrics) --> mux.Router (calls the correct http.Handler for each path)

When we set stream_over_http_enabled: false this will be

http.Handler.Serve --> dskit.Server (with instrumentation middleware) --> Tempo's mux.Router

When we set stream_over_http_enabled: true we switch up the order a bit:

http.Handler.Server --> GRPC interceptor --> instrumentation middleware --> Tempo's Router
                        (1)                                                 (2)

The bug: requests from query-frontend to querier are handled differently, they are pulled by the frontend worker process.
The frontend worker will pull a request from the frontend, unpack the GRPC requests and then push the HTTP request to the regular HTTP handler. This handler was the router (marked (2)), since this handler is after instrumentation we didn't get any metrics for it. By instead passing the start of the middleware chain ((1)) we fix this.

Adding HTTPRouter and HTTPHandler to TempoServer should make this more explicit.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

if err != nil {
return nil, fmt.Errorf("failed to create server: %w", err)
}
s.handler = s.externalServer.HTTPServer.Handler
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joe-elliott we were really close while pairing. We used s.externalServer.HTTP which is the router, HTTPServer.Handler is the handler which contains all middleware as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so close!

@@ -21,7 +21,8 @@ import (
)

type TempoServer interface {
HTTP() *mux.Router
HTTPRouter() *mux.Router
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this rename for clarity

@kvrhdn kvrhdn enabled auto-merge (squash) February 16, 2024 18:21
@kvrhdn kvrhdn merged commit c041e08 into grafana:main Feb 16, 2024
14 checks passed
kvrhdn added a commit to kvrhdn/tempo that referenced this pull request Feb 26, 2024
…ests (grafana#3403)

* Restore tempo_request_duration_seconds metrics for querier_api_* requests

* Update CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants