Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor serving of Rack response bodies #1037

Merged
merged 15 commits into from
Feb 22, 2024

Conversation

julik
Copy link
Contributor

@julik julik commented Feb 4, 2024

Some work might be getting done within a Rack response body. For example, when ActionController::Streaming is used, or when a Rack app elects to stream a response.

The Rack SPEC doc actually defines the behavior in sufficient detail to wrap this into the same Appsignal transaction.

The Rails instrumentation may also inherit from the more generic one as most methods and constants are shared then

To do

  • Add a changeset before merge

@julik
Copy link
Contributor Author

julik commented Feb 4, 2024

@tombruijn @jeffkreeftmeijer this is a draft - wanted to ask you what you think. I have encountered in a few cases that this would be very useful, as currently streaming (or non-presized) bodies elide the instrumentation and (more importantly) do not send errors. One can install a StreamWrapper but it is not going to be compatible with Rack 3 - and besides, why not use the wrapper everywhere anyway?

@julik
Copy link
Contributor Author

julik commented Feb 5, 2024

This will invariably move things closer for #329 too

@tombruijn
Copy link
Member

Hi @julik, this change is looking good so far 👍
It would be helpful if you could split the refactor to the generic middleware and the changes to support streaming bodies into separate commits.
A small (rails?) test app that highlights the problem of it not reporting errors would also be helpful for us to review it. We have a repo with test apps. You can modify one of the example apps, but a gist or separate repo would also be fine.
We'll handle the deprecation of constants.

@julik
Copy link
Contributor Author

julik commented Feb 5, 2024

Ok, let me get the PR to the place where all tests pass and all the bodies are tested, then we can take a look how to split it into two commits in a more logical way.

@julik
Copy link
Contributor Author

julik commented Feb 5, 2024

Limited to just adding the body, LMK if you want the class hierarchy change in a different PR

@julik julik marked this pull request as ready for review February 5, 2024 23:19
@julik
Copy link
Contributor Author

julik commented Feb 5, 2024

I am curious why transaction.set_http_or_background_queue_start gets called after calling into the app though, but I don't understand the mechanics well enough.

@tombruijn
Copy link
Member

I am curious why transaction.set_http_or_background_queue_start gets called after calling into the app though, but I don't understand the mechanics well enough.

It doesn't really matter when it's called. The start time we read comes from a header, which doesn't change during the request. We mostly call it at the end when all the other metadata is also set.

@julik
Copy link
Contributor Author

julik commented Feb 6, 2024

Okido. This should be good to go now. Do you want to add an instrumentation block inside the serving of the body too?

@tombruijn
Copy link
Member

tombruijn commented Feb 6, 2024

Thanks for the updates!

Do you have an app or some part of a controller that we can test your changes with? That would help ensure we're reviewing the things your fixing/changing.

So far, testing it on our ruby/rails7-sidekiq app, I'm getting no data from Rails requests.

Getting a lot of these logs from the transaction initializer, so it's a bit confused at the moment about which transaction is active. This happens when a transaction is being created when another is active, or viewed in another way, when the previous request's transaction hasn't been completed.

[2024-02-06T13:29:34 (process) #133][DEBUG] Trying to start new transaction with id '01aef6e8-2576-4dce-9d79-777263c44de9', but a transaction with id '78b895a4-778a-41c0-855d-1866b8bf1a5b' is already running. Using transaction '78b895a4-778a-41c0-855d-1866b8bf1a5b'.

@transaction&.set_error(error)
raise error
ensure
@transaction&.complete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to call Appsignal::Transaction.complete_current! so it unsets the current transaction stored in the Thread variable.

ensure
clear_current_transaction!
end
# Remove current transaction from current Thread.
# @api private
def clear_current_transaction!
Thread.current[:appsignal_transaction] = nil
end
end

@tombruijn tombruijn self-assigned this Feb 6, 2024
@tombruijn tombruijn removed their assignment Feb 6, 2024
@tombruijn tombruijn removed their assignment Feb 6, 2024
@julik
Copy link
Contributor Author

julik commented Feb 6, 2024

So far, testing it on our ruby/rails7-sidekiq app, I'm getting no data from Rails requests.

I will take a look what it could be. I checked out the repo and I was wondering what API key I should be using if I want to run one of the apps there, and how do the actual requests to the apps get done? does it use Net::HTTP or I am supposed to visit the pages by myself?

Getting a lot of these logs from the transaction initializer, so it's a bit confused at the moment about which transaction is active. This happens when a transaction is being created when another is active, or viewed in another way, when the previous request's transaction hasn't been completed.

That's probably the complete bit you mentioned earlier. But why would Transaction#complete here not work?

@julik
Copy link
Contributor Author

julik commented Feb 6, 2024

Do you have an app or some part of a controller that we can test your changes with? That would help ensure we're reviewing the things your fixing/changing.

I might actually try it with Cheddar briefly 😆 but let me see whether I can get the complete_current to work

@unflxw
Copy link
Contributor

unflxw commented Feb 6, 2024

Thanks for looking into the test setups @julik!

what API key I should be using if I want to run one of the apps there

Your organization-level API key should be fine.

how do the actual requests to the apps get done?

You go to http://localhost:4001 and click around. These are primarily for manual testing.

@julik
Copy link
Contributor Author

julik commented Feb 6, 2024

These are primarily for manual testing

Aight. I'll click around then.

@tombruijn
Copy link
Member

Do you have an app or some part of a controller that we can test your changes with? That would help ensure we're reviewing the things your fixing/changing.

I might actually try it with Cheddar briefly 😆 but let me see whether I can get the complete_current to work

When I changed the line 51 I commented on in this comment with Appsignal::Transaction.complete_current! it started reporting data.

I hope you see the same behavior in your app. Check the appsignal.log file's debug messages. You will need to set the log_level config option to debug to see those logs.

I'd also to test a streaming request hitting this new code.

@julik
Copy link
Contributor Author

julik commented Feb 7, 2024

@ioquatix apologies for summoning, but could you help me check this? We want the BodyWrapper to be compatible across Rack 1, 2 and 3 (and beyond, if at all possible). I followed the tribulations in the rack and rails repo with interest and tried to make the solution ultra-safe here. Is there anything that cold crash here? esp. the semantics of when to call / not to call body.close

@ioquatix
Copy link

ioquatix commented Feb 7, 2024

This is a tricky thing to implement.

https://github.com/rack/rack/blob/main/lib/rack/body_proxy.rb and similar for older versions of Rack would be the gold standard for "proxy a body".

The expectation is you'd define #close and call super if possible, and implement your own logic there too.

Long running responses could run for minutes or hours or longer. Is that okay?

If you want validation, you should test on Rack 1, 2 and 3, using Rack::Lint.

If you want a conformance test suite that runs on lots of different servers, you could check out rack-conform: https://github.com/socketry/rack-conform - if you plug in your middleware, it might give you additional test cases / confidence your implementation is correct across a wide range of servers and scenarios.

@julik
Copy link
Contributor Author

julik commented Feb 7, 2024

Thanks for your input @ioquatix it does seem that with the fact that we need to rescue to capture errors correctly the implementation that I came up with is not as overkill as it seemed.

@julik
Copy link
Contributor Author

julik commented Feb 13, 2024

@tombruijn this is ready to go as far as I am concerned. Does lintje have an autofix option? I've added test code to appsignal/test-setups#186 but I haven't tested them all, I presume it's all working fine to be honest. If you could give the test setups a second look I would be grateful.

@julik
Copy link
Contributor Author

julik commented Feb 13, 2024

Also: one could in theory wrap the serving out of the body (what's inside each) into an instrument, but that's trivial to do at this stage - leaving that decision to you.

julik added a commit to cheddar-me/rails-twirp that referenced this pull request Feb 13, 2024
The Rack response body is _not_ an Enumerable, it just responds to #each. The method was assuming otherwise, which led to breakage when trying to use appsignal/appsignal-ruby#1037
julik added a commit to cheddar-me/rails-twirp that referenced this pull request Feb 13, 2024
The Rack response body is _not_ an Enumerable, it just responds to #each. The method was assuming otherwise, which led to breakage when trying to use appsignal/appsignal-ruby#1037
@julik julik mentioned this pull request Feb 20, 2024
36 tasks
@julik
Copy link
Contributor Author

julik commented Feb 20, 2024

@julik can you fix the last Rubocop issues?

Done!

@backlog-helper
Copy link


This is a message from the daily scheduled checks.

New issue guide | Backlog management | Rules | Feedback

which - it would seem - requires another conditional, but it doesn't.
tombruijn added a commit that referenced this pull request Feb 22, 2024
Add changeset so the change can be released and is shown in the
CHANGELOG.md file upon release.
Add changeset so the change can be released and is shown in the
CHANGELOG.md file upon release.
Copy link
Contributor

@unflxw unflxw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! The test setup works fine. I have some minor nitpicks about how the information is presented.

These are very uninformed suggestions, in that I'm not a Rack expert by any means, nor am I familiar with the streaming use case and which information would be useful in it.

That is, if some of the information that I suggest removing is actually useful, feel free to ignore the suggestion.


  • Instead of response_body_each, I think the category name might work better as a verb, like "process_action" or "query"? I'm thinking stream_response?
  • I don't know whether there's much value on having different category names for each internal implementation of the body (response_body_each, response_body_call, ...) I feel that this is more likely to confuse folks than to inform them?
  • As the cherry on top, it would be nice if the category name wasn't too long -- very long category names eat into the event timeline in the UI.

Suggestion: use a single category for all the response body instrumentation blocks (stream_response.rack?) and, if the information about the specific implementation that's being used is relevant, add that as part of the title, which shows up on the UI on hover:

  Appsignal.instrument(
    "stream_response.rack",
    "Stream response body (#to_path)"
  ) { @body.to_path }

  • Regarding the body close event: is this a relevant event to instrument? I believe that this event would always happen right after the body streaming event. Does this event add any information to the event timeline that isn't already added by the finishing time of the body streaming event?

Suggestion: do not instrument the call to .close on the body.

@julik
Copy link
Contributor Author

julik commented Feb 22, 2024

Does this event add any information to the event timeline that isn't already added by the finishing time of the body streaming event?

I'd vouch it does, because there can be operations there which take a long time. For example - deleting temp data, checking in connections into a connection pool, running some kind of after_commit activity etc. When Rack middleware gets developed and one needs to reach for the close method that method often does some kind of "flushing" or "cleanup" which might be expensive. The absence of a span for it in the waterfall could also indicate that close didn't get called, which would be a bug in the webserver.

use a single category for all the response body instrumentation blocks

I'll leave it to @tombruijn be the judge on that, no objections from me. I separated them so that it would be apparent which response body method got called - this is related to the Rails bugs linked inline from the comments, but no strong opinion.

@ioquatix
Copy link

Instrumenting response#close is probably valuable if you capture it as a span.

@tombruijn
Copy link
Member

There's a lot of technical stuff in event names now already from libraries like Rails that's not always obvious, but we can make the experience a little easier now that we have the option. The event name should be something process_response_body.rack then. We we can add the details about what type in the event title, like "Process Rack response body (#to_path)", unless it's crucial information you need at a glance looking at the timeline.

And let's keep the close event.

@julik can you update the event categories and add the event titles if you agree?

@julik
Copy link
Contributor Author

julik commented Feb 22, 2024

@tombruijn sure - can you list out the "blessed" event names for me?

@tombruijn
Copy link
Member

tombruijn commented Feb 22, 2024

Event names:

  • Close event:
    • Category: response_body_close.rack
    • Title: None
  • EnumerableBodyWrapper event:
    • Category: process_response_body.rack
    • Title: Process Rack response body (#each)
  • CallableBodyWrapper event:
    • Category: process_response_body.rack
    • Title: Stream response body (#call)
  • ArrayableBodyWrapper event:
    • Category: process_response_body.rack
    • Title: Stream response body (#to_ary)
  • PathableBodyWrapper event:
    • Category: response_body_to_path.rack
    • Title: None

I asked Julik a question on Slack about the to_path event naming, otherwise I think should be good.
Confirmed. Let's go with the naming above.

@tombruijn
Copy link
Member

Ah I mistyped something in my previous comment. I'll add a commit and merge in a bit.

I mistyped the titles for the events in my comment on the GitHub issue.
This is what I meant to type.
@tombruijn tombruijn merged commit 89e72fe into appsignal:main Feb 22, 2024
26 of 28 checks passed
@tombruijn
Copy link
Member

Thanks for the PR @julik 🎉 This will be included in the next release, which will be a minor version bump. I think we can publish it tomorrow.

@julik julik deleted the monitor-rack-body-too branch February 22, 2024 22:55
@tombruijn
Copy link
Member

This has been released in Ruby gem 3.6.0 🚀

tombruijn added a commit that referenced this pull request Jul 2, 2024
Instrument what happens when a Rack response body is read and closed. We
instrument these events by wrapping the response body in the appropriate
response body BodyWrapper subclass, depending on the response object
type.

This change was previously sent in as PR #1037 and reverted in #1052. We
saw issues with multiple requests reported in the same transaction. This
problem occurred when there was an error in the middleware stack, and
the BodyWrapper never closed the response body.

I've removed any transaction complete logic from the BodyWrapper in the
original PR. We now have a Rack EventHandler that ensures the
transaction is always closed per request, even when such an error
scenario occurs again.

The only scenario in which we don't support this response body
instrumentation is when no EventHandler is present in the middleware
stack. This level of support is acceptable to me. We want people to use
the EventHandler. Most of our automatic instrumentations are updated to
add it to the middleware stack.

From the original commit: 7da96e7

> Some work might be getting done within a Rack response body. For
> example, when ActionController::Streaming is used, or when a Rack app
> elects to stream a response.
>
> The Rack SPEC doc defines the behavior in sufficient detail to wrap
> this into the current AppSignal transaction.
>
> Sadly, there is some work involved in supporting all the right
> methods, so just "one-size-fits-all" wrapper will not quite work.

Co-authored-by: Julik Tarkhanov <me@julik.nl>
tombruijn added a commit that referenced this pull request Jul 2, 2024
Instrument what happens when a Rack response body is read and closed. We
instrument these events by wrapping the response body in the appropriate
response body BodyWrapper subclass, depending on the response object
type.

This change was previously sent in as PR #1037 and reverted in #1052. We
saw issues with multiple requests reported in the same transaction. This
problem occurred when there was an error in the middleware stack, and
the BodyWrapper never closed the response body.

I've removed any transaction complete logic from the BodyWrapper in the
original PR. We now have a Rack EventHandler that ensures the
transaction is always closed per request, even when such an error
scenario occurs again.

The only scenario in which we don't support this response body
instrumentation is when no EventHandler is present in the middleware
stack. This level of support is acceptable to me. We want people to use
the EventHandler. Most of our automatic instrumentations are updated to
add it to the middleware stack.

From the original commit: 7da96e7

> Some work might be getting done within a Rack response body. For
> example, when ActionController::Streaming is used, or when a Rack app
> elects to stream a response.
>
> The Rack SPEC doc defines the behavior in sufficient detail to wrap
> this into the current AppSignal transaction.
>
> Sadly, there is some work involved in supporting all the right
> methods, so just "one-size-fits-all" wrapper will not quite work.

Co-authored-by: Julik Tarkhanov <me@julik.nl>
tombruijn added a commit that referenced this pull request Jul 2, 2024
Instrument what happens when a Rack response body is read and closed. We
instrument these events by wrapping the response body in the appropriate
response body BodyWrapper subclass, depending on the response object
type.

This change was previously sent in as PR #1037 and reverted in #1052. We
saw issues with multiple requests reported in the same transaction. This
problem occurred when there was an error in the middleware stack, and
the BodyWrapper never closed the response body.

I've removed any transaction complete logic from the BodyWrapper in the
original PR. We now have a Rack EventHandler that ensures the
transaction is always closed per request, even when such an error
scenario occurs again.

The only scenario in which we don't support this response body
instrumentation is when no EventHandler is present in the middleware
stack. This level of support is acceptable to me. We want people to use
the EventHandler. Most of our automatic instrumentations are updated to
add it to the middleware stack.

From the original commit: 7da96e7

> Some work might be getting done within a Rack response body. For
> example, when ActionController::Streaming is used, or when a Rack app
> elects to stream a response.
>
> The Rack SPEC doc defines the behavior in sufficient detail to wrap
> this into the current AppSignal transaction.
>
> Sadly, there is some work involved in supporting all the right
> methods, so just "one-size-fits-all" wrapper will not quite work.

Part of #1099

Co-authored-by: Julik Tarkhanov <me@julik.nl>
tombruijn added a commit that referenced this pull request Jul 8, 2024
Instrument what happens when a Rack response body is read and closed. We
instrument these events by wrapping the response body in the appropriate
response body BodyWrapper subclass, depending on the response object
type.

This change was previously sent in as PR #1037 and reverted in #1052. We
saw issues with multiple requests reported in the same transaction. This
problem occurred when there was an error in the middleware stack, and
the BodyWrapper never closed the response body.

I've removed any transaction complete logic from the BodyWrapper in the
original PR. We now have a Rack EventHandler that ensures the
transaction is always closed per request, even when such an error
scenario occurs again.

The only scenario in which we don't support this response body
instrumentation is when no EventHandler is present in the middleware
stack. This level of support is acceptable to me. We want people to use
the EventHandler. Most of our automatic instrumentations are updated to
add it to the middleware stack.

From the original commit: 7da96e7

> Some work might be getting done within a Rack response body. For
> example, when ActionController::Streaming is used, or when a Rack app
> elects to stream a response.
>
> The Rack SPEC doc defines the behavior in sufficient detail to wrap
> this into the current AppSignal transaction.
>
> Sadly, there is some work involved in supporting all the right
> methods, so just "one-size-fits-all" wrapper will not quite work.

Part of #1099

Co-authored-by: Julik Tarkhanov <me@julik.nl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants