Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: blocking detection #2709

Merged
merged 49 commits into from
Mar 8, 2024
Merged

feat: blocking detection #2709

merged 49 commits into from
Mar 8, 2024

Conversation

bruno-garcia
Copy link
Member

@bruno-garcia bruno-garcia commented Oct 9, 2023

Just playing around, but after a thread on twitter (https://twitter.com/brungarc/status/1711484812702171309). Based in the Ben Adams BlockingDetector.

Suppressing Blocking Detection

If users are intentionally making blocking calls in their code for some reason, the blocking detection can be suppressed (temporarily) via:

using (new SuppressBlockingDetection())
{
    // Some blocking code
}

Volume Concerns

If blocking calls exist on a hot path, we may end up sending lots of events.

We did create a solution to this in #3174 but decided to roll this back in this PR due to concerns about using metrics.

Top Frame Decision

Bruno had experimented, in the initial code, with the number of frames to skip when capturing a stack trace. The thinking was that it might be nice to see one or two calls into the Sentry blocking detection code. This turns out to be problematic when none of the stack frames are InApp, since Sentry doesn't know which frame to highlight. So I've reverted to Ben Adams' original code that leaves the culprit blocking call at the top of the stack. This makes it easy to find regardless of whether there are InApp frames present in the stack trace or not.

Grouping

There were concerns about grouping and fingerprinting... By default Sentry uses the stack trace as a fingerprint and we can get different stack traces, depending on how the async state machine schedules the task. However this is also related to the top frame decision and given that we're skipping any frames related to the TPL now, the grouping problem has gone away.

Resources

Aparently you have to be called Stephen to understand Synchronization 😜

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against ffef982

@bruno-garcia
Copy link
Member Author

since I haven't had a change to look into this for a long time, some notes I have:

Stack trace in SentryEvent.Thread and also in Sentry.Exception? Seems like it's done so the UI renders in some way
Some thread info only shows if there are more than 1 thread in the array (Create 1 manually to see it in the UI)

If all InApp=false groups by the whole thing and creates new issues for each event. Group by URL in that case? (also risky because of parameterized URL)

@jamescrosswell jamescrosswell self-assigned this Feb 6, 2024
@jamescrosswell jamescrosswell linked an issue Feb 7, 2024 that may be closed by this pull request
@jamescrosswell
Copy link
Collaborator

@bruno-garcia are we sure we want this as the call stack?
image

I know you intentially changed skipframes from 3:6 to 0:3 but it seems weird to have some Sentry code at the top of the stack (rather than highlighting the culprit code - i.e. the In App frame).

@bruno-garcia
Copy link
Member Author

Braindump during chat with @jamescrosswell

  • Concern with volume: If there's a blocking call on a hot path (like on GET /product) this will fire on each request and burn through quota.
    • We could keep state on the client and fire it once.
    • Folks can use error sample rate, or sample on beforeSend specifically for this error.
  • Use metrics and code locations? But here we really need the full stack trace.
  • What should be the top frame? In other words, how many frames to skip? (original code DetectionSource.SynchronizationContext ? 3 : 6).
    • Sentry focuses (expands) on the first InApp frame so it's OK to include some framework code that's called from it. But not clear how many frames/where to go
  • The SDK itself triggers at least 2 events on the first request. Only happen once but if we can, we need to fix these before going ahead.
    • If this is intentional, we need to opt-out of reporting this one.
  • We need a mechanism to drop them outside giving folks code to write (e.g; beforeSend snippets), see point above. We need an API to drop certain errors.
  • SentryThreads vs SentryException. I found it odd looking at events that we had stack traces on both (duplicated). And the UI rendered different depending on which one was used. I tried to capture this in this note (feat: blocking detection #2709 (comment))
    • Not really related to blocking detector but came up while doing this.
    • To better understand this, uncomment the code in the PR (SentryThreads = new []{new SentryThread) and see how the UI changes. Also inspect JSON and note that we have 1 dupe stack trace, matching by ThreadId IIRC.
  • Grouping problems. We'd need to make sure each blocking detected groups properly. If all frames are inApp=false, the async stack trace machinery can change and result in different groups. Not sure how to solve this, might need to Fingerprint as BlockingDetectorNotInAPp or group by URL in that case? (also risky because of parameterized URL, transaction might work if not raw url based).
  • Tests - From the lib might help but mainly how it groups, etc. How does it work within Sentry?
  • Documentation: Not all things are detected. As Ben Adams listed in details on the original code's repo

@bruno-garcia bruno-garcia marked this pull request as ready for review February 27, 2024 18:41
@jamescrosswell
Copy link
Collaborator

Won't this potentially hide useful information about where the problem is? We could mark these on the backend as grouping=false although I don't have a lot of details on how to do that. @adinauer I believe did some stuff for Java and might know (I believe getsentry/sentry#45185)

These frames all come after the blocking call so I don't think they're relevant to the event.

@adinauer
Copy link
Member

@bruno-garcia @jamescrosswell glancing at the code here this seems different from what I changed for Java SDK. We removed some auto generated IDs from class names in the stacktrace so they don't mess up grouping in the issue you linked.

@bruno-garcia
Copy link
Member Author

@bruno-garcia @jamescrosswell glancing at the code here this seems different from what I changed for Java SDK. We removed some auto generated IDs from class names in the stacktrace so they don't mess up grouping in the issue you linked.

yes, it's a totally different use case. But the goal here is to send frames that we'd like to show, but not group by. Which can be achieved by changing the server the similarly to how you did it.

@bruno-garcia bruno-garcia requested a review from vaind March 5, 2024 22:24
@bruno-garcia
Copy link
Member Author

@vaind added u as reviewer since @jamescrosswell wrote the code since the initial push I made so I guess both of us are like "not sure I should review or merge this" :D

src/Sentry/Ben.BlockingDetector/README.md Outdated Show resolved Hide resolved
@@ -155,7 +155,7 @@ private SentryException BuildSentryException(Exception exception, int id, int? p
sentryEx.Mechanism = mechanism;
}

sentryEx.Stacktrace = SentryStackTraceFactoryAccessor().Create(exception);
sentryEx.Stacktrace ??= SentryStackTraceFactoryAccessor().Create(exception);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the implication here? Previously, the stacktrace would get overwritten. Why is this change necessary? Where would the stacktrace come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the stack trace is already set in the event. We get it from the blocking detection logic so we won't want it overwriten

Copy link
Contributor

@bitsandfoxes bitsandfoxes Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sentryEx.Stacktrace ??= SentryStackTraceFactoryAccessor().Create(exception);
sentryEx.Stacktrace = SentryStackTraceFactoryAccessor().Create(exception);

The sentryEx gets newly instantiated in this method and the StackTrace has to be set explicitly. I think this change does nothing.

We get it from the blocking detection logic so we won't want it overwriten

That all happens in the event processor.

Comment on lines +29 to +30
// Skip frames relating to the async state machine
|| frameInfo?.StartsWith("System.Threading") == true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip doing this and improve grouping on the server side for this kind of stuff in general.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bitsandfoxes could you link to the associated PR where this logic gets changed on the server?

Also, if we do skip this here, we'll need to make some changes to set a custom fingerprint that excludes all this stuff (since by default the entire stacktrace is used as the fingerprint).

Copy link
Contributor

@bitsandfoxes bitsandfoxes Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll follow up on this. The blocking detection is opt-in and the improvements to the grouping are not blocking this PR. #3202

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll follow up on this. The blocking detection is opt-in and the improvements to the grouping are not blocking this PR. #3202

If we remove the logic above from the client, the grouping will be broken unless some similar logic exists on the server.

I had a chat to Bruno though... Not sure we need to be doing this on the server.

@bitsandfoxes bitsandfoxes mentioned this pull request Mar 7, 2024
Copy link
Contributor

@bitsandfoxes bitsandfoxes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good to me!

@bruno-garcia
Copy link
Member Author

Thanks folks!

@bruno-garcia bruno-garcia merged commit d1e5efc into main Mar 8, 2024
30 checks passed
@bruno-garcia bruno-garcia deleted the feat/blocking-detector branch March 8, 2024 03:10
if (_options.CaptureBlockingCalls && _monitor is not null)
{
var syncCtx = SynchronizationContext.Current;
SynchronizationContext.SetSynchronizationContext(syncCtx == null ? _detectBlockingSyncCtx : new DetectBlockingSynchronizationContext(_monitor, syncCtx));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bruno-garcia won't the SynchronizationContext in ASP.NET Core always be null? Is this code here just in case SDK users have implemented a custom sync context for some reason?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blocking detection
5 participants