Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new library instrumentation for okhttp3 4.x+ #3780

Closed
wants to merge 2 commits into from

Conversation

jkwatson
Copy link
Contributor

@jkwatson jkwatson commented Aug 5, 2021

This new library instrumentation provides an okhttp Call.Factory implementation which properly handles internal context propagation with async callbacks.

}

public Dispatcher createDispatcher(ExecutorService executorService) {
return new Dispatcher(Context.taskWrapping(executorService));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this doesn't seem to be enough to always get the correct context into callbacks. OkHttp3Test library tests use a similar construct, but high concurrency test with callback test still fails. In okhttp:3.0.0 dispatcher will start queuing calls when too many of them are running at the same time so if you only make a few request this might work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. How is it on the 4.x line?

Does this mean that there is no way to get propagation to work right into okhttp async calls without bytecode munging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.9.1 fails the same way, FYI.

I still think this is probably the best we can do for now. Should we just include a very large caveat about this issue, or would it better to not even recommend that people try to propagate the context into async calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably better not to recommend that - as a user, you usually don't know if your application never crosses that magical 64 concurrent requests number, and if it happens to do that you'll get unreliable traces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the SIG meeting yesterday evening/morning, we talked through this, and Brave has a better approach to this library instrumentation that I'll be starting to figure out how to port over to otel, which shouldn't have the same mis-parenting issues that the Dispatcher approach ends up with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, I'm going to update this PR to warn about this approach, rather than recommending it. Although, with our current instrumentation, there is no way to reliably get context propagation to work. You either get none, period, or you get potentially broken/mis-parented traces/spans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also updated this PR to include some new instrumentation which does work with the Call.Factory approach that was used in Brave.

This exposes a `Call.Factory` which will properly handle internal context propagation when used with async callbacks.
@anuraaga
Copy link
Contributor

@jkwatson Did you mean to push the code into this PR? If so can we switch to draft?

@jkwatson jkwatson changed the title Add a README for the okhttp library instrumentation. Add new library instrumentation for okhttp3 4.x+ Aug 10, 2021
@jkwatson
Copy link
Contributor Author

@jkwatson Did you mean to push the code into this PR? If so can we switch to draft?

I did. Why draft?

@anuraaga
Copy link
Contributor

@jkwatson Oh it's done? In that case first question is - does it need to target 4.x?

@jkwatson
Copy link
Contributor Author

It can't do 3.0.0 due to API changes to the Request interface around the tag functionality.

We could try to find some random 3.x version to target, but 4.0.0 seemed cleaner.

@anuraaga
Copy link
Contributor

@jkwatson Ah thanks for clarifying. Let's go ahead and target a 3.x release by raising our floor, we tend to do it when needed. Brave targets 3.11 so that seems reasonable

https://github.com/openzipkin/brave/blob/master/instrumentation/okhttp3/pom.xml#L33

It's released 7/2018 so old enough to target, especially since okhttp doesn't have breaking changes

https://square.github.io/okhttp/changelog_3x/#version-3110

Then we don't need to have two very similar instrumentations

Comment on lines +37 to +39
This instrumentation is only reliable for non-asynchronous usages of the `Call`. If you need instrumentation
for `Call.enqueue(Callback)` usages, it is recommended that you use the newer instrumentation which works
with okhttp3 versions 4+.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest moving this warning to the top of this file (above Quickstart)

@jkwatson
Copy link
Contributor Author

@jkwatson Ah thanks for clarifying. Let's go ahead and target a 3.x release by raising our floor, we tend to do it when needed. Brave targets 3.11 so that seems reasonable

https://github.com/openzipkin/brave/blob/master/instrumentation/okhttp3/pom.xml#L33

It's released 7/2018 so old enough to target, especially since okhttp doesn't have breaking changes

https://square.github.io/okhttp/changelog_3x/#version-3110

Then we don't need to have two very similar instrumentations

I'm nervous about 3.11 since the timeout method on the Call interface was added after that without a default implementation which is a breaking API change.

I was thinking that we should stop supporting 3.x since it's been EOLd IIRC.

This is what I meant about hunting for a 3.x version. :)

@anuraaga
Copy link
Contributor

This is what I meant about hunting for a 3.x version. :)

Ah I see, brave says "before", not after. I think we've already hunted though then, 3.12 seems like the right one. And if it's EOL then we know it won't break later :P So seems like free coverage.

@jkwatson
Copy link
Contributor Author

This is what I meant about hunting for a 3.x version. :)

Ah I see, brave says "before", not after. I think we've already hunted though then, 3.12 seems like the right one. And if it's EOL then we know it won't break later :P So seems like free coverage.

If we're ok with bumping up the minimum version for all the 3.x instrumentation, including agent-based, then I'm 100% ok with that myself.

@trask
Copy link
Member

trask commented Aug 10, 2021

including agent-based

I wouldn't want to bump the version for javaagent instrumentation. OkHttp is a P1 instrumentation, and older versions are much more important for javaagent instrumentation (e.g. system operators are users) compared to library instrumentation (e.g. devs are users).

@anuraaga
Copy link
Contributor

@trask We've raised version, e.g. for gRPC even when P1. 2018/07 seems like a reasonable release date for us to target here too? Maybe we need to actually formalize this though.

@trask
Copy link
Member

trask commented Aug 10, 2021

gRPC was a very small bump from 1.5 to 1.6

@anuraaga
Copy link
Contributor

We discussed in SIG and there is a lot of interest in keeping the agent floor to 3.0, especially to continue to say we support all OkHttp 2+.

For targeting 3.0, the timeout method I guess is not a problem since we can just implement it, which gets verified by our latestDepTest already. We have something similar for Lettuce

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/lettuce/lettuce-5.1/library/src/main/java/io/opentelemetry/instrumentation/lettuce/v5_1/OpenTelemetryTracing.java#L59

For the tag, I think either only using a private static final Cache<Request, Context> = Cache.newBuilder().setWeakKeys().build() or using reflection to determine a path that uses the cache vs the native tag are reasonable approaches to avoid a duplicate instrumentation library. Otherwise, we definitely need to extract a okhttp-common to hold the common code like the extractors - working around the tag limitation seems like a win vs that?

@jkwatson
Copy link
Contributor Author

We discussed in SIG and there is a lot of interest in keeping the agent floor to 3.0, especially to continue to say we support all OkHttp 2+.

For targeting 3.0, the timeout method I guess is not a problem since we can just implement it, which gets verified by our latestDepTest already. We have something similar for Lettuce

We can only "implement" it by also doing reflection on the delegate instance to see if it has the method as well. :(

For the tag, I think either only using a private static final Cache<Request, Context> = Cache.newBuilder().setWeakKeys().build() or using reflection to determine a path that uses the cache vs the native tag are reasonable approaches to avoid a duplicate instrumentation library. Otherwise, we definitely need to extract a okhttp-common to hold the common code like the extractors - working around the tag limitation seems like a win vs that?

Ugh. I really hate this idea, but if it's somehow preferable to just targeting 4.x as new library instrumentation for users who keep up with major versions, I guess I can add it to the PR. I would have greatly preferred keeping things simple and having a little bit of code duplication, since this PR is only relevant for library instrumentation users, correct?

@jkwatson
Copy link
Contributor Author

closing in favor of #3812

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants