Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify Span ID creation with sampling. #998

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,8 @@ Updates:
- SDK: Specify known values, as well as basic error handling for OTEL_PROPAGATORS.
([#962](https://github.com/open-telemetry/opentelemetry-specification/pull/962))
([#995](https://github.com/open-telemetry/opentelemetry-specification/pull/995))
- SDK: Specify when to generate new IDs with sampling
([#998](https://github.com/open-telemetry/opentelemetry-specification/pull/998))
- Remove custom header name for Baggage, use official header
([#993](https://github.com/open-telemetry/opentelemetry-specification/pull/993))
- Trace API: Clarifications for `Span.End`, e.g. IsRecording becomes false after End
Expand Down
1 change: 1 addition & 0 deletions spec-compliance-matrix.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ status of the feature is not known.
|[Sampling](specification/trace/sdk.md#sampling)|
|Allow samplers to modify tracestate | | + | | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1220) | | + | | + | | |
|ShouldSample gets full parent Context | | + | + | + | | + | | | | |
|[Span/Trace ID creation wrt. sampling is compliant](specification/trace/sdk.md#sdk-span-creation) | | | | | | | | | | |

## Baggage

Expand Down
27 changes: 22 additions & 5 deletions specification/trace/sdk.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,11 +61,28 @@ The following table summarizes the expected behavior for each combination of
The SDK defines the interface [`Sampler`](#sampler) as well as a set of
[built-in samplers](#built-in-samplers) and associates a `Sampler` with each [`TracerProvider`].

When asked to create a Span, the SDK MUST query the `Sampler`'s [`ShouldSample`](#shouldsample) method before actually creating the span, and act accordingly:
see description of [`ShouldSample`'s](#shouldsample) return value below for how to set `IsRecording` and `Sampled` on the Span,
and the [table above](#recording-sampled-reaction-table) on whether to pass the `Span` to `SpanProcessor`s.
A non-recording span MAY be implemented using the same mechanism as when a `Span` is created with no API-implementation installed
(sometimes called a `NoOpSpan` or `DefaultSpan`).
### SDK Span creation
Oberon00 marked this conversation as resolved.
Show resolved Hide resolved

When asked to create a Span, the SDK MUST act as if doing the following in order:

1. If there is a valid parent trace ID, use it. Otherwise generate a new trace ID
yurishkuro marked this conversation as resolved.
Show resolved Hide resolved
(note: this must be done before calling `ShouldSample`, because it expects
a valid trace ID as input).
2. Query the `Sampler`'s [`ShouldSample`](#shouldsample) method
(Note that the [built-in `ParentBasedSampler`](#parentbased) can be used to
use the sampling decision of the parent,
translating a set SampledFlag to RECORD and an unset one to DROP).
3. If the decision is `DROP` and there is a valid parent span ID, reuse it as the new `Span`'s span ID.
Copy link
Contributor

@jkwatson jkwatson Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean two spans with the same id? I can imagine this could become extremely confusing to someone trying to debug things. As a user, I think I'd be super confused if I saw a new Span created with the same ID as the parent, even if that span might not end up going anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's two spans with the same ID the same way as calling extract and then inject creates a copy of the injected span (with the same ID). It is more of a hollow shell of a span than an actual span.

even if that span might not end up going anywhere

This span cannot end up going anywhere. A dropped span is always non-recording, so there is nothing that could end up anywhere beside the span+trace ID itself (not even a parent span ID). What is more, SpanProcessors are not invoked with dropped spans.

RECORD_ONLY spans are a different story, and they do get a new ID, see #998 (comment)

All that being said, I think I could agree that this may make the mental model of sampling a bit more complicated than just always creating a new ID. But I hope that the benefits, namely ability to resume sampling in a child and having the closes sampled span as parent, and a slight(?) performance improvement would be worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think I'd prefer attacking this problem via a smarter SpanContext that allowed propagation of the relevant bits, rather than generating weird internal spans for the sake of propagating the SpanContext.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, better yet, move the TraceState and TraceFlags out of the span altogether and propagate them separately in the Context, rather than hanging them off a span.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should startSpan return for a dropped span then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regeneration looks like a safer choice that will not break SDK going forward

I think changing from any of them to the other will break some uses.

and is easier to explain

Fair enough.

the idea is that instrumented library needs a reliable unique ID of the operation

If you have an ID that is not associated with any trace on the tracing backend, will that be more helpful than the "inexact" ID of the parent operation? Note, you will have a valid span ID in any case.

I would say we go GA with this, and discuss how to improve on it using things like adding a GetUniqueOperationId function to Span that returns the span ID if it was generated for that span or generates a new ID that is cached and propagated to children if it is not. I recon propagating this ID cross-process may be difficult, but the same goes for propagating the ID of the last sampled span, and it seems unlikely that you call GetUniqueOperationId on an extracted span directly (it would not be very unique, since the parent's process would have the same ID if it was sampled at least).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regeneration looks like a safer choice that will not break SDK going forward

I think changing from any of them to the other will break some uses.

and is easier to explain

Fair enough.

the idea is that instrumented library needs a reliable unique ID of the operation

If you have an ID that is not associated with any trace on the tracing backend, will that be more helpful than the "inexact" ID of the parent operation? Note, you will have a valid span ID in any case.

I would say we go GA with this, and discuss how to improve on it using things like adding a GetUniqueOperationId function to Span that returns the span ID if it was generated for that span or generates a new ID that is cached and propagated to children if it is not. I recon propagating this ID cross-process may be difficult, but the same goes for propagating the ID of the last sampled span, and it seems unlikely that you call GetUniqueOperationId on an extracted span directly (it would not be very unique, since the parent's process would have the same ID if it was sampled at least).

The assumption that there is a trace backend may not be true for everybody. And having trace backend of your caller having a record of a parent span Id is not helpful to solve your problem. So the decision we made at @open-telemetry/technical-committee is to go with span ID regeneration. This creates a nice guarantee of local uniqueness of a spanID that various telemetry signals may take a dependency on.

How strongly do you disagree?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very strongly. I do think my proposed solution is the better trade-off but I concede that always generating a new ID is also a reasonable option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oberon00 Thanks for the answer. I think we will go with the TC decision for now then - if you have time, would you mind updating this PR to match the proposal? Else, I can help by creating a spin-off of this PR. Let me know ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to create a spin-off PR. That way we clearly document that this option was rejected.

Otherwise (if the decision is not `DROP` or there was no valid parent span ID)
a new span ID MUST be generated.
4. Create a span depending on the decision returned by `ShouldSample`:
see description of [`ShouldSample`'s](#shouldsample) return value below
for how to set `IsRecording` and `Sampled` on the Span,
and the [table above](#recording-sampled-reaction-table) on whether
to pass the `Span` to `SpanProcessor`s.
A non-recording span MAY be implemented using the same mechanism as when a
`Span` is created with no API-implementation installed or as described in
[wrapping a SpanContext in a Span](api.md#wrapping-a-spancontext-in-a-span).

### Sampler

Expand Down