Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current "Invalid" SpanContext definition precludes TraceState-only SpanContext #753

Open
Oberon00 opened this issue Jul 31, 2020 · 12 comments
Labels
area:api Cross language API specification issue release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@Oberon00
Copy link
Member

Oberon00 commented Jul 31, 2020

Current Definition of IsValid

Currently, we consider a SpanContext invalid if the trace or span ID is invalid (i.e. it is invalid unless both are valid), and this is exposed with an API function. As a result, for example in Java, an invalid span context is not inherited by child spans (a fresh context with new span and trace ID is generated, discarding any TraceState or TraceFlags).

It is also not propagated by any propagators, but that is a decision the propagators make for themselves and could be overriden by custom propagators.

One problem with this is, if you wanted to implement a non-W3C-based extractor that doesn't have a trace+span ID available, you are now forced to invent a trace+span ID on your own in the propagator.

Data that might be on an extracted SpanContext

  1. IsRemote will be true.
  2. A sampled flag has to be there (even if none was propagated via the original propagation protocol, the propagator will have to invent one)
  3. A trace ID might be there (even if there is no span ID)
  4. A span ID might be there (even if there is no trace ID)
  5. TraceState entries might be there (which may be read from headers other than W3C tracestate)

Proposed spec changes

A SpanContext MUST be considered valid unless it has neither a valid trace nor span ID, nor IsRemote=true.

Rationale: The minimal information that can be contained on a remote parent is just the sampled flag, and we may still want to respect that.

This has one implication for the SDK spec I'm aware of:

The tracing SDK MUST accept any valid SpanContext as per the above definition as parent for a span if requested. However, a child span MUST NOT inherit an invalid parent trace ID. In that case the SDK MUST generate a new trace ID (as it would do if no parent was used). All other properties of the SpanContext are inherited as usual (e.g. IsRemote is made false, TraceState is copied).

We should also clarify (independent of other changes):

An invalid SpanContext MUST be treated as if no SpanContext was provided in all cases where the option of providing no SpanContext exists (and vice versa).

@Oberon00 Oberon00 added area:api Cross language API specification issue spec:trace Related to the specification/trace directory labels Jul 31, 2020
@carlosalberto carlosalberto added release:required-for-ga Must be resolved before GA release, or nice to have before GA priority:p2 Medium priority level labels Jul 31, 2020
@bogdandrutu
Copy link
Member

@Oberon00 there is already a definition of "IsValid", what else do we need?

@Oberon00
Copy link
Member Author

Oh, I missed that. But my argument against the current definition (I thought was a Java thing) still stands.

@Oberon00 Oberon00 changed the title "Invalid" SpanContext is not defined Current "Invalid" SpanContext definition precludes TraceState-only SpanContext Aug 11, 2020
@bogdandrutu
Copy link
Member

bogdandrutu commented Aug 19, 2020

@Oberon00 I think you need to challenge this in w3c specs, see https://www.w3.org/TR/trace-context/#no-traceparent-received 4.2.3 and https://www.w3.org/TR/trace-context/#a-traceparent-is-received 4.3.5

Because of this I think we should not derive from the w3c definition and our SpanContext should follow the same behavior.

@bogdandrutu
Copy link
Member

I think this is too late in the game and we have not proven this. I would vote to not add this change before GA because we haven't proved how will this be used, and I think the only use-case mentioned here is to be able to "force" sampling which can be achieved with different mechanism.

@Oberon00 Oberon00 added release:after-ga Not required before GA release, and not going to work on before GA and removed release:required-for-ga Must be resolved before GA release, or nice to have before GA priority:p2 Medium priority level labels Aug 20, 2020
@Oberon00
Copy link
Member Author

Oberon00 commented Aug 20, 2020

I think you need to challenge this in w3c specs

No, not at all. It's totally OK for me if the W3C propagator drops a tracestate it considers invalid. But currently this decision is baked into the SDK API, so other propagators (or even vendor SDKs) cannot make a different choice (another case where SpanContext's final-ness is limiting everyone, cf. #759 (comment)).

the only use-case mentioned here is to be able to "force" sampling

I have a more practical use case too: Currently at Dynatrace we patched opentelemetry-java because we have situations in which we cannot pass a tracestate header from parent to child, so we have to rely on existing message IDs (which we do not get to choose) instead. Thus we have a custom propagator that generates a tracestate-entry containing this parent message ID (and no span or parent ID). The SDK then generates a new trace + span ID for each child of this, but preserves the parent message ID tracestate entry. On the parent side, the "child" message ID is recorded as a Span attribute, so that the server can link these together.

I would vote to not add this change before GA

Fair enough, I'll reopen it with after-ga then.

@Oberon00 Oberon00 reopened this Aug 20, 2020
@Oberon00
Copy link
Member Author

The automation seems to not have picked up the label change, I manually archived the card in the burndown chart, hope this is ok @andrewhsu

@anuraaga
Copy link
Contributor

anuraaga commented Nov 3, 2020

FWIW, I agree that the w3c propagator needs to drop trace state to be compliant with that spec, but I think it's ok for non-w3c propagation and in-process propagation to keep it if it's there, it's a different layer.

@Oberon00
Copy link
Member Author

Oberon00 commented Nov 3, 2020

Yeah, the main issue is about SDKs that drop tracestate on their own for child spans, e.g. as Java does in SpanBuilderSdk.

@anuraaga
Copy link
Contributor

anuraaga commented Nov 3, 2020

Yup it's what I meant by in-process propagation - I don't see a real reason for SDKs to drop it and we could probably require it to be preserved.

@Oberon00
Copy link
Member Author

Oberon00 commented Nov 3, 2020

OK, not dropping traceState is probably straightforward (dropping tracestate would be the W3C propagators responsibility, not the SDKs), but what about the other listed information? Should a trace ID be used even if there is no Span ID? Should the sampled bit be interpreted if the SpanContext is non-empty?

@anuraaga
Copy link
Contributor

anuraaga commented Nov 3, 2020

Should a trace ID be used even if there is no Span ID?

Cool question - I think it should be. Philosophy is don't drop data that's already there. However, the opposite, no-trace-id should ideally also be propagated as-is if that's what came in. For example, b3:0 is all it takes to propagate a decision of no-sampling across the system if you don't care about log correlation - very efficient for requests where headers are larger than the payload.

Should the sampled bit be interpreted if the SpanContext is non-empty?

I don't follow this question - don't we always follow the sampled bit if it's present for Parent-based samplers? I think if it's present, it needs to be read always when a new span is created.

@Oberon00
Copy link
Member Author

Oberon00 commented Nov 3, 2020

The sampled bit cannot be not present. It can be unset though (which means the span was not sampled). This is maybe another problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests

4 participants