-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify that schema transformations SHOULD overwrite input data #3505
Clarify that schema transformations SHOULD overwrite input data #3505
Conversation
Some schema changes may describe a transformation that results in the conflicts with | ||
the input data. In situations like this **the transformations described by Schema File | ||
SHOULD take precedence** over the conflicting data present in the input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chosen approach (overwriting) seems to be best balance between maintaining reasonable outcome (keep the more important data, discarding the less important data)
What are the reasons for classifying some data to be more important than other?
In the example given below, an instrumentation library might emit host
according to semantic conventions, and a user might add a custom attribute host.name
. When transforming this data to conform to schema 1.1.0 according to the conflict handling rule proposed here, the data explicitly specified by the customer in host.name
will be lost. In my opinion it cannot be generally said that the one is more important than the other, but would depend on the particular use case.
Probably this is a rare edge case, however, I consider it problematic to drop data during schema conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the reasons for classifying some data to be more important than other?
An attribute defined in official semantic conventions and subject to schema file is deemed more important than a custom attribute. Arguable, I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that custom attributes should not use Otel namespaces. So this is someone doing a bad job at following Otel recommendations, polluting data with custom attributes with bad names that are later used by Otel for a likely different purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say a custom attribute set by the application explicitly is much more important than any automation.
From "the least surprise" principle, it'd be confusing to override it and hard to investigate who and why overridden custom data.
It's also unclear why the instrumentation of V-1 supports attributes from version V, so this telemetry is inconsistent and broken already. Fixing it does not seem beneficial.
It seems to be best to warn (once) and not touch this data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment at #3497 for why "not touch" is hard to implement.
Resolves: open-telemetry#3497 Alternates Considered ===================== Instead of requiring overwriting the input data we could say that the transformation SHOULD be aborted if a conflict is detected or that transformation SHOULD be ignored if a conflict is detected. The chosen approach (overwriting) seems to be best balance between maintaining reasonable outcome (keep the more important data, discarding the less important data) while also having minimal complexity of implementation (aborting and reporting errors is more complicated and more costly). Why "SHOULD" and not "MUST"? ============================ There are significant tradeoffs involved in what happens in the schema transform and we believe there may be valid use cases where after careful consideration a different approach may be taken (e.g. reject and surface the problem to the end user). We do not want to prohibit these other approaches. The goal of this change is to merely have a reasonable default interpretation of the schema transformation rules.
f1a1d40
to
aab0407
Compare
It is probably worth discussing the approaches in the issue #3497 (comment) before discussing this PR (which implements one of the suggested approaches). |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Anyone interested in this topic see #3497 for discussion. |
Resolves: #3497
Alternates Considered
Instead of requiring overwriting the input data we could say that the transformation SHOULD be aborted if a conflict is detected or that transformation SHOULD be ignored if a conflict is detected.
The chosen approach (overwriting) seems to be best balance between maintaining reasonable outcome (keep the more important data, discarding the less important data) while also having minimal complexity of implementation (aborting and reporting errors is more complicated and more costly).
Why "SHOULD" and not "MUST"?
There are significant tradeoffs involved in what happens in the schema transform and we believe there may be valid use cases where after careful consideration a different approach may be taken (e.g. reject and surface the problem to the end user). We do not want to prohibit these other approaches. The goal of this change is to merely have a reasonable default interpretation of the schema transformation rules.