Clarify that schema transformations SHOULD overwrite input data #3505

Closed
57 changes: 57 additions & 0 deletions specification/schemas/file_format_v1.1.0.md
@@ -339,6 +339,63 @@ When converting in the opposite direction, from newer version Y to older version
X the order of transformation listed above is exactly the reverse, with each
individual transformation also performing the reverse conversion.

### Transformation Conflicts

Some schema changes may describe a transformation that conflicts with
the input data. In such situations **the transformations described by the Schema File
SHOULD take precedence** over the conflicting data present in the input.
Comment on lines +344 to +346
Contributor

@pyohannes pyohannes May 15, 2023


> The chosen approach (overwriting) seems to be best balance between maintaining reasonable outcome (keep the more important data, discarding the less important data)

What are the reasons for classifying some data to be more important than other?

In the example given below, an instrumentation library might emit `host` according to semantic conventions, and a user might add a custom attribute `host.name`. When this data is transformed to conform to schema 1.1.0 using the conflict-handling rule proposed here, the data the customer explicitly specified in `host.name` will be lost. In my opinion, it cannot be said in general that one is more important than the other; it depends on the particular use case.

This is probably a rare edge case; however, I consider it problematic to drop data during schema conversions.

Member Author


> What are the reasons for classifying some data to be more important than other?

An attribute defined in official semantic conventions and subject to the schema file is deemed more important than a custom attribute. This is debatable, I know.

Member Author

@tigrannajaryan tigrannajaryan May 15, 2023


Note that custom attributes should not use OTel namespaces. So this is someone doing a poor job of following OTel recommendations: polluting data with badly named custom attributes whose names OTel later uses, likely for a different purpose.

Contributor


I'd say a custom attribute set by the application explicitly is much more important than any automation.

By the principle of least surprise, overriding it would be confusing, and it would be hard to investigate who overrode the custom data and why.

It's also unclear why instrumentation for version V-1 emits attributes from version V; such telemetry is already inconsistent and broken, so fixing it does not seem beneficial.

It seems best to warn (once) and leave this data untouched.

Member Author


See my comment at #3497 for why "not touch" is hard to implement.


Let's look at an example to understand how this works.

For example, the schema change from version 1.0.0 to 1.1.0 may describe renaming the
resource attribute `host` to `host.name`:

```yaml
versions:
  1.1.0:
    resources:
      changes:
        - rename_attributes:
            attribute_map:
              host: host.name
```

Let's assume we are attempting to apply this schema transformation to a resource that has
both attributes `host` and `host.name`:

```json
{
"attributes": { "host": "spool", "host.name": "spool.example.com" }
}
```

When the schema transformation is applied, the attribute `host.name` will be overwritten
by the value of `host`, and the resulting data will look like this:

```json
{
"attributes": { "host.name": "spool" }
}
```
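The overwrite rule for `rename_attributes` can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function name `rename_attributes`, the representation of attributes as a flat `dict` of strings, and the assumption that no rename in the map chains into another (e.g. `a: b` together with `b: c`) are all hypothetical simplifications.

```python
from typing import Dict

Attributes = Dict[str, str]


def rename_attributes(attributes: Attributes, attribute_map: Dict[str, str]) -> Attributes:
    """Apply a rename_attributes change, overwriting conflicting target keys.

    Hypothetical helper: assumes flat string attributes and a map whose
    renames do not chain into one another.
    """
    result = dict(attributes)  # copy so the caller's input is left unchanged
    for old_name, new_name in attribute_map.items():
        if old_name in result:
            # Move the value to the new name. If new_name already exists,
            # its value is replaced: the schema transformation takes precedence.
            result[new_name] = result.pop(old_name)
    return result


# The 1.0.0 -> 1.1.0 change from the schema file above, written as a plain dict.
host_rename = {"host": "host.name"}

resource_attributes = {"host": "spool", "host.name": "spool.example.com"}
upgraded = rename_attributes(resource_attributes, host_rename)
print(upgraded)  # {'host.name': 'spool'}
```

Moving the value with `pop` and then assigning it to the new key is what makes the renamed attribute win. Whatever was previously stored under `host.name` is simply replaced, which is the SHOULD-overwrite behavior described above.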

This rule also applies if the transformation is performed in the backward
direction (from newer version to older). Applying the transformation SHOULD overwrite
any conflicting data. For example, given the input data:

```json
{
"attributes": { "host": "spool", "host.name": "spool.example.com" }
}
```

When transforming from schema 1.1.0 to schema 1.0.0, the output will be:

```json
{
"attributes": { "host": "spool.example.com" }
}
```
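A minimal sketch of the backward direction, under the same simplifying assumptions: the reverse conversion can reuse the forward rename by inverting the attribute map. The helpers `rename_attributes` and `invert_attribute_map` are hypothetical. The inversion assumes the map is one-to-one, so each new name maps back to exactly one old name.

```python
from typing import Dict

Attributes = Dict[str, str]


def rename_attributes(attributes: Attributes, attribute_map: Dict[str, str]) -> Attributes:
    """Hypothetical helper: rename keys, letting renamed values overwrite conflicts."""
    result = dict(attributes)  # leave the caller's input untouched
    for old_name, new_name in attribute_map.items():
        if old_name in result:
            # Any existing value under new_name is overwritten.
            result[new_name] = result.pop(old_name)
    return result


def invert_attribute_map(attribute_map: Dict[str, str]) -> Dict[str, str]:
    """Reverse a one-to-one rename map (assumption: no two old names share a new name)."""
    return {new_name: old_name for old_name, new_name in attribute_map.items()}


# The 1.0.0 -> 1.1.0 rename, applied in reverse to go from 1.1.0 back to 1.0.0.
host_rename = {"host": "host.name"}

resource_attributes = {"host": "spool", "host.name": "spool.example.com"}
downgraded = rename_attributes(resource_attributes, invert_attribute_map(host_rename))
print(downgraded)  # {'host': 'spool.example.com'}
```

Here the conflict resolves the other way. The value under `host.name` is moved back to `host` and replaces the existing `host` value, matching the expected output above.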

### Schema File Format Number

The "file_format" setting in the schema file specifies the format version of the