Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document general attribute normalization conventions #117

Open
lmolkova opened this issue Jun 15, 2023 · 4 comments
Open

Document general attribute normalization conventions #117

lmolkova opened this issue Jun 15, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@lmolkova
Copy link
Contributor

In open-telemetry/opentelemetry-specification#17, we limit HTTP methods to a configurable list of known methods.

  • Unknown values are recorded as _OTHER (TBD, underscore and uppercase imply a magic string)
  • original value is preserved (on spans, potentially logs) under {attribute_name}_original attribute.

We should document the generalized approach so we can reuse it in similar cases - specific semantic conventions or via general advice API applied by users on arbitrary attributes (see open-telemetry/opentelemetry-specification#3545).

@Oberon00
Copy link
Member

Oberon00 commented Jun 18, 2023

Note that _original may be more rare than the opposite ("_normalized"?). E.g. if, after stabilization of a convention or simply to limit disruption, we decide that an attribute should be normalized, we might want to keep the existing attribute unchanged but add a new one with the normalized form. Potential example might be HTTP:

I believe having the unsuffixed attribute name contain sometimes the original, sometimes a normalized value is a slight inconsistency that has to be accepted, given that we will want to continously evolve our semantic conventions without breaking changes.

In conclusion, if we can, it may make sense to come up with a default suffix/prefix convention both for original and normalized values, and a recommendation whether normalized or raw should be in the plain attribute name by default if there are no clear reasons to prefer a certain way.

@lmolkova
Copy link
Contributor Author

@Oberon00 I agree adding _normalized would be better, but I could not find a generic approach for this case that does not involve duplication.

Assuming we discovered http.request.method cardinality problem after stability and tried to solve it with _normalized:

  • we'd either duplicate all valid http.request.method in http.request.method_normalized
  • or had to drop valid data from http.request.method, reserving it for 'bad' data on traces

The cons are outlined in open-telemetry/opentelemetry-specification#34.

@Oberon00
Copy link
Member

Oberon00 commented Jun 20, 2023

In the case of http.method, there were good arguments, and it's pre-stabilization. Had we gone ahead with stabilization (as we will) before discovering the issue / want for a normalized variant, it would have been different.

For example, the mentioned #1056. Duplication is an argument for favoring _original ("_raw"?) by default, but for breaking change vs. value duplication, we will probably arrive at a different decision.

So I am just personally sure that we will end up with both the "unsuffixed name is normalized" and "unsuffixed name is original" cases.

BTW, there is generally a third option to add to the two points you listed above, but I don't expect that one to be used much if at all, since it would also be a breaking change:

  • only set *_normalized if different from the unsuffixed attribute

@lmolkova
Copy link
Contributor Author

@Oberon00 I see your point. Perhaps we can keep this issue open for the time being and if we see other cases where this approach helps, we can generalize it.
Or we'll understand where it doesn't help (e.g. above "only set *_normalized if different from the unsuffixed attribute" would be the best when the attribute is not used on metrics)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants