Optional vs required attributes in HTTP conventions and beyond #2114

lmolkova · 2021-11-10T02:36:07Z

Based on instrumentation SIG meeting discussion

Problem

Optional attributes in semantic conventions leave the decision to populate them on the instrumentation library.
However, this decision should be in the user/backend/distro hands: this is where you know what information you'd like to see and pay for.
So one library that doesn't populate e.g. http.request_content_length (let's put aside the need to have it on spans) makes the decision for all backends and users of this library.

Proposal

TL;DR: Instrumentation should collect all useful attributes. When some attributes are truly optional (i.e. rarely needed), there should be a configuration option for them. This issue does not try to address configuration.

Instrumentation should populate all generally useful attributes (and we should probably clean them up at least for HTTP).
When there is a need for optional attributes (e.g. because the collection is too expensive and/or attributes are not generally useful), instrumentation libraries should provide configuration options and populate them in opt-in mode
- is seems the current approach to leave it up to language+per group of attributes (e.g. http headers) might not scale well, but we can address it separately
The consumption side (distro/backend/user) should decide which attributes to export. E.g. exporter can ignore http.scheme if it has no use for it. Nothing new here.

What does it mean for HTTP (example)

This is a general semantic convention discussion, but if we go through this exercise for HTTP, here's what I suggest.

All of the below is just an example of this approach, we can discuss each of those suggestions separately from this issue

Let's pick one set of common attributes to make collection consistent (perhaps a specific set worth a separate discussion)

http.url for clients
e.g. http.scheme, net.host.name, net.host.port, http.target for servers
- it boils down to unification on http.host net.host.name and http.server_name
- if scheme is only there to reconstruct url - do we really need it?

It leaves us with a few more generally-useful attributes:

client and server:
- http.method
- http.status_code
server only:
- http.route - I assume if your web framework supports it, you'd want it
- http.server_name - it seems it's supported by a few instrumentations and if it is supported, it's important and generally-useful

Following attributes should be collected from headers through header collection configuration - we should not have two different attributes for each of them. they are optional in a way that user/distro enables them
- http.host
- http.user_agent
- http.request_content_length
- http.response_content_length
- http.client_ip - is there a way to populate it which is not X-Forwarded-For?
it leaves us with probably not-so-commonly-used attributes which I cautiously suggest removing unless there is some important scenario behind them
- http.flavor does anyone really need it?
- http.request_content_length_uncompressed - it's not always HTTP client responsibility to decompress or to await for decompression to finish - the instrumentation is probably feasible in a subset of cases - should we keep it?
- http.response_content_length_uncompressed

As a result, we'd have common attributes every instrumentation should populate, headers behind configuration and group 4 (which I hope we can remove)

The text was updated successfully, but these errors were encountered:

lmolkova · 2021-11-10T02:36:47Z

/cc @denisivan0v @pyohannes @anuraaga

anuraaga · 2021-11-10T03:47:21Z

I definitely believe instrumentation should be able to fill all conventions as proposed here - instrumentation needs to provide that baseline which can be then trimmed down to needed bits through configuration or similar.

I don't know if there's much point in having classifiers like required and optional at that point - I can imagine backends providing a recommended semantic configuration, and users may customize this further. I don't think either of these would really be interested in whether it's labeled required here, they have much more knowledge of what conventions are useful with that backend, or in that user app, then a spec author.

Some reasons I can think of for excluding attributes

Backend completely ignores anyways
Backend can't handle high cardinality attributes
Backend deems certain attributes useful in their experience and try to reduce user cloud bill
User knows all their APIs use POST and don't need that information

These seem quite diverse and there are probably more. If OTel instrumentation is able to fill all conventions in order to provide a baseline, I'm not sure how to go further to navigate these waters and make any recommendations on what is required or not.

kenfinnigan · 2021-11-10T21:08:47Z

These are very interesting points I hadn't really thought of, and I agree with @anuraaga's comments around applicability of required/optional delineation for backends.

However, I can see the benefit for some level of "required" information to enforce that all backends are able to understand those attributes and their meaning. Granted, it doesn't guarantee a backend actually utilizes the attribute in visualizations, or other ways.

For me, it comes down to whether it is a goal of OpenTelemetry to define a "required" set of attributes that all vendor backends need to understand/utilize? Or is it sufficient to define the possible set of attributes and their meaning?

If defining required attributes is not a goal, differentiating between attributes in required and optional terms likely does not add much value, and it can be left up to vendors and organizations as to what their needs are.

pyohannes · 2021-11-11T01:11:17Z

For me, it comes down to whether it is a goal of OpenTelemetry to define a "required" set of attributes that all vendor backends need to understand/utilize?

OpenTelemetry doesn't define any requirements for vendor backends, but it does for instrumentation. By doing that, it should give vendor backends a baseline they can rely on, but it's completely up to vendor backends which attributes they utilize.

TL;DR: Instrumentation should collect all useful attributes. When some attributes are truly optional (i.e. rarely needed), there should be a configuration option for them. This issue does not try to address configuration.

But "truly optional" attributes would still be defined in the semantic conventions?

lmolkova · 2021-11-11T07:01:29Z

I guess we're talking about similar things, is this a fair summary?

Some attributes that are always worth collecting despite perf impact
Others that are useful, but not for everyone. Since collecting them adds overhead, we might want to optimize collection.
Some additional, niche things, like custom headers - convention exists for prefix only or doesn't exist at all

And we need to pick a default level for instrumentation: 1 or 1+2. And then instrumentation allows to enable 2 or disable it depending on default we choose.

3 is always opt-in if defined.

On the backend side, backend is always free to use or drop attribute, I don't see any need to spec backend behavior here.

kenfinnigan · 2021-11-11T17:11:12Z

Given vendors can pick and choose which attributes they utilize, we should look to have the smallest set of required attributes that aligns with all vendor backends.

yurishkuro · 2021-11-11T17:26:07Z

Why can't the backend functionality gracefully degrade when certain attributes are not present? Can someone give an example of an attribute whose absence would be catastrophic/non-recoverable for the backend?

lmolkova · 2021-11-11T21:25:29Z

Why can't the backend functionality gracefully degrade when certain attributes are not present?

It can, the problem is that there is no path to a great experience when certain instrumentation chooses not to expose optional attributes.

E.g. http.route is optional currently, meaning that any instrumentation should feel free to skip it. Without route, there is no good way to query spans or build metrics for the given high-cardinality path. And there is nothing users or backends can do to improve it without changing instrumentation.

lmolkova · 2021-11-11T21:29:46Z

Given vendors can pick and choose which attributes they utilize, we should look to have the smallest set of required attributes that aligns with all vendor backends.

assuming we pick the bare minimum, what do we do with other useful attributes? Do we require instrumentation to add them behind configuration? add depending on the author's choice? not add at all? I'm trying to create some clarity on this particular part and find defaults.

kenfinnigan · 2021-11-11T21:34:49Z

Given vendors can pick and choose which attributes they utilize, we should look to have the smallest set of required attributes that aligns with all vendor backends.

assuming we pick the bare minimum, what do we do with other useful attributes? Do we require instrumentation to add them behind configuration? add depending on the author's choice? not add at all? I'm trying to create some clarity on this particular part and find defaults.

As you suggested previously, it feels like there are really 3 levels we need. The first is the bare minimum required attributes that most/all vendor backends will support, second is a group of attributes the community believes are beneficial but backends may not recognize, and then all other attributes.

I can see 2 and 3 being behind instrumentation flags, but 2 is enabled by default as the community sees benefit in those attributes

anuraaga · 2021-11-12T00:53:19Z

For me, it comes down to whether it is a goal of OpenTelemetry to define a "required" set of attributes that all vendor backends need to understand/utilize? Or is it sufficient to define the possible set of attributes and their meaning?

I think currently the definition of a backend only involves OTLP

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/vendors.md

It would be a big addition to add attribute processing too as something a backend must do to support OTel. I'm not sure there is much benefit to it though - for example, even saying it's ok to not be any visualization as long as the attribute is supported. Does that mean it's actually supported?

I would expect then recommendations getting even more technical - all required attributes must be indexed and usable as filters in queries, for example. If going that far then I think the classifications do then provide action items for backends. But I guess OTel is not generally meant to spec out backends in such detail.

Otherwise I'm not sure what actually gets gained by having classifications vs not. It seems to only make certain attributes opt-in and others opt-out, but that seems to make the experience less intuitive, especially if some backend semantic config will just get copied in anyways.

yurishkuro · 2021-11-12T03:55:49Z

Why can't the backend functionality gracefully degrade when certain attributes are not present?

It can, the problem is that there is no path to a great experience when certain instrumentation chooses not to expose optional attributes.

What is the path to great experience if the application chooses to disable tracing altogether? Disabling tracing is just an extreme case of not sending all useful attributes, i.e. verbosity level 0. The more data they send, the better/richer experience they get. I think any backend would still have to deal with apps that simply cannot use common instrumentation (maybe because they run on proprietary frameworks), and may not send the data that is 100% according to the spec. And again, the closer they are to the spec, the better service they get.

lmolkova · 2021-11-12T15:27:12Z

If user disables tracing, it's user's choice. using uncommon instrumentation or writing a custom one is also a user choice.
The goal of this issue is to make sure common instrumentations give user the choice to send more or less data. Currently this choice is on instrumentation

iNikem · 2021-11-13T08:35:54Z

The goal of this issue is to make sure common instrumentations give user the choice to send more or less data.

Now I am confused :) Is this issue about providing configuration mechanism or about marking some attributes as required?

I think that http.route is a perfect counter-example to your arguments :) You say

E.g. http.route is optional currently, meaning that any instrumentation should feel free to skip it. Without route, there is no good way to query spans or build metrics for the given high-cardinality path. And there is nothing users or backends can do to improve it without changing instrumentation.

I understand it that you want to mark http.route as required attribute for SERVER spans. But it is not always possible for auto-instrumentation to provide it. My HTTP server may not use declarative routing that auto-instrumentation can easilyt understand. E.g. Netty server handler can use totally app specific if branching or whatnot.

And this is my general feedback: for me, required attribute means that ALL instrumentation libraries (including auto-instrumentations) MUST provide this attribute. So it is more about the capabilities of instrumentations than about the need of clients.

anuraaga · 2021-11-13T14:19:19Z

route is a nice example - I feel it provides information on both sides.

route can be hard to populate with instrumentation as it generally needs a well-designed underlying framework
route is basically necessary for useful metrics for HTTP server

I feel the second point is more important though, and applies to tracing as well. I suspect most people that have used tracing know how important route is to be there when searching. I would call instrumentation that doesn't populate it to be "low quality instrumentation". It is fine for this to exist, instrumentation can only go as far as the framework being instrumented. It's why we generally use higher level frameworks, and why they naturally are easier to instrument. I don't feel the words required and optional capture this well. It's why I actually hope the distinction can just disappear, it doesn't seem to be helping anyone in practice.

lmolkova · 2021-11-17T07:00:16Z

I agree that required and optional words describe technical limitations that instrumentations and frameworks have, but do not help new instrumentation authors understand what and when they should provide.

So I'd change it to the following categories:

MUST/required -> provide always, no configuration necessary (we believe all frameworks and instrumentation should be able to provide them)
the rest of the attributes that are currently SHOULD/optional:
- attributes instrumentation should provide if library/framework supports them (e.g. http.route) -> provide if available no configuration necessary
- either opt-out (e.g. http.user_agent?) -> provide by default, but allow to disable - based on the app configuration
- or opt-in (e.g. custom headers) -> don't provide by default, but allow to enable - based on app configuration

since provide always is not really different from provide if available, we can merge those two for simplicity into provide if available

kenfinnigan · 2021-11-17T14:49:02Z

What would be the expected result for a provide always attribute if it's not there? Is it an error, or it falls back to a provide if available?

My one concern with merging provide always and provide if available is making attributes which should be there, less likely to be there as there is no longer an always requirement.

pyohannes · 2021-11-17T16:47:06Z

My one concern with merging provide always and provide if available is making attributes which should be there, less likely to be there as there is no longer an always requirement.

I think @lmolkova's intention of merging those is to come with something like always provide if available, or must provide if available.

kenfinnigan · 2021-11-17T19:55:53Z

Thanks @pyohannes, that helped my mental model.

I like the addition of always to indicate the required nature of the attribute.

anuraaga · 2021-11-18T00:43:12Z

@kenfinnigan I think users need to be in control of their data (always 😂 ) - I can't imagine having any attribute that is not allowed to be disabled somehow, that seems to take too much control away. I am a fan of merging into provide if available personally.

kenfinnigan · 2021-11-18T01:03:58Z

I see your point. I've been coming at it from the perspective of a minimum set of information provided to backends, but I appreciate OTeL doesn't control that aspect

lmolkova · 2021-11-22T20:04:59Z

it sounds like we have some preliminary consensus here, I'll send a PR to HTTP spec as a start. Once we have an agreement there, we can generalize it beyond HTTP (and it would require some tooling change around yml).

lmolkova · 2022-03-25T21:58:37Z

I'm making a revision of what OTel instrumentations populate (as a step to understand how to clarify what's required and possible) around sets of client and server attributes.

.NET

Attribute	Client	Server
http.url	yes	yes (constructed from fragments)
net.peer.ip	no	yes
net.peer.port	no	yes
net.peer.name	no	no
http.host	yes (extracted from url)	yes
net.host.ip	no	no
net.host.name	no	no
net.host.port	no	no
http.server_name	no	no
http.route	no	yes
http.target	no	yes
http.scheme	metrics only	metrics only
http.flavor	yes (opt-in)	yes (opt-in)
http.content_length	no	no

Java

Attribute	Client	Server
http.url	yes	no
net.peer.ip	no	no
net.peer.port	yes	no
net.peer.name	yes	no
http.host	no	yes (all)
net.host.ip	no	no
net.host.name	no	no
net.host.port	no	no
http.server_name	no	yes (but only Armeria server instrumentation)
http.route	yes ( ratpack client 🤔)	yes. Most instrumentations populate route, others fall back to path
http.target	no	yes, many construct it from path + query
http.scheme	no	yes
http.flavor	yes	yes
http.*content_length	some	some
http.*content_length_uncompressed	no	no

Go

Attribute	Client	Server
http.url	yes	no
net.peer.ip	yes, only if request host is IP address	yes, only if request host is IP address
net.peer.port	yes	yes
net.peer.name	yes	yes
http.host	yes	yes
net.host.ip	yes	yes
net.host.name	yes	yes
net.host.port	yes	yes
http.server_name	no	yes
http.route	no	available on filters, but not on middleware. When not available path or empty string is used
http.target	no	yes
http.scheme	yes	yes
http.flavor	yes	yes
http.*content_length	yes	yes
http.*content_length_uncompressed	no	no

Python

Attribute	Client	Server
http.url	yes (all)	yes
net.peer.ip	only urllib	some
net.peer.port	no	some
net.peer.name	no	some
http.host	no	yes (constructed )
net.host.ip	no	no
net.host.name	no	no
net.host.port	no	most
http.server_name	no	most
http.route	no	yes
http.target	no	yes
http.scheme	no	yes
http.flavor	no	yes
http.*content_length	no	no

JS

Attribute	Client	Server
http.url	yes (constructed )	yes (constructed )
net.peer.ip	yes	no
net.peer.port	yes	no
net.peer.name	yes (constructed)	no
http.host	yes (constructed)	no
net.host.ip	no	yes
net.host.name	no	yes (constructed )
net.host.port	no	yes
http.server_name	no	yes (manual configuration)
http.route	no	yes
http.target	yes	yes
http.scheme	yes	no
http.flavor	yes	yes
http.content_length	yes	yes

Several observations:

Clients
- all clients provide http.url.
- clients inconsistently provide net.peer attributes
- for some clients, url needs to be constructed
Servers
- in most cases url needs to be constructed. Many server instrumentation don't provide url
- all servers provide http.host, some provide http.server_name
- net.host.* attribute usage is inconsistent and rare
- net.host.name description does not suggest how it can be used for HTTP and how it should be populated
- server instrumentations populate http.route, but fallback to path, which could be problematic for metrics

Proposal:

UPDATE: taking into account @trask's awesome suggestion on metrics needs (below)

Client: the only set of attributes clients should support is: http.url, net.peer.name, net.peer.port

require http.url, net.peer.name (use IP if only IP is avaialble), net.peer.port (when not default)
Make net.peer.ip attribute optional (opt-in?)

Server: the only set of attributes servers should support is: http.scheme, net.host.name, net.host.port, http.target, http.route

Require net.host.name, ask for it's value to be the best available server name with specific priority (server-name, host header, ip address)
Require (when available) http.route - it's an essential attribute for both server traces and metrics
Remove http.host - host header should be populated (opt-in) using http.request.header.host
Remove http.server_name - if it's available it's populated in net.host.name
Make http.url optional (opt-in)
Make net.host.ip attribute optional (opt-in?)
Require to populate http.route only when it's available (i.e. guarantee low-cardinality)

trask · 2022-03-25T22:56:06Z

I really like this proposal 👍 👍

Only two comments:

Make net.peer.* attributes optional (opt-out?)

These are important http client metric attributes. Our current thinking in Java instrumentation is that metric attributes will always be a subset of span attributes (on operations where we capture both spans and metrics).

TBD on http.route

Similarly, this is an important http server metric attribute.

lmolkova · 2022-03-25T23:35:48Z

thank you @trask! Agree, and added your suggestion. Will check existing instrumentation for http.route in the next few days. (update: done)

jamesmoessis · 2022-03-30T04:35:31Z

metric attributes will always be a subset of span attributes (on operations where we capture both spans and metrics)

100% on this! Very useful for folks wanting to generate metrics from spans.

lmolkova added the spec:trace Related to the specification/trace directory label Nov 10, 2021

github-actions bot assigned carlosalberto Nov 10, 2021

anuraaga mentioned this issue Nov 11, 2021

Data Classifications for resources and attributes open-telemetry/oteps#187

Closed

arminru added area:semantic-conventions Related to semantic conventions semconv:HTTP labels Nov 16, 2021

lmolkova mentioned this issue Apr 5, 2022

Remove alternative attribute sets from HTTP semantic conventions #2469

Merged

denisivan0v mentioned this issue Apr 19, 2022

Agree on the scope for HTTP semantic conventions v1.0 #2499

Closed

4 tasks

lmolkova mentioned this issue Apr 20, 2022

Remove http.XX_content_length* semantic attributes #2028

Closed

lmolkova mentioned this issue May 3, 2022

Define attribute requirement levels #2522

Merged

lmolkova mentioned this issue Jun 22, 2022

Define net.sock attributes and clarify logical net.peer|host.name attributes #2614

Merged

jmacd closed this as completed in #2469 Jul 27, 2022

lzchen mentioned this issue Jan 25, 2023

Update http instrumentations to spec v1.17 for semantic conventions open-telemetry/opentelemetry-python-contrib#1600

Closed

arminru mentioned this issue Feb 3, 2023

Define requirement levels for metrics and their meaning #3176

Open

trask mentioned this issue Apr 27, 2023

BREAKING: Introduce common url.* attributes, and improve use of namespacing under http.* #3355

Merged

lmolkova mentioned this issue Jun 14, 2023

Make server.address and server.port to opt-in on HTTP server metrics open-telemetry/semantic-conventions#109

Merged

brettmc mentioned this issue Jan 5, 2024

update semantic conventions to v1.19 open-telemetry/opentelemetry-php#965

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optional vs required attributes in HTTP conventions and beyond #2114

Optional vs required attributes in HTTP conventions and beyond #2114

lmolkova commented Nov 10, 2021 •

edited

Loading

lmolkova commented Nov 10, 2021

anuraaga commented Nov 10, 2021

kenfinnigan commented Nov 10, 2021

pyohannes commented Nov 11, 2021

lmolkova commented Nov 11, 2021 •

edited

Loading

kenfinnigan commented Nov 11, 2021

yurishkuro commented Nov 11, 2021

lmolkova commented Nov 11, 2021 •

edited

Loading

lmolkova commented Nov 11, 2021

kenfinnigan commented Nov 11, 2021

anuraaga commented Nov 12, 2021

yurishkuro commented Nov 12, 2021

lmolkova commented Nov 12, 2021 •

edited

Loading

iNikem commented Nov 13, 2021

anuraaga commented Nov 13, 2021

lmolkova commented Nov 17, 2021 •

edited

Loading

kenfinnigan commented Nov 17, 2021

pyohannes commented Nov 17, 2021

kenfinnigan commented Nov 17, 2021

anuraaga commented Nov 18, 2021

kenfinnigan commented Nov 18, 2021

lmolkova commented Nov 22, 2021 •

edited

Loading

lmolkova commented Mar 25, 2022 •

edited

Loading

trask commented Mar 25, 2022

lmolkova commented Mar 25, 2022 •

edited

Loading

jamesmoessis commented Mar 30, 2022

Optional vs required attributes in HTTP conventions and beyond #2114

Optional vs required attributes in HTTP conventions and beyond #2114

Comments

lmolkova commented Nov 10, 2021 • edited Loading

Problem

Proposal

What does it mean for HTTP (example)

lmolkova commented Nov 10, 2021

anuraaga commented Nov 10, 2021

kenfinnigan commented Nov 10, 2021

pyohannes commented Nov 11, 2021

lmolkova commented Nov 11, 2021 • edited Loading

kenfinnigan commented Nov 11, 2021

yurishkuro commented Nov 11, 2021

lmolkova commented Nov 11, 2021 • edited Loading

lmolkova commented Nov 11, 2021

kenfinnigan commented Nov 11, 2021

anuraaga commented Nov 12, 2021

yurishkuro commented Nov 12, 2021

lmolkova commented Nov 12, 2021 • edited Loading

iNikem commented Nov 13, 2021

anuraaga commented Nov 13, 2021

lmolkova commented Nov 17, 2021 • edited Loading

kenfinnigan commented Nov 17, 2021

pyohannes commented Nov 17, 2021

kenfinnigan commented Nov 17, 2021

anuraaga commented Nov 18, 2021

kenfinnigan commented Nov 18, 2021

lmolkova commented Nov 22, 2021 • edited Loading

lmolkova commented Mar 25, 2022 • edited Loading

.NET

Java

Go

Python

JS

Proposal:

trask commented Mar 25, 2022

lmolkova commented Mar 25, 2022 • edited Loading

jamesmoessis commented Mar 30, 2022

lmolkova commented Nov 10, 2021 •

edited

Loading

lmolkova commented Nov 11, 2021 •

edited

Loading

lmolkova commented Nov 11, 2021 •

edited

Loading

lmolkova commented Nov 12, 2021 •

edited

Loading

lmolkova commented Nov 17, 2021 •

edited

Loading

lmolkova commented Nov 22, 2021 •

edited

Loading

lmolkova commented Mar 25, 2022 •

edited

Loading

lmolkova commented Mar 25, 2022 •

edited

Loading