Opamp spec overloads definition of service.name #131

jack-berg · 2022-09-27T18:56:31Z

AgentDescription.identifying_attributes says the following about service.name:

> service.name should be set to a reverse FQDN that uniquely identifies the Agent type, e.g. "io.opentelemetry.collector"

This definition contradicts the service.name definition in the resource semantic conventions, which define it as "Logical name of the service".

This distinction is important because presumably opamp is using the "Agent type" language to serve as an identifier for the class of agent. I.e. to distinguish between collectors, SDKs, and perhaps other future agent types. This is obviously necessary, but service.name isn't the right attribute.

When it comes time for opamp to be applied to SDKs, it won't be possible to assign an agent type identifier to service.name, since SDKs have broadly adopted the logical name definition. On a related note, it doesn't make sense to assign io.opentelemetry.collector to service.name in collectors either, since it impedes the ability to have multiple sets of collectors performing different functions, each with a different logical service.name.

I think we probably need a new resource attribute to accommodate the need for an agent type identifier.


tigrannajaryan · 2022-09-27T19:50:40Z

> This distinction is important because presumably opamp is using the "Agent type" language to serve as an identifier for the class of agent. I.e. to distinguish between collectors, SDKs, and perhaps other future agent types.

That is not quite what the problem is.

OpAMP doesn't use "agent type" in any special way. Nothing in OpAMP specifically depends on "agent type" or on service.name attribute specifically.

OpAMP says the following:

1. Agents have identifying_attributes. [This is an OpAMP design matter, so it appears good to me.]
2. One of the recommended attributes is service.name. [This sounds reasonable in the Otel world.]
3. The value of service.name used in OpAMP should be equal to the value of service.name used to report the agent's own telemetry. [Again, this is reasonable to make sure we can correlate between OpAMP and own telemetry.]
4. The recommended value for service.name is an FQDN. This part of the OpAMP spec is probably the incorrect part. We should not be making any recommendations in OpAMP about what service.name is. It is none of OpAMP's business, since it is likely already the Otel spec's business.

I believe 1-3 are reasonable and there is nothing wrong with them. Number 4 likely needs to be deleted.
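To make point 3 concrete, here is a minimal Go sketch, assuming simplified stand-in types rather than the real opamp-go protobuf structs: the agent derives its identifying attributes from the same Resource it uses for its own telemetry, so service.name always agrees between the two, whatever value the operator chose. `KeyValue`, `identifyingFromResource` and `correlates` are hypothetical names, and the choice of identifying keys is an assumption made for the example.

```go
package main

import "fmt"

// KeyValue is a hypothetical, simplified stand-in for the attribute type
// carried in OpAMP's AgentDescription (not the real protobuf message).
type KeyValue struct {
	Key   string
	Value string
}

// identifyingKeys lists the resource keys this sketch copies into
// identifying_attributes. OpAMP itself does not prescribe their values.
var identifyingKeys = []string{"service.name", "service.instance.id", "service.version"}

// identifyingFromResource copies identifying keys from the Resource the
// agent already uses for its own telemetry, so the two cannot diverge.
func identifyingFromResource(resource map[string]string) []KeyValue {
	var out []KeyValue
	for _, k := range identifyingKeys {
		if v, ok := resource[k]; ok {
			out = append(out, KeyValue{Key: k, Value: v})
		}
	}
	return out
}

// correlates reports whether every identifying attribute matches the
// own-telemetry Resource, i.e. OpAMP data can be joined with telemetry.
func correlates(resource map[string]string, ident []KeyValue) bool {
	for _, kv := range ident {
		if got, ok := resource[kv.Key]; !ok || got != kv.Value {
			return false
		}
	}
	return true
}

func main() {
	// service.name holds whatever logical name the operator chose.
	resource := map[string]string{
		"service.name":        "edge-gateway",
		"service.instance.id": "6f1c2e4a-9b1d-4c3e-8f00-2a7d5e9c0b11",
		"service.version":     "0.40.0",
	}
	ident := identifyingFromResource(resource)
	fmt.Println(len(ident), correlates(resource, ident)) // 3 true
}
```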

> I think we probably need a new resource attribute to accommodate the need for an agent type identifier.

OpAMP doesn't really need it at all for now.

jack-berg · 2022-09-27T20:36:11Z

Thanks for the explanation!

> OpAMP doesn't use "agent type" in any special way. Nothing in OpAMP specifically depends on "agent type" or on service.name attribute specifically.

Apologies if this has an obvious answer (I'm still wrapping my head around OpAMP): How would an OpAMP server differentiate between a client which is a collector vs. an SDK? Perhaps the type of client isn't the concern of the protocol and is instead something the operator of the OpAMP server is expected to know ahead of time for a set of identifying attributes?

tigrannajaryan · 2022-09-27T21:47:38Z

> How would an OpAMP server differentiate between a client which is a collector vs. an SDK? Perhaps the type of client isn't the concern of the protocol and is instead something the operator of the OpAMP server is expected to know ahead of time for a set of identifying attributes?

Yes, there is no expectation that an OpAMP server implementation will have any hard-coded logic that is based on the "type of the client". The way I envisioned it is that on the server the end user can define configs associated with predicates that run on the identifying (and possibly on the non-identifying) attributes, and the config that matches the predicate is returned to the corresponding clients. So, knowing that Otel Collector uses service.name=otelcol, the user will define a Collector config for clients that match that criterion.

Perhaps we need something more here. I am open to suggestions.
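A minimal sketch of the server-side matching described above, assuming hypothetical `Rule` and `selectConfig` types that are not part of any OpAMP server library: the user attaches configs to predicates over an agent's attributes, and the first matching rule wins. The config strings and the `os.type` rule are placeholders chosen for illustration.

```go
package main

import "fmt"

// Rule pairs a user-defined predicate over an agent's attributes with the
// config to return when it matches. Hypothetical server-side types.
type Rule struct {
	Name    string
	Matches func(attrs map[string]string) bool
	Config  string
}

// selectConfig returns the config of the first matching rule, or false
// when no rule applies to the agent.
func selectConfig(rules []Rule, attrs map[string]string) (string, bool) {
	for _, r := range rules {
		if r.Matches(attrs) {
			return r.Config, true
		}
	}
	return "", false
}

func main() {
	rules := []Rule{
		{
			// The user knows the Collector reports service.name=otelcol.
			Name:    "collectors",
			Matches: func(a map[string]string) bool { return a["service.name"] == "otelcol" },
			Config:  "collector-config-v1",
		},
		{
			// Predicates may also look at non-identifying attributes.
			Name:    "linux-hosts",
			Matches: func(a map[string]string) bool { return a["os.type"] == "linux" },
			Config:  "linux-agent-config",
		},
	}
	cfg, ok := selectConfig(rules, map[string]string{"service.name": "otelcol", "os.type": "linux"})
	fmt.Println(cfg, ok) // collector-config-v1 true
}
```

First-match-wins is only one possible policy; a server could just as well merge or prioritize matching configs.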

tigrannajaryan · 2022-09-27T21:56:14Z

@andykellr I am also curious what you think.

andykellr · 2022-09-27T23:03:36Z

I mostly agree with you, but I think having a convention for agent type is useful. As more agents implement OpAMP, management servers are going to want to show users information about the agents that are connected, possibly visualizing them with corresponding icons or linking to documentation specific to those agents. An arbitrary format for agent type could lead to duplicate names, and an FQDN avoids that situation.

Should this agent type be service.name or should we introduce a different attribute like service.type to identify the agent type? I'm not sure.

tigrannajaryan · 2022-09-28T00:02:18Z

> should we introduce a different attribute like service.type to identify the agent type? I'm not sure.

Probably this. Given that we were not sure what to put in service.name on the Collector side, this may be a good candidate to add to Otel semconv. We can require that it is a reverse FQDN.

jack-berg · 2022-09-30T18:24:04Z

I think service.type is conceptually correct, but maybe not the right name, since it's not clear that the type is relevant for opamp purposes.

What about something like:

- Resource attribute key is opamp.agent.type or service.agent.type. The key is unambiguous in its purpose for opamp and not overloaded with multiple uses.
- Possible values are known types of clients that can be configured by the opamp protocol. Currently that would just be collector, but once SDKs are configurable via opamp, we would also add sdk. If someone uses the opamp protocol to remotely manage other agent types, they can specify their own custom value.
- Value type is an array of strings, since a particular agent might be configurable as multiple agent types. For example, a collector could have its collector config configured, but the collector will also eventually have the go sdk installed in it, which would be separately configurable with opamp. If an opamp client sends multiple opamp.agent.type values up to a server, the server must choose which type its responses are applicable for.

tigrannajaryan · 2022-09-30T18:39:11Z

> Resource attribute key is opamp.agent.type or service.agent.type. The key is unambiguous in its purpose for opamp and not overloaded with multiple uses.

@jack-berg I like the idea of semantic conventions that are specific to OpAMP usage. We still want service.name to be included in the identifying_attributes for the purpose of correlation with own telemetry.
However, nothing prevents us from including additional (non-?)identifying attributes in the OpAMP protocol itself, defined as semantic conventions that are specific to OpAMP. We can define a number of attributes which can then be used for fine or coarse classification on the server side, e.g.:

service.name=otelcol
service.version=0.40.0
service.instance.id=<some uuid here>
opamp.agent.type=io.opentelemetry.collector
opamp.agent.distro=github.com/signalfx/splunk-otel-collector

> Value type is an array of strings, since a particular agent might be configurable as multiple agent types. For example, a collector could have its collector config configured, but the collector will also eventually have the go sdk installed in it, which would be separately configurable with opamp. If an opamp client sends multiple opamp.agent.type values up to a server, the server must choose which type its responses are applicable for.

I wouldn't want to do this, since it creates lots of addressability problems.

Instead, for this use-case the OpAMP client must simply represent 2 different agents: one for the collector, one for the go sdk. The protocol allows this, you can have multiple agents' data transported over one OpAMP connection.
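The addressability argument can be illustrated with a hypothetical in-memory model (not OpAMP's wire format): each agent carried over the connection has exactly one type and its own instance UID, so the server targets one agent per message instead of deciding which of several types a response applies to. `Agent`, `Connection`, `deliver` and the `io.opentelemetry.sdk.go` type value are invented for illustration.

```go
package main

import "fmt"

// Agent is a hypothetical in-memory model of one managed agent. Each agent
// has exactly one type and its own instance UID, so it stays addressable.
type Agent struct {
	InstanceUID string
	Type        string
}

// Connection models a single OpAMP connection carrying several agents.
type Connection struct {
	agents map[string]Agent
}

// newConnection registers each agent under its instance UID.
func newConnection(agents ...Agent) *Connection {
	c := &Connection{agents: make(map[string]Agent)}
	for _, a := range agents {
		c.agents[a.InstanceUID] = a
	}
	return c
}

// deliver routes a config from the server to exactly one agent by UID.
func (c *Connection) deliver(uid, config string) (string, error) {
	a, ok := c.agents[uid]
	if !ok {
		return "", fmt.Errorf("unknown agent %q", uid)
	}
	return a.Type + ": " + config, nil
}

func main() {
	conn := newConnection(
		Agent{InstanceUID: "uid-collector", Type: "io.opentelemetry.collector"},
		Agent{InstanceUID: "uid-go-sdk", Type: "io.opentelemetry.sdk.go"}, // hypothetical type value
	)
	msg, err := conn.deliver("uid-go-sdk", "sampling=0.1")
	fmt.Println(msg, err) // io.opentelemetry.sdk.go: sampling=0.1 <nil>
}
```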

Contributes to open-telemetry#131

service.name is an existing OpenTelemetry convention. OpAMP is not in the business of defining OpenTelemetry's semantic conventions. I deleted the unnecessary wording that tried to add more meaning to service.name, which is not our business to do.


jlegoff · 2022-11-29T19:36:01Z

@tigrannajaryan we discussed this issue during the SIG today. I'd like to go ahead and suggest we make a change in the spec to add the attributes that were already mentioned here as part of the standard identifying attributes for an agent (opamp.agent.type, opamp.agent.distro): this would help distinguish between different types of agent, and, unlike service attributes, would also be applied when agents are not standalone (sdk / language agents).

I think it would make sense to also have standard resource attributes for agents, that would be part of the telemetry.

Another point that was discussed is the agent uid, and how it relates to the lifecycle of the agent. While this is not mentioned in the spec, it looks like folks implementing this protocol try to persist the uid so that it remains unchanged after the agent is restarted. So it looks like we're missing a concept to identify agents that is more stable than a process. Perhaps this concept is the uid; if so, the spec could make that clearer and suggest that the uid be stable across restarts.

cc @andykellr @portertech

tigrannajaryan · 2022-11-29T22:56:24Z

> I'd like to go ahead and suggest we make a change in the spec to add the attributes that were already mentioned here as part of the standard identifying attributes for an agent (opamp.agent.type, opamp.agent.distro)

I agree, with one important difference: I believe these are non-identifying attributes. Identifying attributes are defined as attributes that are necessary for unique identification of the agent and are included in own metrics of the agent. We should not add arbitrary descriptive attributes to this list just because they are useful. The non-identifying attributes list can be arbitrarily long and has no such restrictions.

I am not sure where exactly we want to define these semantic conventions. It can be in OpAMP spec here in this repo or it can be in Otel's semantic conventions list.
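Applied to the example attribute set given earlier in this thread, the distinction could look like the following Go sketch. The `AgentDescription` struct here is a simplified, hypothetical mirror of the OpAMP message, and treating exactly service.name, service.version and service.instance.id as identifying is an assumption made for the example.

```go
package main

import "fmt"

// AgentDescription is a simplified, hypothetical mirror of the OpAMP message:
// identifying attributes uniquely identify the agent and also appear in its
// own telemetry; non-identifying attributes are free-form descriptive data.
type AgentDescription struct {
	Identifying    map[string]string
	NonIdentifying map[string]string
}

// identifying is the set of keys this sketch treats as identifying.
var identifying = map[string]bool{
	"service.name":        true,
	"service.version":     true,
	"service.instance.id": true,
}

// describe splits a flat attribute set into the two OpAMP lists.
func describe(attrs map[string]string) AgentDescription {
	d := AgentDescription{Identifying: map[string]string{}, NonIdentifying: map[string]string{}}
	for k, v := range attrs {
		if identifying[k] {
			d.Identifying[k] = v
		} else {
			d.NonIdentifying[k] = v
		}
	}
	return d
}

func main() {
	d := describe(map[string]string{
		"service.name":        "otelcol",
		"service.version":     "0.40.0",
		"service.instance.id": "6f1c2e4a-9b1d-4c3e-8f00-2a7d5e9c0b11",
		"opamp.agent.type":    "io.opentelemetry.collector",
		"opamp.agent.distro":  "github.com/signalfx/splunk-otel-collector",
	})
	fmt.Println(len(d.Identifying), len(d.NonIdentifying)) // 3 2
}
```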

tigrannajaryan · 2022-11-29T22:58:29Z

> Another point that was discussed is the agent uid, and how it relates to the lifecycle of the agent. While this is not mentioned in the spec, it looks like folks implementing this protocol try to persist the uid so that it remains unchanged after the agent is restarted. So it looks like we're missing a concept to identify agents that is more stable than a process. Perhaps this concept is the uid - then perhaps the spec could make it clearer and suggest that the uid be stable across restarts.

Persisting the instance id is useful but I think it should not be mandatory. We can add a recommendation that when possible the uid should be persistent. In some environments it may not be possible and I think it is OK if it is ephemeral.

tigrannajaryan · 2023-11-23T02:44:05Z

Submitted this issue to discuss in semconv: open-telemetry/semantic-conventions#554

Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

## Problem Description

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to the name of the executable (e.g. otelcorecol or otelcontribcol). Collector's `service.name` can be overridden by the operator using the `service.telemetry.resource` setting of the Collector's config file. This is typically expected in any non-trivial infrastructure, where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of a Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name; it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want the type of an agent (Otel Collector in our case) to be a well-defined semantic convention. This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how important it is to tie the agent's own telemetry Resource to the attributes that identify that agent in the context of the OpAMP protocol.

## Proposed Change

This is a request for comments for adding the following Recommended, experimental Resource semantic conventions:

- `service.type` - an FQDN that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc.

Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

Note that having a separate `service.type` allows OpAMP, if desired by the operator, to manage the same type of agents in a similar way even though their `service.name` values may differ due to the different logical roles they have. Another example with NGINX: `service.type` will be set to com.nginx by the NGINX developers, while `service.name` is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.



Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to the name of the executable (e.g. otelcorecol or otelcontribcol). Collector's `service.name` can be overridden by the operator using the `service.telemetry.resource` setting of the Collector's config file. This is typically expected in any non-trivial infrastructure, where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of a Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name; it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want the type of an agent (Otel Collector in our case) to be a well-defined semantic convention. This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how important it is to tie the agent's own telemetry Resource to the attributes that identify that agent in the context of the OpAMP protocol.

Changes
=======

This change adds `service.type` as a Recommended, experimental Resource semantic convention. The value is a string in reverse domain notation that uniquely identifies the type of the service (the type of the product deployed as the service), e.g. io.opentelemetry.collector, io.redis, etc.

Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

For OpAMP, having a separate `service.type` allows OpAMP, if desired by the operator, to manage the same type of agents in a similar way even though their `service.name` values may differ due to the different logical roles they have. An example unrelated to OpAMP is NGINX: `service.type` will be set to "com.nginx", while `service.name` is set to "api-gateway", denoting the logical role that the particular NGINX deployment serves in this particular system.
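The grouping benefit described above can be sketched as follows, assuming a hypothetical `Service` type and a deliberately loose regular expression for reverse domain notation (the proposal does not define validation rules): services are bucketed by `service.type` even though their `service.name` values differ.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// reverseDomain is a loose, hypothetical check for reverse domain notation
// such as "io.opentelemetry.collector": two or more dot-separated labels.
var reverseDomain = regexp.MustCompile(`^[a-z0-9-]+(\.[a-z0-9-]+)+$`)

// Service holds the two attributes relevant to this sketch.
type Service struct {
	Name string // service.name, chosen by the operator
	Type string // service.type, chosen by the product's developers
}

// groupByType buckets service names by service.type so they can be managed
// alike even when their logical roles (service.name values) differ.
func groupByType(svcs []Service) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range svcs {
		if !reverseDomain.MatchString(s.Type) {
			continue // skip values not in reverse domain notation
		}
		groups[s.Type] = append(groups[s.Type], s.Name)
	}
	for _, names := range groups {
		sort.Strings(names) // deterministic order for display
	}
	return groups
}

func main() {
	groups := groupByType([]Service{
		{Name: "edge-agent", Type: "io.opentelemetry.collector"},
		{Name: "gateway", Type: "io.opentelemetry.collector"},
		{Name: "api-gateway", Type: "com.nginx"},
	})
	fmt.Println(groups["io.opentelemetry.collector"]) // [edge-agent gateway]
	fmt.Println(groups["com.nginx"])                  // [api-gateway]
}
```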

tigrannajaryan · 2024-02-20T21:39:18Z

All, the PR that adds service.type is created, but I and others have doubts that this is the right way. Please comment on the PR with arguments in favour or against it.

Resolves open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

We need a way to record more information about agents than is currently possible using existing semantic conventions. Otel Collector in particular today uses the service.name, service.instance.id and service.version attributes to report its own telemetry. These are useful but not sufficient; in particular, we are missing information about which distribution of Otel Collector it is.

The agent.type/agent.version/agent.id conventions are also aligned with ECS: https://www.elastic.co/guide/en/ecs/current/ecs-agent.html

With the introduction of these conventions, the following attributes change in Otel Collector's own telemetry output:

- service.name -> agent.type
- service.version -> agent.version
- service.instance.id -> agent.id

agent.distro will be added as one more property, the equivalent of which did not exist in the past.
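A minimal sketch of the attribute renaming listed above, assuming a hypothetical `toAgentAttrs` helper; the agent.* keys come from the draft proposal and were not stable conventions at the time, and attributes outside the mapping are passed through unchanged.

```go
package main

import "fmt"

// renames reflects the mapping listed in the proposal; the agent.* keys
// were a draft at the time and may differ in any final convention.
var renames = map[string]string{
	"service.name":        "agent.type",
	"service.version":     "agent.version",
	"service.instance.id": "agent.id",
}

// toAgentAttrs returns a copy of attrs with service.* keys renamed to the
// proposed agent.* keys, adding agent.distro when one is given.
func toAgentAttrs(attrs map[string]string, distro string) map[string]string {
	out := make(map[string]string, len(attrs)+1)
	for k, v := range attrs {
		if nk, ok := renames[k]; ok {
			k = nk
		}
		out[k] = v
	}
	if distro != "" {
		out["agent.distro"] = distro
	}
	return out
}

func main() {
	out := toAgentAttrs(map[string]string{
		"service.name":    "otelcorecol",
		"service.version": "0.40.0",
	}, "github.com/signalfx/splunk-otel-collector")
	// prints: otelcorecol 0.40.0 github.com/signalfx/splunk-otel-collector
	fmt.Println(out["agent.type"], out["agent.version"], out["agent.distro"])
}
```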

jack-berg mentioned this issue Sep 27, 2022

Add service.name resource attribute to the collector's own telemetry open-telemetry/opentelemetry-collector#6136

Closed

tigrannajaryan mentioned this issue Oct 13, 2022

Remove unnecessary recommendation about service.name #135

Merged

jlegoff mentioned this issue Dec 14, 2022

Add agent resource type open-telemetry/semantic-conventions#396

Open

tigrannajaryan mentioned this issue Nov 23, 2023

Request for comments: service.type and service.distro Resource attributes open-telemetry/semantic-conventions#554

Closed

tigrannajaryan mentioned this issue Dec 1, 2023

Add service.type experimental Resource attribute open-telemetry/semantic-conventions#575

Closed


tigrannajaryan mentioned this issue Apr 23, 2024

Add agent semantic conventions open-telemetry/semantic-conventions#950

Draft

