Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace Payload Collection #234

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

nirga
Copy link

@nirga nirga commented Jun 19, 2023

Continuing the work started by @ronyis at #219.

We're proposing semantic attributes for reporting payloads (of HTTP requests, DB queries, etc.) as part of OpenTelemetry. This should be optional for users who wish to use it and can provide consumers of OpenTelemetry with a standard option for getting payloads out of trace instrumentation.

Previous instrumentation libraries (pre-otel) have successfully implemented it so I think this is something that will be welcomed by the community.

Would love to get your feedback and work on getting this accepted.

@ronyis ronyis mentioned this pull request Jun 21, 2023
@nirga nirga requested a review from kenfinnigan June 21, 2023 08:48
(for example, `http.request.body`):

- **Data attribute**: `<attribute>.content`. Holds the decoded content of the payload data. Alternatives: `.data`, `.payload`.
- **Size attribute**: `<attribute>.size`. Holds the number of bytes of the encoded payload data. Alternatives: `.length`, `.bytes`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use protocol buffers, the size/length of content is implicitly recorded.

I believe even for JSON this is effectively true (you'll have a length/size available at runtime) so no need to encode it separately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to keep the size as a separate attribute, since we may have an actual different content size (for example if we truncated part of it).

- **Size attribute**: `<attribute>.size`. Holds the number of bytes of the encoded payload data. Alternatives: `.length`, `.bytes`.
- **Encoding attribute**: `<attribute>.encoding`. Holds the original attribute encoding type.
Predefined values should be declared (though users may decide using custom values as well).
For example - `json`, `protobuf`, `avro`, `utf-8`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to allow "versioning" of these protocols?

What's the path for adding new (custom) encodings?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is only the encoding originally initially when sending the payload. The content structure inside the span should always be the same, regardless of the protocol. This is actually another reason why we should translate the payload into some common structure as proposed here.

#### `Payload` class

We propose adding a class that will provide the related functionality for handling payload data.
Both traces and logs could use instances of this class, which will assist sharing related functionality.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this was added in real tracing products, can you link to the APIs/SDKs they offered?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit different, since they just offered this instrumentation OOTB, so the API for the payload was hidden from the developer. See for example Epsagon's instrumentation for FastAPI (The exact structure of the data is seen here)

)
```

### Payload attributes in Logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to resolving this discussion as a priority.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But should it affect the attribute names for spans as proposed here?


Representing Null values in payload contents is required, as it part of JSON and many other formats
like Protobuf and Avro. Therefore we propose supporting it by spans and logs attributes APIs (while
in OTel proto schema it is already supported).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this supported in OTel proto schema? Do you mean the "AnyValue" being allowed to be "emtpy"?

I'd like more details about where/how you need Null support here. Are you trying to record payload attributes even if the payload is null? Could you just write an empty byte array for the Proto/Avro case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Hmm, this can work. Updated this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is that we need to support JSON payloads like '{"x": null}' and also null, and similar examples in other formats.
If the payload is encoded with the "AnyValue" type, it can be achieved by using empty objects. Though it's still needed to be supported by the APIs.
In case we add a dedicated proto object, like the Value suggested in the alternative, we should also make sure it supports Null values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd need a lot more justification on a new AnyValue type in OTLP before I'd be convinced. I think we already support the JSON payloads you mention.

Capturing payloads using appropriate APIs could assist in specifying different limits in the future (either
more are less strict than of 'regular' attributes), and mechanisms for shortening large payloads as well.

### Handling sensitive data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth calling out that payload collection likely requires some kind of flag to signal that sensitive data collection is ok.

Specifically - I think this proposal needs some kind of notion on a per-trace basis that denotes whether sensitive collection is "allowed" for that trace.

E.g. you can imagine a system where I issue an RPC with a special flag denoting that collection of payloads (and other sensitive data) is ok, where this would be turned on.

You can also imagine production systems that require user-consent before this data could be collected in a o11y system. We should absolutely have a mechanism to check (dynamically, per-request / per-trace) whether this data can be collected.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this might be out of scope for this as it should be resolved on the API level of the different SDKs. Here I suggest something much simpler - if we ever to auto-instrument collection of payloads, the default should always be off where users who decide to turn it on should know it will be turned on for everything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also in favor of having this kind of flag regarding sensitive data collection. Though I don't think it should be a requirement for the changes proposed in this OTEP.
At this point, users already collect sensitive data using the existing APIs and attributes.

@mishushakov
Copy link

Would love to see that happen at Step CI 😉

@axlrate
Copy link

axlrate commented Mar 12, 2024

Should the payload be a span event instead of a span attribute? Most backends have higher size limits for span events.

@mmanciop
Copy link

Should the payload be a span event instead of a span attribute? Most backends have higher size limits for span events.

Lumigo does this with span attributes, and the experience was that a truncation to 1024, 2048 or 4096 or power thereof was often good enough. (The complex bit was performing truncation that would keep validity of the data format, e.g., JSON.) Users have means via SDK configuration to change the attribute truncation threshold. I think setting this data as span events would be very unintuitive in terms of data model. I don't think we should use an unintuitive data model to work around limitations of backends. Moreover, for an important use-case like this, I can imagine backends makings exceptions for the specific keys we'd use.

@codefromthecrypt
Copy link

might be worthwhile to ping folks who did some things in this space, even if moved on since.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants