Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add redact for span name #32962

Closed

Conversation

ruimhu1201
Copy link

Description: Added redact for span name in redactprocessor
redaction processor is redact attribute but the span name is not redact. When we depends on this function to do credential redaction, the span name could still be the issue, therefore, add redaction on span name,

Testing: Test case is added to the unit test.

@ruimhu1201 ruimhu1201 requested a review from a team May 9, 2024 11:43
Copy link

linux-foundation-easycla bot commented May 9, 2024

CLA Not Signed

@github-actions github-actions bot added the processor/redaction Redaction processor label May 9, 2024
Copy link
Contributor

@codeboten codeboten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @ruimhu1201! Please make sure to sign the CLA and add a changelog for this change.

Can you give more details regarding the use case? I'm not sure I understand why a span name would include sensitive information

@ruimhu1201
Copy link
Author

Thanks for the contribution @ruimhu1201! Please make sure to sign the CLA and add a changelog for this change.

Can you give more details regarding the use case? I'm not sure I understand why a span name would include sensitive information

Thank you for reviewing, @codeboten. Here is an example: In the HTTP URL, some customers may utilize a JWT token as a username or group name, which forms part of the path. Although we have implemented a redaction pattern for all keys in the configuration file, and attributes have been successfully redacted, the name still inadvertently reveals the token.
image

@svrnm
Copy link
Member

svrnm commented May 10, 2024

Although I agree that such a redaction may be useful, note that the token should not end up in the span name (see here and here at http.route), what HTTP library are you using here? This might be an implementation error.

@TylerHelmuth
Copy link
Member

As an alternative, for unexpected situations like this, the transformprocessor can be used to modify span names.

The OTTL roadmap states that the redactionprocessor fills a significant role and that the transformprocessor is not the collector's solution for redaction, but I think limiting it's scope to redacting attributes is appropriate.

@ruimhu1201
Copy link
Author

Although I agree that such a redaction may be useful, note that the token should not end up in the span name (see here and here at http.route), what HTTP library are you using here? This might be an implementation error.

I totally understood that use token as part of the path is not a good practice, but it's the group and username which is input from customer and we don't have control over.

@ruimhu1201
Copy link
Author

As an alternative, for unexpected situations like this, the transformprocessor can be used to modify span names.

The OTTL roadmap states that the redactionprocessor fills a significant role and that the transformprocessor is not the collector's solution for redaction, but I think limiting it's scope to redacting attributes is appropriate.

there's no harm for redactprocessor to provide full ability to redact all the pattern even with the span name, do you think it's better to have a flag which can select if custoemr need to redact name or not?

@cartermp
Copy link
Contributor

cartermp commented May 13, 2024

we don't have control over

Can you clarify how you don't have control over how this data ends up on the span name? Customers can always manually name spans whatever they want -- social security numbers, credit card numbers, auth tokens, etc. -- but that's squarely in the misuse category. If we should enable filtering data from misuse, that's a separate discussion IMO.

@ruimhu1201
Copy link
Author

we don't have control over

Can you clarify how you don't have control over how this data ends up on the span name? Customers can always manually name spans whatever they way -- social security numbers, credit card numbers, auth tokens, etc. -- but that's squarely in the misuse category. If we should enable filtering data from misuse, that's a separate discussion IMO.

I understand your concerns and appreciate the opportunity to clarify. As a platform offering real-time services, we empower our clients by allowing them to customize their code integration, which includes specifying URL paths. While we recognize that this flexibility can lead to misuse, it is beyond our purview to dictate customer practices. Nonetheless, we adhere to Microsoft's stringent Azure service guidelines to prevent any inadvertent credential exposure. It is our duty to redact such sensitive information from logs, ensuring compliance with these standards and the protection of our customers' data. We take this responsibility seriously and are committed to upholding the highest security measures.

@TylerHelmuth
Copy link
Member

@ruimhu1201 your PR raises much larger questions around the processor/collector and security, and I'd prefer we approach the topic holistically instead of in an isolated way.

A decision needs to be made (in a new issue, not in this PR), about the future of the redactionprocessor and its purpose in the OpenTelemetry ecosystem. Questions that need answered:

  1. What is the the redactionprocessor responsibility within the OpenTelemetry ecosystem
  2. On what signals should the redaction processor operate
  3. On what fields within the OTLP payload should the redaction operator interact
  4. How configurable should the redactionprocessor be
    • Are there fields that are automatically redacted when the processor is used but others are opt-in?
  5. Should redactions be allow to be conditional

Once decisions are made for those types of questions we need to make implementation decisions like:

  1. Should the above be handled solely by the transformprocessor
  2. Should the redactionprocessor utilize OTTL or implement its own configuration for expressing what to redact

There are probably more decisions to make that will arise once the discussion starts. As a code owner for the redaction processor I do not feel comfortable adding support for other fields until this discussion has happened as it will lead, as we saw in the filterprocessor, to a mismatch of rules/features between signals. I will also admit that I have not been prioritizing this processor, and will struggle to do so while trying to move the Collector to 1.0. We would love to collaborate with you if you and your organization sees this as a priority.

In the meantime the transformprocessor can act as a replacement for the redactionprocessor for non-attribute fields since it allows transforming any field on the payload.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 31, 2024
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/redaction Redaction processor Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants