AWS Cloudwatch Exporter: Logs not associated with X-Ray traces #24634

Closed
dhilgarth opened this issue Jul 27, 2023 · 11 comments

@dhilgarth

dhilgarth commented Jul 27, 2023

Component(s)

exporter/awscloudwatchlogs, exporter/awsxray

What happened?

Description

Logs exported with awscloudwatchlogs are not associated with their respective traces exported via awsxray.

Steps to Reproduce

Create a span and, while it's active, create a log entry. Export both from the application via the OTLP exporter to a local collector sidecar. This sidecar is configured to export logs with awscloudwatchlogs and to forward traces through the loadbalancing exporter - over OTLP - to a gateway collector, which in turn exports the traces via awsxray.

Expected Result

When I open a trace in AWS X-Ray, I expect it to show the associated logs.

Actual Result

It doesn't show the logs in the Logs field of the trace. I can view the logs 'manually' by going to the log group and stream that awscloudwatchlogs exported them to.

When I click on "Show in Cloudwatch Insights" I notice two things:

  1. It's filtering for messages containing this text: "1-<first 8 hex digits of the trace ID>-<remaining 24 hex digits of the trace ID>". This text is not in the log entry; the log entry just contains the trace_id in the OTel format.
  2. In that opened CloudWatch Insights view, I need to select the log groups to search in -> Do we somehow need to configure this relationship? Or would it just work if point 1 was fixed?

I've tried to fix point 1 via the transform processor, but it doesn't seem to be possible: trace_id isn't a string, and I found no way to convert it to a hex representation of the underlying byte array.
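
For illustration, with a made-up trace ID, the log record carries the raw OTel form while the Insights filter expects the same 32 hex digits split into X-Ray's layout:

trace_id (OTel):  4efaaf4d1e8720b39541901950019ee5
trace ID (X-Ray): 1-4efaaf4d-1e8720b39541901950019ee5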

Collector version

0.81.0

Environment information

Environment

AWS ECS Fargate

OpenTelemetry Collector configuration

Sidecar collector:

extensions:
  health_check: {}
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
  # Will convert all monotonic, cumulative sums to monotonic, delta sums
  cumulativetodelta:
  attributes:
    actions:
      - key: 
  transform:
    trace_statements:
      - context: span
        statements:
        - truncate_all(attributes, 4095)
        - truncate_all(resource.attributes, 4095)
    log_statements:
      - context: log
        statements:
        - truncate_all(attributes, 4095)
        - truncate_all(resource.attributes, 4095)
#        - set(attributes["xray_id"], Concat(["1", Substring(trace_id, 0, 8), "-", Substring(trace_id, 8, 24)], ""))
exporters:
  awscloudwatchlogs:
    log_group_name: $LOG_GROUP_NAME
    log_stream_name: $LOGS_LOG_STREAM_NAME
  awsemf:
    log_group_name: $LOG_GROUP_NAME
    log_stream_name: $METRICS_LOG_STREAM_NAME
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: $ENDPOINT_DNS_NAME
service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [cumulativetodelta, batch]
      exporters: [awsemf]
    traces:
      receivers: [otlp]
      processors: [transform, batch]
      exporters: [loadbalancing]
    logs:
      receivers: [otlp]
      processors: [transform, batch]
      exporters: [awscloudwatchlogs]


Gateway collector:

extensions:
  health_check: {}
receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins: ["*"]
          allowed_headers: ["*"]
processors:
  batch:
  # Will convert all monotonic, cumulative sums to monotonic, delta sums
  cumulativetodelta:
  transform:
    trace_statements:
      - context: span
        statements:
        - truncate_all(attributes, 4095)
        - truncate_all(resource.attributes, 4095)
    log_statements:
      - context: log
        statements:
        - truncate_all(attributes, 4095)
        - truncate_all(resource.attributes, 4095)
  tail_sampling: # https://opentelemetry.io/blog/2022/tail-sampling/
    policies:
      [
        {
          name: errors-policy,
          type: status_code,
          status_code: { status_codes: [ERROR] }
        },
        {
          name: subset-of-successful-requests-policy,
          type: and,
          and: {
            and_sub_policy: 
            [
              {
                name: successful-requests-policy,
                type: status_code,
                status_code: { status_codes: [OK, UNSET] }
              },
              {
                name: subset-policy,
                type: probabilistic,
                probabilistic: {sampling_percentage: $SAMPLING_PERCENTAGE_SUCCESSFUL_REQUESTS}
              }
            ]
          }
        }
      ]
exporters:
  otlp:
    endpoint: $ENDPOINT
    headers:
      ${api_key_header_name}: $API_KEY
  awsxray:
service:
  telemetry:
    logs:
      level: $COLLECTOR_LOG_VERBOSITY
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, transform, batch]
      exporters: [otlp, awsxray]

Log output

No response

Additional context

No response

@dhilgarth dhilgarth added the bug and needs triage labels Jul 27, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@DewaldDeJager
Contributor

It's filtering for messages to contain this text: "1-<first 8 hex digits from trace ID>-<remaining 24 hex digits from trace ID>". This text is not in the log entry, the log entry just contains the trace_id in the otel format.

This is the reason the traces and logs can't be correlated: OpenTelemetry and X-Ray use different trace ID formats. The Copilot docs on observability mention this and suggest that the application transform the trace ID before it is logged (though this is probably not possible with auto-instrumentation).

If you use the awslogs log driver to ship the logs to CloudWatch instead of the OTel Collector, then the resource detection processor can be added to the sidecar collector's traces pipeline so that the log group and log stream are automatically added to the traces. This makes it a little easier to correlate the two manually.
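
A rough sketch of what that could look like, assuming the resourcedetection processor with its ecs detector (adapt the pipeline to your own config from above):

processors:
  resourcedetection:
    # with the awslogs driver, the ecs detector can also populate
    # aws.log.group.names / aws.log.stream.names as resource attributes
    detectors: [ecs]
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, transform, batch]
      exporters: [loadbalancing]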

It's possible to add optional functionality to the CloudWatch Logs exporter to transform the trace ID into the X-Ray trace ID format, but that raises some questions. Should the exporter be transforming the data rather than a processor? Should the data be transformed in place or added as a new field? The logs transform processor is currently in development and should be able to do this type of transformation.

@dhilgarth
Author

dhilgarth commented Jul 27, 2023

My thoughts:

  1. Processors should be for domain-specific transformations. This is a technical transformation tied to the exporter, so I would put it into the exporter, similar to how the awsxray exporter converts OTel spans into segments.
  2. Another reason why it belongs in the exporter and not a processor: I could have multiple exporters, one for AWS CloudWatch and another plain otlp exporter. The otlp exporter wouldn't care about the AWS X-Ray trace ID format.
  3. Finally, if I use awsxray, awscloudwatchlogs and awsemf, the expectation is that they play nicely together, as these are different aspects of the same AWS observability suite.

Because the transformation of the trace ID should happen in the exporter, it could happen inline, although I would probably still leave the original field alone and add a new one, called xray_traceid or so.

@dluongiop

I agree with you. We are facing the same problem, as we are also using the X-Ray exporter and the CloudWatch Logs exporter.

I think the reason they don't play nicely is that the X-Ray exporter was developed by Amazon, whereas the CloudWatch Logs exporter was not and is not distributed in the AWS OTel distro. Anyway, it would be nice for them to collaborate and align better.

@dluongiop

@dhilgarth

You may have already discovered this but in case this helps you.

I managed to use the transform processor to create a new "trace_id_xray" attribute from the string form of the trace_id, based on the configuration file you shared. See the relevant snippet below.

Once I have the X-Ray format of the trace ID in the CloudWatch log, it is picked up and correlated with the X-Ray trace.

processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(attributes["trace_id_xray"], Concat(["1-", Substring(trace_id.string, 0, 8), "-", Substring(trace_id.string, 8, 24)], ""))
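
With that statement, a log record whose OTel trace_id is, say, 4efaaf4d1e8720b39541901950019ee5 ends up with an attribute in exactly the layout the CloudWatch Insights filter looks for:

trace_id_xray: 1-4efaaf4d-1e8720b39541901950019ee5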

@github-actions
Contributor

github-actions bot commented Oct 2, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bryan-aguilar
Contributor

Hi all,

There are quite a few things that have to happen behind the scenes for trace-to-log correlation to work correctly. I cannot speak to them all definitively, but from what I understand the trace must have a reference to the log group name in its metadata, and the CloudWatch log must have the trace ID injected into it in this format: AWS-XRAY-TRACE-ID: 1-5d77f256-19f12e4eaa02e3f76c78f46a@1ce7df03252d99e1. I am going to try to find some more definitive documentation, but this issue has some background info on what is required for both services.

Also, X-Ray no longer requires the timestamp in the trace id :) https://aws.amazon.com/about-aws/whats-new/2023/10/aws-x-ray-w3c-format-trace-ids-distributed-tracing/

@bryan-aguilar bryan-aguilar removed the needs triage label Nov 9, 2023
@github-actions github-actions bot removed the Stale label Nov 10, 2023
@dhilgarth
Author

Also, X-Ray no longer requires the timestamp in the trace id :) https://aws.amazon.com/about-aws/whats-new/2023/10/aws-x-ray-w3c-format-trace-ids-distributed-tracing/

That's awesome news

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 10, 2024
@github-actions
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 10, 2024
@AmrYousef

Any updates here? I'm still facing this issue, although I thought AWS CloudWatch and X-Ray now support OpenTelemetry. Any thoughts?
