Add distributed tracing for event-bus-kafka and Datadog #758

Open
timmc-edx opened this issue Aug 6, 2024 · 2 comments

Comments

@timmc-edx (Member)

Datadog does not automatically connect the event bus's producer and consumer traces. If we want this sort of distributed tracing, we'll need to add it ourselves.

Implementation notes

Datadog Support confirms that there is no automatic support for connecting the producer's trace to the spans that come out of the consumer's work. However, we can implement this ourselves if we need it. It's not clear what we get automatically if we enable DD_KAFKA_PROPAGATION_ENABLED and what we get additionally from the custom code they provide as an example (which would be split between edx-django-utils and event-bus-kafka).

Confirming that the functionality difference you've described between NR and DD currently does not exist for us out of the box, and would require some custom code to implement. One of our engineering folks provided this example, using the ddtrace propagator class and a manual span to house any post-message processing:

from ddtrace import tracer
from ddtrace.propagation.http import HTTPPropagator as Propagator

# `consumer` is an existing Kafka consumer (e.g. a confluent_kafka Consumer)
msg = consumer.poll()

ctx = None
if msg is not None and msg.headers():
    # Extract the distributed trace context from the message headers
    ctx = Propagator.extract(dict(msg.headers()))

with tracer.start_span(
    name="kafka-message-processing",  # or whatever name is wanted for the manual span
    service="their service name",  # match the main service name
    # Parent under the producer's context if present, otherwise the locally active context
    child_of=ctx if ctx is not None else tracer.context_provider.active(),
    activate=True,
):
    # do any db or other operations that should be included in the distributed trace
    db.execute()

One important note here: make sure the environment variable DD_KAFKA_PROPAGATION_ENABLED=true is set for both the producer and consumer services. With this in place, the trace should include both producer and consumer spans as well as the spans for later operations.
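
For the producer side of the same flow (not shown in the example above), the trace context has to be injected into the outgoing message headers so the consumer has something to extract. A minimal sketch, assuming a confluent-kafka producer; `producer`, `topic`, and `event_data` are placeholder names, and this is not code supplied by Datadog:

from ddtrace import tracer
from ddtrace.propagation.http import HTTPPropagator as Propagator

def produce_with_trace_context(producer, topic, event_data):
    headers = {}
    # Grab the currently active span, if any, and inject its context into the headers dict
    span = tracer.current_span()
    if span is not None:
        Propagator.inject(span.context, headers)

    # Attach the trace headers to the outgoing Kafka message
    producer.produce(topic, value=event_data, headers=list(headers.items()))

This may overlap with whatever DD_KAFKA_PROPAGATION_ENABLED already does automatically, which is the open question noted above.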

(It would probably be more appropriate for us to use Span Links but those are only available via the OpenTelemetry integration.)
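
If Span Links ever become an option for us, a rough sketch of that approach via the OpenTelemetry API (assuming ddtrace's OpenTelemetry compatibility is enabled, e.g. via DD_TRACE_OTEL_ENABLED=true; `msg` is the polled Kafka message as above and the span name is illustrative):

from opentelemetry import trace
from opentelemetry.propagate import extract

def process_with_span_link(msg):
    # Build a text-map carrier from the Kafka message headers (header values may be bytes)
    carrier = {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in (msg.headers() or [])
    }

    # Recover the producer's span context from the carrier
    producer_ctx = trace.get_current_span(extract(carrier)).get_span_context()

    otel_tracer = trace.get_tracer(__name__)
    with otel_tracer.start_as_current_span(
        "kafka-message-processing",
        links=[trace.Link(producer_ctx)],  # link to, rather than parent under, the producer span
    ):
        # message processing goes here
        ...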

@robrap (Contributor)

robrap commented Aug 7, 2024

I'm also curious how this relates to DD Data Streaming?

@robrap (Contributor)

robrap commented Aug 7, 2024

Marking P4, but unsure if this should be P3, and unsure if the Produce calls are missing information that would be useful to have.

Labels: none yet
Project status: Backlog
2 participants