Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow de-duplicating Span.Process, agent, and collector tags #1778

Closed
jpkrohling opened this issue Sep 6, 2019 · 17 comments · Fixed by #2658
Closed

Allow de-duplicating Span.Process, agent, and collector tags #1778

jpkrohling opened this issue Sep 6, 2019 · 17 comments · Fixed by #2658
Assignees
Labels
good first issue Good for beginners

Comments

@jpkrohling
Copy link
Contributor

jpkrohling commented Sep 6, 2019

Problem - what in Jaeger blocks you from solving the requirement?

When using --jaeger-tags in the Agent, and the same tags as JAEGER_TAGS on a client such as the Java client, the tags are being duplicated:

image

This is probably the relevant code:

for _, span := range spans {
if span.Process != nil {
span.Process.Tags = append(span.Process.Tags, agentTags...)
}
}

Proposal - what do you suggest to solve the problem or improve the existing situation?

Either:

  • Allow agents to override the client's process tag, or
  • Prevent the agent from adding a process tag to a span's process tag if a tag with the same name exists

Any open questions to address

  • Decide which of the proposals to implement
@annanay25
Copy link
Member

Aha, good catch!

I'd vote for the second option (prevent agent from adding tag if it already exists), because I can't think of a scenario where agent would have a more accurate value for a tag than a client.

P.S: This can also be marked "good first issue"!

@objectiser
Copy link
Contributor

The counter argument is that you want the agent to enforce an official/authoritative value and prevent the client from being able to override it.

Possibly we need a mechanism to enable the agent to know which action to take when a duplicate is found.

@jpkrohling
Copy link
Contributor Author

So:

  1. provide a flag --duplicate-tags, with the possible values client (keep client's value), agent (keep agent's value), and duplicate (the current behavior).
  2. by default, client is used (or duplicate, if we want to be extra careful in case someone depends on the current behavior)

@objectiser
Copy link
Contributor

Sounds good as long as we believe all tags are to be treated the same.

@jpkrohling jpkrohling added the good first issue Good for beginners label Sep 10, 2019
@jpkrohling
Copy link
Contributor Author

Marking this as "good first issue". I'll implement this in a couple of weeks if we don't get volunteers :)

@Pothulapati
Copy link

I can take this issue :D

@dm03514
Copy link

dm03514 commented Sep 10, 2019

If something comes up @Pothulapati , I’d be happy to take it too :)

@jpkrohling
Copy link
Contributor Author

@dm03514 we have plenty of good first issues. How about this one? jaegertracing/jaeger-operator#654

@yurishkuro
Copy link
Member

What is the problem with having duplicate tags? What is the business scenario where agent must override tags provided by the client?

The original intention of agent tags was to provide additional dimensions that may not be known to the client, like which zone/cluster the code is running in.

@objectiser
Copy link
Contributor

The original intention of agent tags was to provide additional dimensions that may not be known to the client, like which zone/cluster the code is running in.

The usecase is really to ensure the client cannot override/spoof the values associated with those tags, in cases where they may be relied upon for subsequent security/post processing.

@jpkrohling
Copy link
Contributor Author

Besides @objectiser's case, agents might be automatically added to pods (like, via the operator) and they have no way to know whether the tracer is already being configured with the same information on its own (pod name and namespace, for instance). If the agent and the client report the same information, we end up with the same information being duplicated.

Having duplicate information is unnecessary noise and storage, and it's probably not what the user expects in most of the cases.

@yurishkuro yurishkuro changed the title Batch process + Span process = duplicated process Allow de-duplicating tags between Span.Process, agent tags, and collector tags Oct 10, 2019
@yurishkuro yurishkuro changed the title Allow de-duplicating tags between Span.Process, agent tags, and collector tags Allow de-duplicating tags Span.Process, agent, and collector tags Oct 10, 2019
@yurishkuro yurishkuro changed the title Allow de-duplicating tags Span.Process, agent, and collector tags Allow de-duplicating Span.Process, agent, and collector tags Oct 10, 2019
@jpkrohling
Copy link
Contributor Author

I could swear we had this fixed already :-) Any volunteers for this one?

@fbartnitzek
Copy link

I tried the --collector.tags configuration, but the tag-duplicates are multiplying fast even if you only test it with the jaeger-query.
Without a way to prevent duplicates, by enforcing that client / agent / collector key wins, those tags produce too much noise to use them.

@jpkrohling
Copy link
Contributor Author

but the tag-duplicates are multiplying fast

Do you have an example of that?

@fbartnitzek
Copy link

sure:
My test was on kubernetes, just with jaeger-collector and jaeger-query (v1.18.1, without operator).
Only the jaeger-query-agent-sidecar produces traces and the collector writes them to a separate elasticsearch index with this collector-tags

spec:
  containers:
  - args:
    - --collector.tags=testtag=testvalue

The jaeger-query traces showed

  • 2 testtag-labels for the first trace on path /api/services
  • after a page refresh it showed 4 testtag-labels on path /api/services

after some clicking in the ui, I also found

  • for path /api/services/{service}/operations traces with 2 and traces with 4 testtag-labels
  • for path /api/traces traces with 5 and traces with 9 testtag-labels

And all spans of a trace have the same number of testtag-labels.

image

@jpkrohling
Copy link
Contributor Author

This is really, really odd. We should indeed work on de-duplicating the tags, but there has to be something else going on...

@Betula-L
Copy link
Contributor

@jpkrohling I can take this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for beginners
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants