Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JaegerRemoteSampler inconsistent Implementation #5504

Open
rotscher opened this issue Jun 5, 2023 · 6 comments
Open

JaegerRemoteSampler inconsistent Implementation #5504

rotscher opened this issue Jun 5, 2023 · 6 comments
Assignees
Labels
Feature Request Suggest an idea for this project help wanted

Comments

@rotscher
Copy link

rotscher commented Jun 5, 2023

The implementation of SDK Jaeger Remote Sampler Extension is not according to the description of the example of Jaeger Remote Sampler.

{
  "service_strategies": [
    {
      "service": "foobar",
      "type": "ratelimiting",
      "param": 2
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.2,
    "operation_strategies": [
      {
        "operation": "/metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}

Having the remote sampling configuration above I would expect service foobar to be rate limited to 2 and all other services to be sampled with probability 0.2. As an exception operation /metrics of all services (including foobar) never gets sampled.

With the current implementation only rate limiting is configured for service foobar which is also applied to /metrics operation.

Also when operation_strategies are configured for a service then the default type of the service is always "probabilistic" even when "ratelimiting" is configured.

I've used the following libraries and versions:

  • io.opentelemetry:opentelemetry-sdk-extension-autoconfigure:1.26.0-alpha
  • io.opentelemetry:opentelemetry-sdk-extension-jaeger-remote-sampler:1.26.0-alpha

And activated the autoconfiguration with export OTEL_TRACES_SAMPLER=parentbased_jaeger_remote

@rotscher rotscher added the Feature Request Suggest an idea for this project label Jun 5, 2023
@jack-berg
Copy link
Member

@pavolloffay can you comment on this?

@jack-berg jack-berg changed the title inconsistent Implementation JaegerRemoteSampler inconsistent Implementation Jun 27, 2023
@kuujis
Copy link

kuujis commented Mar 9, 2024

@jack-berg @jkwatson - I would like to try solving this if thats ok :)

@pavolloffay
Copy link
Member

correct it should be fixed in OTEL SDK

@kuujis
Copy link

kuujis commented Mar 12, 2024

correct it should be fixed in OTEL SDK

I was debugging this over the weekend and to me it seems the issue is with jaeger not returning probablistic operation sampling strategies if the service configuration is ratelimiting. In theory it would be possible to call Jaeger twice, get strategies for a given service, then get defaults and merge them before starting to use them, but to me it looks like a mistake in Jaeger backend and this "calling jaeger twice" does not feel right.

@kuujis
Copy link

kuujis commented Mar 15, 2024

Just a small update - after updating jaeger to return the operation level default probablistic sampling strategies for ratelimiting service level sampling strategy configurations - the issue can't be reproduced anymore.
So I assume this could be closed once the PR to jaeger gets accepted OR if it does not - I'll make a PR to otel with double jaeger call and merge of the results (I still feel a bit dirty about this option though)

yurishkuro added a commit to jaegertracing/jaeger that referenced this issue Apr 20, 2024
…egies without them (#5277)

## Which problem is this PR solving?
- Part 1 of [#5270]

## Description of the changes
- See issue for description of the bug and of the two changed behaviors
- Default `probabilistic` operation level strategies will now be
returned if service strategy is of `ratelimiting` type
- If the service itself has `probabilistic` strategy, its sampling rate
takes priority for operation-level definition

## How was this change tested?
- Via updated unit test
- By building local all-in-one executable and using it as a otel
instrumentation backend for simple java service
https://github.com/kuujis/opentelemetry-java-5504 and using the below
`sampling_strategies.json` from
open-telemetry/opentelemetry-java#5504
```
{
  "service_strategies": [
    {
      "service": "foobar",
      "type": "ratelimiting",
      "param": 2
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.2,
    "operation_strategies": [
      {
        "operation": "metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}
```

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Kazimieras Pociunas <kazimieras.pociunas@gmail.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
@kuujis
Copy link

kuujis commented Apr 20, 2024

This should work out of the box with next release of Jaeger, provided flag sampling.strategies.bugfix-5270 is used when launching it. Since the fix might result in unexpected impact to the Jaeger users - this will become default behavior only in 1.57 version. See jaegertracing/jaeger#5270 for more details.

varshith257 pushed a commit to varshith257/jaeger that referenced this issue May 3, 2024
…egies without them (jaegertracing#5277)

## Which problem is this PR solving?
- Part 1 of [jaegertracing#5270]

## Description of the changes
- See issue for description of the bug and of the two changed behaviors
- Default `probabilistic` operation level strategies will now be
returned if service strategy is of `ratelimiting` type
- If the service itself has `probabilistic` strategy, its sampling rate
takes priority for operation-level definition

## How was this change tested?
- Via updated unit test
- By building local all-in-one executable and using it as a otel
instrumentation backend for simple java service
https://github.com/kuujis/opentelemetry-java-5504 and using the below
`sampling_strategies.json` from
open-telemetry/opentelemetry-java#5504
```
{
  "service_strategies": [
    {
      "service": "foobar",
      "type": "ratelimiting",
      "param": 2
    }
  ],
  "default_strategy": {
    "type": "probabilistic",
    "param": 0.2,
    "operation_strategies": [
      {
        "operation": "metrics",
        "type": "probabilistic",
        "param": 0.0
      }
    ]
  }
}
```

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Kazimieras Pociunas <kazimieras.pociunas@gmail.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Vamshi Maskuri <gwcchintu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request Suggest an idea for this project help wanted
Projects
None yet
Development

No branches or pull requests

5 participants