Existing traces can't be opened from the trace id lookup UI (404 error received) - ES backend #3270

Closed
cspwizard opened this issue Sep 16, 2021 · 3 comments · Fixed by #3299

Comments

@cspwizard

Describe the bug
Trace IDs cannot be opened through the trace lookup UI (the search bar at the top); the API 404 page is shown instead.
However, I can find and open the same trace from the search UI.

To Reproduce
Steps to reproduce the behavior:

  1. set up Jaeger with Elasticsearch and index rollover (either via utilities or ILM) and store some traces
  2. search for a trace and open it
  3. hit refresh (F5)
  4. 404 Not Found displayed
  5. same 404 behavior in case of opening the trace via lookup

Expected behavior
Existing trace should be opened

Screenshots
image
image
image

Version (please complete the following information):

  • OS: Linux
  • Jaeger version: 1.26
  • Deployment: Kubernetes

What troubleshooting steps did you try?
Verified the indexes and policies setup, Jaeger configuration, verified the presence of the trace in question

Additional context
The span and service indexes are created with the names jaeger-span-000001 and jaeger-service-000001, with the aliases jaeger-span-read, jaeger-span-write and jaeger-service-read, jaeger-service-write. The issue is reproducible immediately after a fresh setup.
Both collector and UI are started with arguments:

"--es.use-aliases=true",
"--es.num-replicas=0",
"--es.num-shards=1",
"--es.create-index-templates=false",
"--es.log-level=error",
"--es.use-ilm=true",
"--es.max-span-age=168h0m0s",

I've taken a look at the code, and I think the issue is in func (s *SpanReader) multiRead(ctx context.Context, traceIDs []model.TraceID, startTime, endTime time.Time) ([]*model.Trace, error), which is called from api/trace/{id}. When aliases are used, it should query the read alias (jaeger-span-read in this case) instead of trying to construct a multi-index search call. That call will fail because the indexes do not follow the expected date-formatted naming but are numbered -000001, -000002, etc. However, I'm not a Go developer and I may be missing something.

If any other information is required, I'll try to provide it ASAP.

@cspwizard cspwizard added the bug label Sep 16, 2021
@cspwizard cspwizard changed the title Existing traces can't be opened from the trace id lookup UI (404 error received) Existing traces can't be opened from the trace id lookup UI (404 error received) - ES backend Sep 16, 2021
@pavolloffay
Member

Could you please increase es.max-span-age flag to match the oldest index creation time? The oldest index that is in the read alias.

@oliversalzburg

I feel like I'm suffering from the same issue. For a few days now, I've repeatedly looked up trace IDs in our Jaeger and they were not found:

{"data":null,"total":0,"limit":0,"offset":0,"errors":[{"code":404,"msg":"trace not found"}]}

I was just looking for trace 42e188da40198293 and it wasn't found. Then I checked a search to see which traces were even recorded at all. The trace in question is found:
image

When I click on the trace, I get the HTTP Error: trace not found error. However, this is not true 100% of the time. When I randomly try to open the same trace again, sometimes the trace will actually open:
image

The best I could come up with as a reason would be that maybe I messed up the ES index rollover setup, but I don't see a direct connection.

Please let me know if I should open a new issue for this and what I can provide to help.

@cspwizard
Author

cspwizard commented Sep 29, 2021

> Could you please increase es.max-span-age flag to match the oldest index creation time? The oldest index that is in the read alias.

It's already 168h (7 days); after 7 days the index is removed by the ILM policy. And as I mentioned, it is reproducible right after a fresh deploy (no pre-existing indexes and so on).
Existing indexes:
image

ctreatma added a commit to ctreatma/jaeger that referenced this issue Sep 30, 2021
…Unix epoch

Version 1.26 introduced an automatic configuration for the query lookback
when using ElasticSearch with aliases enabled.  When aliases are enabled,
the ES plugin will look back 100 years.  This pre-dates the Unix epoch, and
while such dates can be modeled as negative timestamps, the model defined
in `jaeger/model/time.go` only supports unsigned timestamps.  As a result,
the 100-year lookback ends up overflowing the time model, resulting in a
distant-future lookback date, rather than a distant-past lookback date.

While the time model could be updated to support negative timestamps, it
seems unlikely that any Jaeger users would reasonably need to search for
spans from the 1920s.  This reduces the automatic lookback to 50 years to
remove the overflow issue while still providing an extremely long search
window that should serve even the most ambitious searches of historical
trace data.
pavolloffay pushed a commit that referenced this issue Oct 4, 2021
…3299)

* Close #3270: Prevent rollover lookback from passing the Unix epoch

Signed-off-by: Charles Treatman <charles_treatman@comcast.com>

* Update test for maxSpanAge when aliases are enabled

Signed-off-by: Charles Treatman <charles_treatman@comcast.com>