Remove the description of esCleaner.py from plugin/storage/es/README.md (#5891)

esCleaner.py no longer exists, but README.md still has the relevant
content, so I've made some adjustments.

---------

Signed-off-by: 王然 <ranwang@alauda.io>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
chinaran and yurishkuro authored Aug 27, 2024
1 parent 2b9a9b8 commit 0339b4b
Showing 2 changed files with 25 additions and 17 deletions.
17 changes: 17 additions & 0 deletions cmd/es-index-cleaner/README.md
@@ -0,0 +1,17 @@
# es-index-cleaner

It is common to only keep observability data for a limited time.
However, Elasticsearch does not support expiring old data via TTL.
To help with this task, `es-index-cleaner` can be used to purge
old Jaeger indices. For example, to delete indices older than 14 days:

```
docker run -it --rm --net=host -e ROLLOVER=true \
jaegertracing/jaeger-es-index-cleaner:latest \
14 \
http://localhost:9200
```

Another alternative is to use [Elasticsearch Curator][curator].

[curator]: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/about.html
25 changes: 8 additions & 17 deletions plugin/storage/es/README.md
@@ -5,33 +5,24 @@ This provides a storage backend for Jaeger using [Elasticsearch](https://www.ela
## Indices
Indices are created based on the span's timestamp; e.g., a span with
a timestamp on 2017/04/21 will be stored in an index named `jaeger-2017-04-21`.
ElasticSearch also has no support for TTL, so there exists a script `./esCleaner.py`
that deletes older indices automatically. The [Elastic Curator](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/about.html)
can also be used instead to do a similar job.

### Using `./esCleaner.py`
The script is using `python3`. All dependencies can be installed with: `python3 -m pip install elasticsearch elasticsearch-curator`.

Parameters:
* Environment variable TIMEOUT that sets the timeout in seconds for indices deletion (default: 120)
* Optional environment variable ES_USERNAME and ES_PASSWORD
* a number that will delete any indices older than that number in days
* ElasticSearch hostnames
* Example usage: `TIMEOUT=120 ./esCleaner.py 4 localhost:9200`
It is common to only keep observability data for a limited time.
However, Elasticsearch does not support expiring old data via TTL.
To purge old Jaeger indices, use [jaeger-es-index-cleaner](../../../cmd/es-index-cleaner/).
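
As a rough illustration of date-based index names combined with age-based cleanup, the sketch below selects the indices older than a given number of days. It assumes the `jaeger-YYYY-MM-DD` naming from the example above; the function name `indices_older_than` is hypothetical, and this is not the logic of `es-index-cleaner` itself.

```python
from datetime import date, datetime, timedelta

INDEX_PREFIX = "jaeger-"  # assumed prefix, taken from the example index name above


def indices_older_than(index_names, days, today):
    """Return the names whose date suffix is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in index_names:
        if not name.startswith(INDEX_PREFIX):
            continue  # not a Jaeger index under the assumed naming
        try:
            index_date = datetime.strptime(name[len(INDEX_PREFIX):], "%Y-%m-%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if index_date < cutoff:
            old.append(name)
    return old
```

Calling `indices_older_than(names, 14, date.today())` would return the candidates for deletion under these assumptions; the actual deletion is left to the cleaner tool.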

### Timestamps
Because ElasticSearch's `Date` datatype has only millisecond granularity and Jaeger
requires microsecond granularity, Jaeger spans' `StartTime` is saved as a long type.
The conversion is done automatically.
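
To make the granularity point concrete, here is a minimal sketch of storing a timestamp as an integer count of microseconds since the Unix epoch and converting it back. The helper names `to_micros` and `from_micros` are hypothetical; Jaeger's own conversion code may differ.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts):
    """Convert a timezone-aware datetime to integer microseconds since the epoch."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_micros(micros):
    """Convert integer microseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

Integer arithmetic is used instead of floating-point seconds so that the round trip stays exact; truncating the same value to milliseconds (`micros // 1000`) would discard the last three digits.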

### Nested fields (tags)
`Tags` are [nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) fields in the
ElasticSearch schema used for Jaeger. This allows for better search capabilities and data retention. However, because
ElasticSearch creates a new document for every nested field, there is currently a limit of 50 nested fields per document.
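
The sketch below shows one way a flat tag mapping could be turned into a list of nested objects while checking the 50-field limit mentioned above. The `key`/`value` field names, the helper `tags_to_nested`, and raising an error on overflow are all assumptions made for illustration, not a description of Jaeger's schema or behavior.

```python
MAX_NESTED_FIELDS = 50  # limit stated in the text above


def tags_to_nested(tags):
    """Convert a {key: value} mapping into a list of nested tag objects."""
    nested = [{"key": key, "value": str(value)} for key, value in tags.items()]
    if len(nested) > MAX_NESTED_FIELDS:
        # How overflow is actually handled is not specified; raising is illustrative.
        raise ValueError(
            f"{len(nested)} tags exceed the limit of {MAX_NESTED_FIELDS} nested fields"
        )
    return nested
```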

### Shards and Replicas
Number of shards and replicas per index can be specified as parameters to the writer and/or through configs under
`./pkg/es/config/config.go`. If not specified, it defaults to ElasticSearch defaults: 5 shards and 1 replica.
[This article](https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index) discusses
how to choose the number of shards per index.
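
For reference, shard and replica counts correspond to the `number_of_shards` and `number_of_replicas` index settings in Elasticsearch. The sketch below builds such a settings body; the helper `index_settings` is hypothetical, its defaults simply mirror the values quoted above, and it does not reflect how the Jaeger writer passes these parameters.

```python
def index_settings(shards=5, replicas=1):
    """Build an index-creation body with explicit shard and replica counts."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
```

The returned dictionary can be serialized with `json.dumps` and used as the body of an index-creation request.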

@@ -42,7 +33,7 @@ This plugin queries against spans. This means that all tags in a query must lie
query to successfully return a trace.

### Case-sensitivity
Queries are case-sensitive. For example, if a document with service name `ABC` is searched using a query `abc`,
the document will not be retrieved.

## Testing
@@ -57,6 +48,6 @@ and that script be run from the top folder to integration test ElasticSearch as
This script requires Docker to be running.

### Adding tests
The integration test framework for storage lives under `../integration`.
Add to `../integration/fixtures/traces/*.json` and `../integration/fixtures/queries.json` to add more
trace cases.
