Add support for the spanmetrics connector #4452

Merged
merged 12 commits into from
May 27, 2023
29 changes: 29 additions & 0 deletions docker-compose/monitor/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
.PHONY: build
build: clean-jaeger
cd ../../ && \
make build-all-in-one && \
make docker-images-jaeger-backend && \
docker tag jaegertracing/all-in-one:latest jaegertracing/all-in-one:dev

.PHONY: run
run:
docker compose -f docker-compose-connector.yml up

# This make target is set up to allow devs to run the older spanmetrics processor setup, for example,
# to test backwards compatibility of Jaeger with the spanmetrics processor.
.PHONY: run-processor
run-processor:
docker compose -f docker-compose-processor.yml up

.PHONY: clean-jaeger
clean-jaeger:
# Also cleans up intermediate cached containers.
docker system prune -f

.PHONY: clean-all
clean-all: clean-jaeger
docker rmi -f jaegertracing/all-in-one:dev ; \
docker rmi -f jaegertracing/all-in-one:latest ; \
docker rmi -f otel/opentelemetry-collector-contrib:latest ; \
docker rmi -f prom/prometheus:latest ; \
docker rmi -f grafana/grafana:latest
56 changes: 48 additions & 8 deletions docker-compose/monitor/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ This environment consists the following backend components:

- [MicroSim](https://github.com/yurishkuro/microsim): a program to simulate traces.
- [Jaeger All-in-one](https://www.jaegertracing.io/docs/1.24/getting-started/#all-in-one): the full Jaeger stack in a single container image.
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/): vendor agnostic integration layer for traces and metrics. Its main role in this particular development environment is to receive Jaeger spans, forward these spans untouched to Jaeger All-in-one while simultaneously aggregating metrics out of this span data. To learn more about span metrics aggregation, please refer to the [spanmetrics processor documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor).
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/): vendor agnostic integration layer for traces and metrics. Its main role in this particular development environment is to receive Jaeger spans, forward these spans untouched to Jaeger All-in-one while simultaneously aggregating metrics out of this span data. To learn more about span metrics aggregation, please refer to the [spanmetrics processor documentation][spanmetricsprocessor].
- [Prometheus](https://prometheus.io/): a metrics collection and query engine, used to scrape metrics computed by OpenTelemetry Collector, and presents an API for Jaeger All-in-one to query these metrics.
- [Grafana](https://grafana.com/): a metrics visualization, analytics & monitoring solution supporting multiple metrics databases.

Expand All @@ -26,11 +26,16 @@ The following diagram illustrates the relationship between these components:

# Getting Started

## Bring up/down the dev environment
## Build jaeger-all-in-one docker image

```shell
make build
```

## Bring up the dev environment

```bash
docker compose up
docker compose down
make run
```

**Tips:**
Expand All @@ -42,10 +47,7 @@ docker compose down
**Warning:** The included [docker-compose.yml](./docker-compose.yml) file uses the `latest` version of Jaeger and other components. If your local Docker registry already contains older versions, which may still be tagged as `latest`, you may want to delete those images before running the full set, to ensure consistent behavior:

```bash
docker rmi -f jaegertracing/all-in-one:latest
docker rmi -f otel/opentelemetry-collector-contrib:latest
docker rmi -f prom/prometheus:latest
docker rmi -f grafana/grafana:latest
make clean-all
```

## Sending traces
Expand Down Expand Up @@ -83,6 +85,40 @@ Then navigate to the Monitor tab at http://localhost:16686/monitor to view the R

![My Service RED Metrics](images/my_service_metrics.png)

## Migrating to Span Metrics Connector

### Background

A new [Connector](https://pkg.go.dev/go.opentelemetry.io/collector/connector#section-readme) component type was introduced
to the OpenTelemetry Collector. A connector acts as an exporter in one pipeline and as a receiver in another,
so it can pass telemetry between pipelines of any signal type.

The existing [Span Metrics Processor][spanmetricsprocessor] was a good candidate to migrate over to the connector type,
resulting in the new [Span Metrics Connector][spanmetricsconnector] component.

The Span Metrics Connector variant introduces some [breaking changes][processor-to-connector], and the following
section aims to provide the instructions necessary to use the metrics produced by this component.

### Migrating

Assuming the OpenTelemetry Collector is running with the [Span Metrics Connector][spanmetricsconnector] correctly
configured, the following configuration should be applied to jaeger-query or jaeger-all-in-one:

As command line parameters:
```shell
--prometheus.query.namespace=span_metrics
--prometheus.query.duration-metric-name=duration
--prometheus.query.duration-unit=ms
--prometheus.query.span-name-label=span_name
```

As environment variables:
```shell
PROMETHEUS_QUERY_NAMESPACE=span_metrics
PROMETHEUS_QUERY_DURATION_METRIC_NAME=duration
PROMETHEUS_QUERY_DURATION_UNIT=ms
PROMETHEUS_QUERY_SPAN_NAME_LABEL=span_name
```
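
The settings above change the metric names that Jaeger looks up in Prometheus. The sketch below illustrates how the namespace, metric name and unit could combine into a single name. It is not Jaeger's actual code: `queryConfig`, `longUnitNames` and `durationMetricName` are hypothetical names, and the assumption that the collector's Prometheus exporter appends the long unit name (for example `milliseconds`) to histogram metrics is inferred from the comment in `docker-compose-processor.yml`.

```go
package main

import "fmt"

// queryConfig mirrors three of the settings shown above. The type and its
// field names are hypothetical and exist only for this sketch.
type queryConfig struct {
	Namespace          string // --prometheus.query.namespace
	DurationMetricName string // --prometheus.query.duration-metric-name
	DurationUnit       string // --prometheus.query.duration-unit
}

// longUnitNames maps short units to the suffix that the collector's
// Prometheus exporter is assumed to append to histogram metric names.
var longUnitNames = map[string]string{
	"ms": "milliseconds",
	"s":  "seconds",
}

// durationMetricName builds the base name of the duration histogram:
// an optional namespace prefix, the metric name, and an optional unit suffix.
func durationMetricName(c queryConfig) string {
	name := c.DurationMetricName
	if c.Namespace != "" {
		name = c.Namespace + "_" + name
	}
	if long, ok := longUnitNames[c.DurationUnit]; ok {
		name += "_" + long
	}
	return name
}

func main() {
	// Connector setup, using the values from this section.
	connector := queryConfig{Namespace: "span_metrics", DurationMetricName: "duration", DurationUnit: "ms"}
	// Processor setup, using the defaults asserted in factory_test.go.
	processor := queryConfig{DurationMetricName: "latency"}

	fmt.Println(durationMetricName(connector) + "_bucket") // span_metrics_duration_milliseconds_bucket
	fmt.Println(durationMetricName(processor) + "_bucket") // latency_bucket
}
```

With no namespace and no unit, the sketch yields the plain `latency` name, which is why the defaults can stay compatible with the older spanmetrics processor.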

## Querying the HTTP API

### Example 1
Expand Down Expand Up @@ -247,3 +283,7 @@ $ curl http://localhost:16686/api/metrics/minstep | jq .
]
}
```

[spanmetricsprocessor]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/spanmetricsprocessor
[spanmetricsconnector]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector
[processor-to-connector]: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector#span-to-metrics-processor-to-span-to-metrics-connector
Original file line number Diff line number Diff line change
Expand Up @@ -3,21 +3,26 @@ services:
jaeger:
networks:
- backend
image: jaegertracing/all-in-one:latest
image: jaegertracing/all-in-one:dev
volumes:
- "./jaeger-ui.json:/etc/jaeger/jaeger-ui.json"
command: --query.ui-config /etc/jaeger/jaeger-ui.json
environment:
- METRICS_STORAGE_TYPE=prometheus
- PROMETHEUS_SERVER_URL=http://prometheus:9090
- PROMETHEUS_QUERY_NAMESPACE=span_metrics
- PROMETHEUS_QUERY_DURATION_METRIC_NAME=duration
- PROMETHEUS_QUERY_DURATION_UNIT=ms
- PROMETHEUS_QUERY_SPAN_NAME_LABEL=span_name
- LOG_LEVEL=debug
ports:
- "16686:16686"
otel_collector:
networks:
- backend
image: otel/opentelemetry-collector-contrib:latest
volumes:
- "./otel-collector-config.yml:/etc/otelcol/otel-collector-config.yml"
- "./otel-collector-config-connector.yml:/etc/otelcol/otel-collector-config.yml"
command: --config /etc/otelcol/otel-collector-config.yml
ports:
- "4317:4317"
59 changes: 59 additions & 0 deletions docker-compose/monitor/docker-compose-processor.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
version: "3.5"
services:
jaeger:
networks:
- backend
image: jaegertracing/all-in-one:dev
volumes:
- "./jaeger-ui.json:/etc/jaeger/jaeger-ui.json"
command: --query.ui-config /etc/jaeger/jaeger-ui.json
environment:
- METRICS_STORAGE_TYPE=prometheus
- PROMETHEUS_SERVER_URL=http://prometheus:9090
- LOG_LEVEL=debug
ports:
- "16686:16686"
otel_collector:
networks:
- backend
# Fix to a version before the spanmetrics processor was deprecated and before the prometheusexporter
# changes that append units to latency/histogram metrics.
image: otel/opentelemetry-collector-contrib:0.70.0
volumes:
- "./otel-collector-config-processor.yml:/etc/otelcol/otel-collector-config.yml"
command: --config /etc/otelcol/otel-collector-config.yml
ports:
- "4317:4317"
depends_on:
- jaeger
microsim:
networks:
- backend
image: yurishkuro/microsim:0.2.0
command: "-j http://otel_collector:14278/api/traces -d 24h -s 500ms"
depends_on:
- otel_collector
prometheus:
networks:
- backend
image: prom/prometheus:latest
volumes:
- "./prometheus.yml:/etc/prometheus/prometheus.yml"
ports:
- "9090:9090"
grafana:
networks:
- backend
image: grafana/grafana:latest
volumes:
- ./grafana.ini:/etc/grafana/grafana.ini
- ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yaml
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
- GF_AUTH_DISABLE_LOGIN_FORM=true
ports:
- 3000:3000

networks:
backend:
38 changes: 38 additions & 0 deletions docker-compose/monitor/otel-collector-config-connector.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
receivers:
jaeger:
protocols:
thrift_http:
endpoint: "0.0.0.0:14278"

otlp:
protocols:
grpc:
http:

exporters:
prometheus:
endpoint: "0.0.0.0:8889"

jaeger:
endpoint: "jaeger:14250"
tls:
insecure: true

connectors:
spanmetrics:
namespace: span.metrics

processors:
batch:

service:
pipelines:
traces:
receivers: [otlp, jaeger]
processors: [batch]
exporters: [spanmetrics, jaeger]
# The spanmetrics connector is the exporter of the traces pipeline above
# and the receiver of this metrics pipeline.
metrics/spanmetrics:
receivers: [spanmetrics]
exporters: [prometheus]
13 changes: 9 additions & 4 deletions pkg/prometheus/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,13 @@ import (

// Configuration describes the options to customize the storage behavior.
type Configuration struct {
ServerURL string
ConnectTimeout time.Duration
TLS tlscfg.Options
TokenFilePath string
ServerURL string
ConnectTimeout time.Duration
TLS tlscfg.Options
TokenFilePath string
MetricNamespace string
CallsMetricName string
LatencyMetricName string
LatencyUnit string
OperationLabel string
}
2 changes: 1 addition & 1 deletion plugin/metrics/prometheus/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ func (f *Factory) AddFlags(flagSet *flag.FlagSet) {
// InitFromViper implements plugin.Configurable.
func (f *Factory) InitFromViper(v *viper.Viper, logger *zap.Logger) {
if err := f.options.InitFromViper(v); err != nil {
logger.Fatal("Failed to initialize metrics storage factory", zap.Error(err))
logger.Panic("Failed to initialize metrics storage factory", zap.Error(err))
}
}
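
The switch from `logger.Fatal` to `logger.Panic` matters for testing: by default a fatal log call terminates the process, whereas a panic can be intercepted with `recover`. The stdlib-only sketch below shows that mechanism. `initOrPanic` and `didPanic` are hypothetical stand-ins for `InitFromViper` and a test helper, not part of Jaeger or zap.

```go
package main

import (
	"errors"
	"fmt"
)

// initOrPanic stands in for an initializer that panics on a bad configuration.
func initOrPanic(err error) {
	if err != nil {
		panic(fmt.Sprintf("failed to initialize metrics storage factory: %v", err))
	}
}

// didPanic runs fn and reports whether it panicked.
func didPanic(fn func()) (panicked bool) {
	defer func() {
		// recover returns a non-nil value only while a panic is unwinding.
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	fmt.Println(didPanic(func() { initOrPanic(errors.New("invalid duration-unit")) })) // true
	fmt.Println(didPanic(func() { initOrPanic(nil) }))                                 // false
}
```

A test can make the same check with a deferred `recover`, as the new subtest in `factory_test.go` does.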

43 changes: 41 additions & 2 deletions plugin/metrics/prometheus/factory_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,10 +52,17 @@ func TestWithDefaultConfiguration(t *testing.T) {
f := NewFactory()
assert.Equal(t, "http://localhost:9090", f.options.Primary.ServerURL)
assert.Equal(t, 30*time.Second, f.options.Primary.ConnectTimeout)

// Ensure backwards compatibility with OTEL's spanmetricsprocessor.
assert.Empty(t, f.options.Primary.MetricNamespace)
assert.Empty(t, f.options.Primary.LatencyUnit)
assert.Equal(t, "calls", f.options.Primary.CallsMetricName)
assert.Equal(t, "latency", f.options.Primary.LatencyMetricName)
assert.Equal(t, "operation", f.options.Primary.OperationLabel)
}

func TestWithConfiguration(t *testing.T) {
t.Run("With custom configuration and no space in token file path", func(t *testing.T) {
t.Run("with custom configuration and no space in token file path", func(t *testing.T) {
f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
Expand All @@ -69,7 +76,7 @@ func TestWithConfiguration(t *testing.T) {
assert.Equal(t, 5*time.Second, f.options.Primary.ConnectTimeout)
assert.Equal(t, "test/test_file.txt", f.options.Primary.TokenFilePath)
})
t.Run("With space in token file path", func(t *testing.T) {
t.Run("with space in token file path", func(t *testing.T) {
f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
Expand All @@ -79,6 +86,38 @@ func TestWithConfiguration(t *testing.T) {
f.InitFromViper(v, zap.NewNop())
assert.Equal(t, "test/ test file.txt", f.options.Primary.TokenFilePath)
})
t.Run("with custom configuration of prometheus.query", func(t *testing.T) {
f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
"--prometheus.query.namespace=mynamespace",
"--prometheus.query.calls-metric-name=mycalls",
"--prometheus.query.duration-metric-name=myduration",
"--prometheus.query.duration-unit=ms",
})
require.NoError(t, err)
f.InitFromViper(v, zap.NewNop())
assert.Equal(t, "mynamespace", f.options.Primary.MetricNamespace)
assert.Equal(t, "mycalls", f.options.Primary.CallsMetricName)
assert.Equal(t, "myduration", f.options.Primary.LatencyMetricName)
assert.Equal(t, "ms", f.options.Primary.LatencyUnit)
})
t.Run("with invalid prometheus.query.duration-unit", func(t *testing.T) {
defer func() {
if r := recover(); r == nil {
t.Errorf("Expected a panic due to invalid duration-unit")
}
}()

f := NewFactory()
v, command := config.Viperize(f.AddFlags)
err := command.ParseFlags([]string{
"--prometheus.query.duration-unit=milliseconds",
})
require.NoError(t, err)
f.InitFromViper(v, zap.NewNop())
require.Empty(t, f.options.Primary.LatencyUnit)
})
}
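
The last subtest implies that `--prometheus.query.duration-unit` accepts only short unit names. The sketch below shows one way such a check could look. It rests on assumptions: the diff confirms only that `ms` is accepted, that `milliseconds` is rejected, and that the default is empty. Treating `s` as valid is a guess, and `validateDurationUnit` is a hypothetical helper, not the function used in this PR.

```go
package main

import "fmt"

// validateDurationUnit accepts the empty default and the short unit names
// assumed to be supported ("ms" and "s"); anything else is an error.
func validateDurationUnit(unit string) error {
	switch unit {
	case "", "ms", "s":
		return nil
	default:
		return fmt.Errorf("invalid duration unit %q: expected \"ms\" or \"s\"", unit)
	}
}

func main() {
	for _, u := range []string{"", "ms", "milliseconds"} {
		if err := validateDurationUnit(u); err != nil {
			fmt.Printf("%q rejected: %v\n", u, err)
			continue
		}
		fmt.Printf("%q accepted\n", u)
	}
}
```

Returning an error keeps the check itself free of side effects and leaves the choice between logging, panicking and exiting to the caller.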

func TestFailedTLSOptions(t *testing.T) {