Rebuild the pipeline of metrics, logs, and traces that enter via the OTLP and OTLP HTTP receivers.

This should match what is required by Application Observability.

Also add the ability to define otel filters for data from the receivers

Signed-off-by: Pete Wall <pete.wall@grafana.com>
petewall committed Feb 5, 2024
1 parent 3e69b43 commit b1a71c2
Showing 100 changed files with 5,251 additions and 2,727 deletions.
11 changes: 8 additions & 3 deletions charts/k8s-monitoring/README.md
@@ -193,6 +193,7 @@ The Prometheus and Loki services may be hosted on the same cluster, or remotely
| logs.pod_logs.extraStageBlocks | string | `""` | Stage blocks to be added to the loki.process component for pod logs. See https://grafana.com/docs/agent/latest/flow/reference/components/loki.process/#blocks |
| logs.pod_logs.gatherMethod | string | `"volumes"` | Controls the behavior of gathering pod logs. When set to "volumes", the Grafana Agent will use HostPath volume mounts on the cluster nodes to access the pod log files directly. When set to "api", the Grafana Agent will access pod logs via the API server. This method may be preferable if your cluster prevents DaemonSets, HostPath volume mounts, or for other reasons. |
| logs.pod_logs.namespaces | list | `[]` | Only capture logs from pods in these namespaces (`[]` means all namespaces) |
| logs.receiver.filters.log_record | list | `[]` | |
| metrics.agent.enabled | bool | `true` | Scrape metrics from Grafana Agent |
| metrics.agent.extraMetricRelabelingRules | string | `""` | Rule blocks to be added to the prometheus.relabel component for Grafana Agent. See https://grafana.com/docs/agent/latest/flow/reference/components/prometheus.relabel/#rule-block |
| metrics.agent.extraRelabelingRules | string | `""` | Rule blocks to be added to the discovery.relabel component for Grafana Agent. See https://grafana.com/docs/agent/latest/flow/reference/components/discovery.relabel/#rule-block |
@@ -302,6 +303,8 @@ The Prometheus and Loki services may be hosted on the same cluster, or remotely
| metrics.probes.extraMetricRelabelingRules | string | `""` | Rule blocks to be added to the prometheus.relabel component for Probe objects. See https://grafana.com/docs/agent/latest/flow/reference/components/prometheus.relabel/#rule-block |
| metrics.probes.namespaces | list | `[]` | Which namespaces to look for Probe objects. |
| metrics.probes.scrapeInterval | string | 60s | How frequently to scrape metrics from Probe objects. Only used if the Probe does not specify the scrape interval. Overrides metrics.scrapeInterval |
| metrics.receiver.filters.datapoint | list | `[]` | |
| metrics.receiver.filters.metric | list | `[]` | |
| metrics.scrapeInterval | string | `"60s"` | How frequently to scrape metrics |
| metrics.serviceMonitors.enabled | bool | `true` | Include service discovery for ServiceMonitor objects |
| metrics.serviceMonitors.extraMetricRelabelingRules | string | `""` | Rule blocks to be added to the prometheus.relabel component for ServiceMonitor objects. See https://grafana.com/docs/agent/latest/flow/reference/components/prometheus.relabel/#rule-block |
@@ -330,6 +333,9 @@ The Prometheus and Loki services may be hosted on the same cluster, or remotely
| receivers.http.disable_debug_metrics | bool | `true` | It removes attributes which could cause high cardinality metrics. For example, attributes with IP addresses and port numbers in metrics about HTTP and gRPC connections will be removed. |
| receivers.http.enabled | bool | `true` | Receive telemetry data over HTTP? |
| receivers.http.port | int | `4318` | Which port to use for the HTTP receiver. This port needs to be opened in the grafana-agent section below. |
| receivers.processors.batch.maxSize | int | `0` | The upper limit of the amount of data contained in a single batch, in bytes. When set to 0, batches can be any size. |
| receivers.processors.batch.size | int | `16384` | What batch size to use, in bytes |
| receivers.processors.batch.timeout | string | `"2s"` | How long before sending |
| receivers.prometheus.enabled | bool | `false` | Receive Prometheus metrics |
| receivers.prometheus.port | int | `9999` | Which port to use for the Prometheus receiver. This port needs to be opened in the grafana-agent section below. |
| receivers.zipkin.disable_debug_metrics | bool | `true` | It removes attributes which could cause high cardinality metrics. For example, attributes with IP addresses and port numbers in metrics about HTTP and gRPC connections will be removed. |
@@ -347,9 +353,8 @@ The Prometheus and Loki services may be hosted on the same cluster, or remotely
| test.nodeSelector | object | `{"kubernetes.io/os":"linux"}` | nodeSelector to apply to the test job. |
| test.tolerations | list | `[]` | Tolerations to apply to the test job. |
| traces.enabled | bool | `false` | Receive and forward traces. |
-| traces.processors.batch.maxSize | int | `0` | The upper limit of the amount of data contained in a single batch, in bytes. When set to 0, batches can be any size. |
-| traces.processors.batch.size | int | `16384` | What batch size to use, in bytes |
-| traces.processors.batch.timeout | string | `"2s"` | How long before sending |
+| traces.filters.span | list | `[]` | |
+| traces.filters.spanevent | list | `[]` | |
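
The new rows in this table arrive without descriptions, so a short values sketch may help show how they fit together. The key names below come from the table rows in this diff. Everything else is an assumption: that each filter entry is an OTTL condition string, that matching telemetry is dropped (as with the OpenTelemetry filter processor), and the specific metric names, attributes, and sizes, which are purely illustrative.

```yaml
# Hypothetical values.yaml overrides; key names are taken from the README table in this diff.
receivers:
  processors:
    batch:
      size: 8192      # chart default is 16384
      timeout: 5s     # chart default is "2s"
      maxSize: 0      # 0 means batches can be any size (per the README)

metrics:
  receiver:
    filters:
      # Assumption: entries are OTTL conditions; data matching any of them is dropped.
      metric:
        - 'name == "http.server.duration"'
      datapoint:
        - 'attributes["http.route"] == "/healthz"'

logs:
  receiver:
    filters:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'

traces:
  filters:
    span:
      - 'attributes["http.route"] == "/healthz"'
    spanevent:
      - 'name == "debug.event"'
```

Note that the batch settings move from `traces.processors.batch.*` to `receivers.processors.batch.*` in this commit, so an existing values file that sets the old keys would presumably need updating.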

## Customizing the configuration

2 changes: 1 addition & 1 deletion charts/k8s-monitoring/ci/ci-2-values.yaml
@@ -60,7 +60,7 @@ receivers:
extraConfig: |-
tracing {
sampling_fraction = 0.1
-write_to = [otelcol.processor.batch.trace_batch_processor.input]
+write_to = [otelcol.processor.k8sattributes.default.input]
}
test:
2 changes: 1 addition & 1 deletion charts/k8s-monitoring/ci/ci-values.yaml
@@ -72,7 +72,7 @@ test:
extraConfig: |-
tracing {
sampling_fraction = 0.1
-write_to = [otelcol.processor.batch.trace_batch_processor.input]
+write_to = [otelcol.processor.k8sattributes.default.input]
}
opencost:
4 changes: 2 additions & 2 deletions charts/k8s-monitoring/docs/Customizations.md
@@ -40,10 +40,10 @@ you want to change something about how often to scrape or which target to scrape

By default, we already use an "allow list" to filter to a specific set of metrics, meaning the list drops metrics that are not useful for monitoring Kubernetes Clusters. The following fields allow for more customizations of the metrics after they have been scraped, but before being sent to the external metric service for storage. Typical changes that you might want to do here include setting, modifying, or dropping metric labels.

-* `metrics.<source>.extraMetricsRelabelingRules` - Rules that modify metrics and will populate the rules
+* `metrics.<source>.extraMetricRelabelingRules` - Rules that modify metrics and will populate the rules
section of a [prometheus.relabel](https://grafana.com/docs/agent/latest/flow/reference/components/prometheus.relabel/)
component. Use these rules to perform arbitrary modifications to metrics or metric labels.
-* `metrics.extraMetricsRelabelingRules` - Same as above, but the rules are applied to metrics from all metric sources.
+* `metrics.extraMetricRelabelingRules` - Same as above, but the rules are applied to metrics from all metric sources.
* `metrics.<source>.allowList` - Sets a list of metrics that will be kept, dropping any metrics that don't match.
* `extraServices.prometheus.externalLabels` - A key-value set that defines labels and values to be set for all metrics
being sent. It sets the `external_labels` section of
9 changes: 6 additions & 3 deletions charts/k8s-monitoring/templates/_configs.tpl
@@ -4,7 +4,9 @@
{{- include "agent.config.services" . }}
{{- include "agent.config.endpoints" . }}
{{- include "agent.config.pods" . }}

{{- include "agent.config.receivers" . }}
{{- include "agent.config.processors" . }}

{{- if .Values.metrics.enabled }}
{{- if .Values.metrics.autoDiscover.enabled }}
@@ -74,12 +76,12 @@
{{- include "agent.config.metricsService" . }}
{{- end }}

-{{- if .Values.logs.enabled }}
+{{- if and .Values.logs.enabled .Values.logs.pod_logs.enabled }}
{{- include "agent.config.logs.pod_logs_processor" . }}
{{- include "agent.config.loki" . }}
{{- end }}

{{- if and .Values.traces.enabled }}
-{{- include "agent.config.traces" . }}
+{{- include "agent.config.tracesService" . }}
{{- end }}

@@ -96,7 +98,8 @@

{{/* Grafana Agent Logs config */}}
{{- define "agentLogsConfig" -}}
-{{- include "agent.config.logs.pod_logs" . }}
+{{- include "agent.config.logs.pod_logs_discovery" . }}
+{{- include "agent.config.logs.pod_logs_processor" . }}
{{- include "agent.config.loki" . }}

{{- if .Values.logs.extraConfig }}
3 changes: 0 additions & 3 deletions charts/k8s-monitoring/templates/agent_config/_agent.river.txt
@@ -75,9 +75,6 @@ prometheus.relabel "agent" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.agent.extraMetricRelabelingRules }}
{{ .Values.metrics.agent.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -195,9 +195,6 @@ prometheus.relabel "annotation_autodiscovery" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.autoDiscover.extraMetricRelabelingRules }}
{{ .Values.metrics.autoDiscover.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -57,9 +57,6 @@ prometheus.relabel "apiserver" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.apiserver.extraMetricRelabelingRules }}
{{ .Values.metrics.apiserver.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -136,9 +136,6 @@ prometheus.relabel "cadvisor" {
replacement = ""
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.cadvisor.extraMetricRelabelingRules }}
{{ .Values.metrics.cadvisor.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -57,9 +57,6 @@ prometheus.relabel "kube_controller_manager" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.kubeControllerManager.extraMetricRelabelingRules }}
{{ .Values.metrics.kubeControllerManager.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -53,9 +53,6 @@ prometheus.relabel "kube_proxy" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.kubeProxy.extraMetricRelabelingRules }}
{{ .Values.metrics.kubeProxy.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -57,9 +57,6 @@ prometheus.relabel "kube_scheduler" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.kubeScheduler.extraMetricRelabelingRules }}
{{ .Values.metrics.kubeScheduler.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -73,9 +73,6 @@ prometheus.relabel "kube_state_metrics" {
}
{{- end }}

{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if (index .Values.metrics "kube-state-metrics").extraMetricRelabelingRules }}
{{ (index .Values.metrics "kube-state-metrics").extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -35,9 +35,6 @@ prometheus.relabel "kubernetes_monitoring_telemetry" {
regex = "up|grafana_kubernetes_monitoring_.*"
action = "keep"
}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
forward_to = [prometheus.relabel.metrics_service.receiver]
}
{{ end }}
4 changes: 0 additions & 4 deletions charts/k8s-monitoring/templates/agent_config/_loki.river.txt
@@ -5,10 +5,6 @@ remote.kubernetes.secret "logs_service" {
namespace = {{ .Values.externalServices.loki.secret.namespace | default .Release.Namespace | quote }}
}

otelcol.exporter.loki "otel_to_loki_converter" {
forward_to = [loki.write.grafana_cloud_loki.receiver]
}

{{- with .Values.externalServices.loki }}
loki.write "grafana_cloud_loki" {
endpoint {
@@ -5,10 +5,6 @@ remote.kubernetes.secret "metrics_service" {
namespace = {{ .Values.externalServices.prometheus.secret.namespace | default .Release.Namespace | quote }}
}

otelcol.exporter.prometheus "otel_to_prom_converter" {
forward_to = [prometheus.relabel.metrics_service.receiver]
}

prometheus.relabel "metrics_service" {
rule {
source_labels = ["cluster"]
@@ -17,6 +13,10 @@ prometheus.relabel "metrics_service" {
target_label = "cluster"
}

{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}

{{- if eq .Values.externalServices.prometheus.protocol "remote_write" }}
forward_to = [prometheus.remote_write.metrics_service.receiver]
}
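
Taken together with the per-source deletions elsewhere in this commit, the hunk above means the global `metrics.extraMetricRelabelingRules` value is now rendered once, inside the shared `prometheus.relabel "metrics_service"` component, rather than being repeated in every source's relabel component. A minimal values sketch, assuming a rule that drops a family of metrics (the `go_gc_.*` pattern is only an illustration):

```yaml
# Hypothetical values.yaml override; the rule contents are illustrative.
metrics:
  extraMetricRelabelingRules: |-
    rule {
      source_labels = ["__name__"]
      regex = "go_gc_.*"
      action = "drop"
    }
```

Because the rule block now sits in the component that forwards to the external metrics service, it should apply to every metric passing through that component. Source-specific rules still belong in `metrics.<source>.extraMetricRelabelingRules`.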
@@ -86,9 +86,6 @@ prometheus.relabel "node_exporter" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if (index .Values.metrics "node-exporter").extraMetricRelabelingRules }}
{{ (index .Values.metrics "node-exporter").extraMetricRelabelingRules | indent 2 }}
{{- end }}
3 changes: 0 additions & 3 deletions charts/k8s-monitoring/templates/agent_config/_opencost.txt
@@ -63,9 +63,6 @@ prometheus.relabel "opencost" {
action = "drop"
}
{{- end }}
{{- if .Values.metrics.extraMetricRelabelingRules }}
{{ .Values.metrics.extraMetricRelabelingRules | indent 2 }}
{{- end }}
{{- if .Values.metrics.cost.extraMetricRelabelingRules }}
{{ .Values.metrics.cost.extraMetricRelabelingRules | indent 2 }}
{{- end }}
@@ -1,4 +1,4 @@
-{{ define "agent.config.logs.pod_logs" }}
+{{ define "agent.config.logs.pod_logs_discovery" }}
// Pod Logs
discovery.kubernetes "pods" {
role = "pod"
@@ -100,45 +100,4 @@ loki.source.kubernetes "pod_logs" {
{{- end }}
}
{{- end }}
loki.process "pod_logs" {
stage.match {
selector = "{tmp_container_runtime=\"containerd\"}"
// the cri processing stage extracts the following k/v pairs: log, stream, time, flags
stage.cri {}

// Set the extract flags and stream values as labels
stage.labels {
values = {
flags = "",
stream = "",
}
}
}

// if the label tmp_container_runtime from above is docker parse using docker
stage.match {
selector = "{tmp_container_runtime=\"docker\"}"
// the docker processing stage extracts the following k/v pairs: log, stream, time
stage.docker {}

// Set the extract stream value as a label
stage.labels {
values = {
stream = "",
}
}
}

// Drop the filename label, since it's not really useful in the context of Kubernetes, where we already have
// cluster, namespace, pod, and container labels.
// Also drop the temporary container runtime label as it is no longer needed.
stage.label_drop {
values = ["filename", "tmp_container_runtime"]
}

{{- if .Values.logs.pod_logs.extraStageBlocks }}
{{ .Values.logs.pod_logs.extraStageBlocks | indent 2 }}
{{ end }}
forward_to = [loki.write.grafana_cloud_loki.receiver]
}
{{ end }}
{{- end }}
@@ -0,0 +1,43 @@
{{ define "agent.config.logs.pod_logs_processor" }}
loki.process "pod_logs" {
  stage.match {
    selector = "{tmp_container_runtime=\"containerd\"}"
    // the cri processing stage extracts the following k/v pairs: log, stream, time, flags
    stage.cri {}

    // Set the extract flags and stream values as labels
    stage.labels {
      values = {
        flags = "",
        stream = "",
      }
    }
  }

  // if the label tmp_container_runtime from above is docker parse using docker
  stage.match {
    selector = "{tmp_container_runtime=\"docker\"}"
    // the docker processing stage extracts the following k/v pairs: log, stream, time
    stage.docker {}

    // Set the extract stream value as a label
    stage.labels {
      values = {
        stream = "",
      }
    }
  }

  // Drop the filename label, since it's not really useful in the context of Kubernetes, where we already have
  // cluster, namespace, pod, and container labels.
  // Also drop the temporary container runtime label as it is no longer needed.
  stage.label_drop {
    values = ["filename", "tmp_container_runtime"]
  }

{{- if .Values.logs.pod_logs.extraStageBlocks }}
{{ .Values.logs.pod_logs.extraStageBlocks | indent 2 }}
{{ end }}
  forward_to = [loki.write.grafana_cloud_loki.receiver]
}
{{ end }}