feature: Add text boxes and descriptions to reads and writes dashboards #324

Merged
merged 38 commits on Jun 22, 2021
Changes from 3 commits
Commits
4642b5c
feature: add some text boxes and descriptions
darrenjaneczek Jun 8, 2021
77f8609
fix: text replacements, repair addRows
darrenjaneczek Jun 9, 2021
c4db3e1
fix: changelog
darrenjaneczek Jun 9, 2021
9e6c2f4
Changing copy to add 'latency' as well.
Jun 13, 2021
7a7b13c
Cut down on text from initial PR. Tucked existing text from the compa…
Jun 13, 2021
acc320a
Getting rid of a few space/comma errors.
Jun 13, 2021
8368248
Update CHANGELOG.md
darrenjaneczek Jun 15, 2021
6ad57cd
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
c33303a
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
cb7054c
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
357db43
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
19cb601
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
4735870
Update cortex-mixin/dashboards/compactor.libsonnet
darrenjaneczek Jun 15, 2021
fa48a91
fix: formatting - limit to 4 panels per row
darrenjaneczek Jun 15, 2021
dafb212
Merge branch 'darrenjaneczek/dashboard-descriptions-reads-writes' of …
darrenjaneczek Jun 15, 2021
c7b7871
fmt
darrenjaneczek Jun 15, 2021
6c0066c
fix: remove accidental line
darrenjaneczek Jun 15, 2021
773926a
Update cortex-mixin/dashboards/dashboard-utils.libsonnet
darrenjaneczek Jun 15, 2021
a12d815
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
b335df9
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
73e65cf
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
13f0fa3
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
6c0ebb8
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
ea7d87d
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
4aed696
Update cortex-mixin/dashboards/writes.libsonnet
darrenjaneczek Jun 15, 2021
c411115
Update cortex-mixin/dashboards/reads.libsonnet
darrenjaneczek Jun 15, 2021
0c17f02
fix: Requests per second
darrenjaneczek Jun 15, 2021
b8ccacc
fix: text
darrenjaneczek Jun 15, 2021
d5b14c1
Apply suggestions from code review as per @osg-grafana
darrenjaneczek Jun 15, 2021
b22d22e
fix: clarity
darrenjaneczek Jun 15, 2021
dffe62a
Apply suggestions from code review as per @osg-grafana
darrenjaneczek Jun 15, 2021
2aae011
Merge branch 'darrenjaneczek/dashboard-descriptions-reads-writes' of …
darrenjaneczek Jun 15, 2021
eafdbfc
fix: query formatting to aid in merge
darrenjaneczek Jun 17, 2021
dddd6e7
fix: query formatting to aid in merge
darrenjaneczek Jun 17, 2021
fcc4896
fix: consistent labelling
darrenjaneczek Jun 17, 2021
513b096
fix: ensure panel titles are consistent
darrenjaneczek Jun 17, 2021
5794607
fix: resolve review feedback
darrenjaneczek Jun 21, 2021
4fb7275
Merge branch 'main' into darrenjaneczek/dashboard-descriptions-reads-…
pracucci Jun 22, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
* [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
* [CHANGE] Dashboards: added overridable `job_labels` and `cluster_labels` to the configuration object as label lists to uniquely identify jobs and clusters in the metric names and group-by lists in dashboards. #319
* [CHANGE] Dashboards: `alert_aggregation_labels` has been removed from the configuration and overriding this value has been deprecated. Instead the labels are now defined by the `cluster_labels` list, and should be overridden accordingly through that list. #319
* [ENHANCEMENT] Added documentation text panels and descriptions to Reads and Writes dashboards.
Collaborator
Suggested change
* [ENHANCEMENT] Added documentation text panels and descriptions to Reads and Writes dashboards.
* [ENHANCEMENT] Added documentation text panels and descriptions to Reads and Writes dashboards. #324

darrenjaneczek marked this conversation as resolved.

## 1.9.0 / 2021-05-18

3 changes: 1 addition & 2 deletions cortex-mixin/dashboards/compactor.libsonnet
@@ -103,6 +103,5 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.latencyPanel('cortex_compactor_meta_sync_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor)),
)
)
.addRow($.objectStorePanels1('Object Store', 'compactor'))
.addRow($.objectStorePanels2('', 'compactor')),
.addRows($.getObjectStoreRows('Object Store', 'compactor')),
}
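
For readers unfamiliar with the pattern, the `.addRows(...)` call in the hunk above appends every row of an array to the dashboard in order. A minimal Jsonnet sketch of that behaviour follows. It assumes a stand-in `dashboard` object that models only `rows` and `addRow`; the `objectStoreRows` factory and the use of `std.foldl` in place of a hand-written recursive helper are illustrative choices, not the merged code.

```jsonnet
// Hypothetical stand-in for the dashboard builder: only `rows` and `addRow` are modeled.
local dashboard = {
  rows: [],
  addRow(row):: self { rows+: [row] },

  // Same contract as an addRowsIf helper, written as a left fold.
  // This is an assumption for illustration, not the code from the PR.
  addRowsIf(condition, rows)::
    if condition
    then std.foldl(function(d, row) d.addRow(row), rows, self)
    else self,

  addRows(rows):: self.addRowsIf(true, rows),
};

// Hypothetical row factory, standing in for $.getObjectStoreRows.
local objectStoreRows(title) = [
  { title: title },
  { title: '' },
];

{
  // Both rows are appended, in array order.
  appended: dashboard.addRows(objectStoreRows('Object Store')).rows,
  // A false condition returns the dashboard unchanged.
  skipped: dashboard.addRowsIf(false, objectStoreRows('ignored')).rows,
}
```

Evaluating this file should yield two rows under `appended` and an empty list under `skipped`. Returning an array from the row factory lets one call site add a text row and several panel rows without chaining `.addRow` by hand.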
186 changes: 159 additions & 27 deletions cortex-mixin/dashboards/dashboard-utils.libsonnet
@@ -14,6 +14,24 @@ local utils = import 'mixin-utils/utils.libsonnet';
then self.addRow(row)
else self,

addRowsIf(condition, rows)::
if condition
then
local reduceRows(dashboard, remainingRows) =
if (std.length(remainingRows) == 0)
then dashboard
else
reduceRows(
dashboard.addRow(remainingRows[0]),
std.slice(remainingRows, 1, std.length(remainingRows), 1)
)
;
reduceRows(self, rows)
else self,

addRows(rows)::
self.addRowsIf(true, rows),

addClusterSelectorTemplates(multi=true)::
local d = self {
tags: $._config.tags,
@@ -43,7 +61,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
else d
.addTemplate('cluster', 'cortex_build_info', 'cluster')
.addTemplate('namespace', 'cortex_build_info{cluster=~"$cluster"}', 'namespace'),

editable: true,
darrenjaneczek marked this conversation as resolved.
},

// The mixin allows specialization of the job selector depending on whether it's a single binary
@@ -274,8 +292,21 @@ local utils = import 'mixin-utils/utils.libsonnet';
type: 'text',
} + options,

objectStorePanels1(title, component)::
super.row(title)

getObjectStoreRows(title, component):: [
($.row(title) { height: '25px' })
.addPanel(
$.textPanel(
Collaborator
Do we really need this? Looks like what it describes is pretty trivial. You can easily notice it's the latency and that it displays avg, median and 99th percentile.

'',
|||
- The panels below summarize the rate of requests issued by %s
to object storage, separated by operation type.
- They also include the average, median, and 99th percentile latency
of each operation and the error rate of each operation.
||| % component
)
),
$.row('')
.addPanel(
$.panel('Operations / sec') +
$.queryPanel('sum by(operation) (rate(thanos_objstore_bucket_operations_total{%s,component="%s"}[$__rate_interval]))' % [$.namespaceMatcher(), component], '{{operation}}') +
@@ -288,62 +319,163 @@ local utils = import 'mixin-utils/utils.libsonnet';
{ yaxes: $.yaxes('percentunit') },
)
.addPanel(
$.panel('Op: Attributes') +
$.panel('Latency of Op: Attributes') +
Contributor
Where does the user see this information?

Contributor Author
Attached screenshot, @osg-grafana
image

$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="attributes"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: Exists') +
$.panel('Latency of Op: Exists') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="exists"}' % [$.namespaceMatcher(), component]),
),

// Second row of Object Store stats
objectStorePanels2(title, component)::
super.row(title)
$.row('')
.addPanel(
$.panel('Op: Get') +
$.panel('Latency of Op: Get') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="get"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: GetRange') +
$.panel('Latency of Op: GetRange') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="get_range"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: Upload') +
$.panel('Latency of Op: Upload') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="upload"}' % [$.namespaceMatcher(), component]),
)
.addPanel(
$.panel('Op: Delete') +
$.panel('Latency of Op: Delete') +
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="delete"}' % [$.namespaceMatcher(), component]),
),
],

thanosMemcachedCache(title, jobName, component, cacheName)::
local config = {
jobMatcher: $.jobMatcher(jobName),
component: component,
cacheName: cacheName,
cacheNameReadable: std.strReplace(cacheName, '-', ' '),
Collaborator
This is unused. I think it can be removed.

};
local panelText = {
'metadata-cache':
|||
The metadata cache
is an optional component that the
store-gateway and querier
will check before going to object storage.
This set of panels focuses on the
%s’s use of the metadata cache.
||| % component,
'chunks-cache':
|||
The chunks cache
is an optional component that the
store-gateway
will check before going to object storage.
This helps reduce calls to the object store.
|||,
}[cacheName];

super.row(title)
.addPanel(
$.panel('QPS') +
$.queryPanel('sum by(operation) (rate(thanos_memcached_operations_total{%s,component="%s",name="%s"}[$__rate_interval]))' % [$.jobMatcher(jobName), component, cacheName], '{{operation}}') +
$.textPanel(
'', panelText
)
)
.addPanel(
$.panel('Requests Per Second') +
darrenjaneczek marked this conversation as resolved.
$.queryPanel(
Collaborator
Note to reviewers: just reformatted.

|||
sum by(operation) (
rate(
thanos_memcached_operations_total{
%(jobMatcher)s,
component="%(component)s",
name="%(cacheName)s"
}[$__rate_interval]
)
)
||| % config,
'{{operation}}'
) +
$.stack +
{ yaxes: $.yaxes('ops') },
{ yaxes: $.yaxes('ops') } +
$.panelDescription(
'Requests Per Second',
|||
Requests per second made to
the %(cacheNameReadable)s
from the %(component)s,
separated into request type.
||| % config
),
)
.addPanel(
$.panel('Latency (getmulti)') +
$.latencyPanel('thanos_memcached_operation_duration_seconds', '{%s,operation="getmulti",component="%s",name="%s"}' % [$.jobMatcher(jobName), component, cacheName])
$.latencyPanel(
Collaborator
Note to reviewers: just reformatted.

'thanos_memcached_operation_duration_seconds',
|||
{
%(jobMatcher)s,
operation="getmulti",
component="%(component)s",
name="%(cacheName)s"
darrenjaneczek marked this conversation as resolved.
}
||| % config
) +
$.panelDescription(
'Latency (getmulti)',
|||
The average, median (50th percentile) and 99th percentile
time to satisfy a “getmulti” request
made by the %(component)s,
which retrieves multiple items from the cache.
||| % config
)
)
.addPanel(
$.panel('Hit ratio') +
$.queryPanel('sum(rate(thanos_cache_memcached_hits_total{%s,component="%s",name="%s"}[$__rate_interval])) / sum(rate(thanos_cache_memcached_requests_total{%s,component="%s",name="%s"}[$__rate_interval]))' %
[
$.jobMatcher(jobName),
component,
cacheName,
$.jobMatcher(jobName),
component,
cacheName,
], 'items') +
{ yaxes: $.yaxes('percentunit') },
$.queryPanel(
Collaborator
Note to reviewers: just reformatted.

|||
sum(
rate(
thanos_cache_memcached_hits_total{
%(jobMatcher)s,
component="%(component)s",
name="%(cacheName)s"
}[$__rate_interval]
)
)
/
sum(
rate(
thanos_cache_memcached_requests_total{
%(jobMatcher)s,
component="%(component)s",
name="%(cacheName)s"
}[$__rate_interval]
)
)
||| % config,
'items'
) +
{ yaxes: $.yaxes('percentunit') } +
$.panelDescription(
'Hit Ratio',
|||
The fraction of %(component)s requests to the
%(cacheNameReadable)s that successfully return data.
Requests that miss the cache must go to
object storage for the underlying data.
||| % config
),
),

filterNodeDiskContainer(containerName)::
|||
ignoring(%s) group_right() (label_replace(count by(%s, %s, device) (container_fs_writes_bytes_total{%s,container="%s",device!~".*sda.*"}), "device", "$1", "device", "/dev/(.*)") * 0)
||| % [$._config.per_instance_label, $._config.per_node_label, $._config.per_instance_label, $.namespaceMatcher(), containerName],

panelDescription(title, description):: {
description: |||
### %s
%s
||| % [title, description],
},
}
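
The `panelDescription` helper and the `%(name)s` placeholders used throughout this diff can be tried in isolation. The sketch below copies `panelDescription` from the diff and fills a description from a small `config` object. The component and cache names are hypothetical sample inputs, and the bare `{ title: ... }` object is a stand-in for a real panel.

```jsonnet
// Copied from the diff: wraps a title and body into a Markdown `description` field.
local panelDescription(title, description) = {
  description: |||
    ### %s
    %s
  ||| % [title, description],
};

// Hypothetical sample values; the real ones come from the function arguments.
local config = {
  component: 'store-gateway',
  cacheName: 'chunks-cache',
  // Hyphens become spaces so the name reads naturally in prose.
  cacheNameReadable: std.strReplace('chunks-cache', '-', ' '),
};

// Stand-in panel: merging with `+` adds the description alongside the title.
{ title: 'Hit ratio' } + panelDescription(
  'Hit ratio',
  |||
    The fraction of %(component)s requests to the
    %(cacheNameReadable)s that successfully return data.
  ||| % config
)
```

Formatting against an object rather than a positional array means a placeholder such as `%(component)s` can appear several times in one query or description without repeating the argument, and fields the template does not mention are ignored.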