Commit

[8.10] Clarify data stream recommendations and best practices (#107233) (#107235)

* Clarify data stream recommendations and best practices (#107233)

* Clarify data stream recommendations and best practices

Our documentation around data streams versus aliases could be interpreted in a way where someone doing *any* updates thinks they need to use an alias with indices instead of a data stream. This commit enhances the documentation around these areas to determine the correct abstraction in a more concrete way. It also tries to clarify that data streams still allow updates to the backing indices, and that a difference is last-write-wins versus first-write-wins.

* Remove dlm link
dakrone authored Apr 8, 2024
1 parent 9988eec commit 1e03bf0
Showing 3 changed files with 44 additions and 19 deletions.
39 changes: 30 additions & 9 deletions docs/reference/data-streams/data-streams.asciidoc
@@ -18,6 +18,27 @@ automate the management of these backing indices. For example, you can use
hardware and delete unneeded indices. {ilm-init} can help you reduce costs and
overhead as your data grows.


[discrete]
[[should-you-use-a-data-stream]]
== Should you use a data stream?

To determine whether you should use a data stream for your data, consider the format of the data
and how you expect to interact with it. A good candidate for a data stream matches the following
criteria:

* Your data contains a timestamp field, or one could be automatically generated.
* You mostly perform indexing requests, with occasional updates and deletes.
* You index documents without an `_id`, or when indexing documents with an explicit `_id` you expect first-write-wins behavior.

For most time series data use cases, a data stream will be a good fit. However, if your data
doesn't match these criteria (for example, if you frequently send multiple documents using the same
`_id` expecting last-write-wins), you may want to use an index alias with a write index instead. See
<<manage-time-series-data-without-data-streams,managing time series data without a data stream>>
for more information.

Keep in mind that some features such as <<tsds,Time Series Data Streams (TSDS)>> require a data stream.
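As a rough illustration, the criteria above can be sketched as a small decision helper. This is a minimal sketch, not an Elasticsearch API: `Workload` and `recommend_abstraction` are hypothetical names, and the precedence (TSDS first, then the `_id` semantics, then the timestamp and append checks) is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of how you plan to write to the target."""

    has_timestamp: bool  # a timestamp field exists or could be generated automatically
    mostly_appends: bool  # indexing dominates; updates and deletes are occasional
    reuses_ids_expecting_overwrite: bool  # same `_id` sent again, last write should win
    needs_tsds: bool = False  # features such as TSDS require a data stream


def recommend_abstraction(w: Workload) -> str:
    """Apply the criteria above and return a suggested abstraction (hypothetical helper)."""
    if w.needs_tsds:
        # Some features, such as TSDS, are only available on data streams.
        return "data stream"
    if w.reuses_ids_expecting_overwrite:
        # A repeated `_id` in a data stream is first-write-wins, not last-write-wins.
        return "alias with write index"
    if w.has_timestamp and w.mostly_appends:
        return "data stream"
    return "alias with write index"
```

For example, `recommend_abstraction(Workload(has_timestamp=True, mostly_appends=True, reuses_ids_expecting_overwrite=False))` returns `"data stream"`.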

[discrete]
[[backing-indices]]
== Backing indices
@@ -116,19 +137,19 @@ You should not derive any intelligence from the backing indices names.

[discrete]
[[data-streams-append-only]]
== Append-only
== Append-only (mostly)

Data streams are designed for use cases where existing data is rarely,
if ever, updated. You cannot send update or deletion requests for existing
documents directly to a data stream. Instead, use the
Data streams are designed for use cases where existing data is rarely updated. You cannot send
update or deletion requests for existing documents directly to a data stream. However, you can still
<<update-delete-docs-in-a-backing-index,update or delete documents>> in a data stream by submitting
requests directly to the document's backing index.
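A minimal sketch of what "submitting requests directly to the document's backing index" can look like, assuming you first searched the data stream with `"seq_no_primary_term": true` so each hit carries `_index`, `_id`, `_seq_no` and `_primary_term`. The helpers only build the method and path (they do not call a cluster); their names, and the sample index and document IDs in the usage note, are hypothetical. The backing index is taken from the hit's `_index` instead of being derived from its name.

```python
from urllib.parse import quote


def backing_index_update_request(hit: dict, new_source: dict) -> tuple:
    """Build (method, path, body) to overwrite one document in its backing index (hypothetical helper)."""
    index = quote(hit["_index"], safe="")  # backing index reported by the search hit
    doc_id = quote(hit["_id"], safe="")
    # if_seq_no / if_primary_term make the write fail if the document changed since the search.
    path = (
        f"/{index}/_doc/{doc_id}"
        f"?if_seq_no={hit['_seq_no']}&if_primary_term={hit['_primary_term']}"
    )
    return "PUT", path, new_source


def backing_index_delete_request(hit: dict) -> tuple:
    """Build (method, path) to delete one document from its backing index (hypothetical helper)."""
    return "DELETE", f"/{quote(hit['_index'], safe='')}/_doc/{quote(hit['_id'], safe='')}"
```

With a hit from `.ds-my-data-stream-2099.03.08-000003`, the update path targets that backing index rather than the data stream name.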

If you need to update a larger number of documents in a data stream, you can use the
<<update-docs-in-a-data-stream-by-query,update by query>> and
<<delete-docs-in-a-data-stream-by-query,delete by query>> APIs.
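Unlike single-document requests, update by query and delete by query can be sent to the data stream name itself. The sketch below only builds the request bodies; the helper names are hypothetical, and `field` is assumed to be a dotted path (such as `user.id`) to a field that exists in each matching document's `_source`, so the Painless assignment `ctx._source.<field>` resolves.

```python
import json


def update_by_query_request(data_stream: str, field: str, old: str, new: str) -> tuple:
    """Build (method, path, body) for an update by query that rewrites one field value."""
    body = {
        "query": {"match": {field: old}},
        "script": {
            # Painless script; the new value is passed as a parameter, not inlined.
            "source": f"ctx._source.{field} = params.new_value",
            "params": {"new_value": new},
        },
    }
    return "POST", f"/{data_stream}/_update_by_query", json.dumps(body)


def delete_by_query_request(data_stream: str, field: str, value: str) -> tuple:
    """Build (method, path, body) for a delete by query matching one field value."""
    body = {"query": {"match": {field: value}}}
    return "POST", f"/{data_stream}/_delete_by_query", json.dumps(body)
```

Passing the new value through `params` rather than formatting it into the script keeps the script source constant across calls.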

If needed, you can <<update-delete-docs-in-a-backing-index,update or delete
documents>> by submitting requests directly to the document's backing index.

TIP: If you frequently update or delete existing time series data, use an index
alias with a write index instead of a data stream. See
TIP: If you frequently send multiple documents using the same `_id` expecting last-write-wins, you
may want to use an index alias with a write index instead. See
<<manage-time-series-data-without-data-streams>>.
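The first-write-wins versus last-write-wins distinction can be illustrated with a toy in-memory model. This is a sketch, not Elasticsearch's implementation: `ToyIndex` is a hypothetical class that models a single index only. Data streams accept only create-style writes, so a second write with the same `_id` into the same backing index is rejected; a plain index behind an alias accepts an overwrite.

```python
class ToyIndex:
    """In-memory stand-in for ONE index; not how Elasticsearch stores documents."""

    def __init__(self):
        self.docs = {}  # _id -> source

    def create(self, doc_id, source):
        """Create-only semantics, as used for data streams: first write wins."""
        if doc_id in self.docs:
            return False  # Elasticsearch would reject this with a version conflict
        self.docs[doc_id] = source
        return True

    def index(self, doc_id, source):
        """Plain index semantics through an alias's write index: last write wins."""
        self.docs[doc_id] = source
        return True
```

Calling `create("a", ...)` twice keeps the first source, while calling `index("a", ...)` twice keeps the second. The model deliberately says nothing about documents that land in different backing indices.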

include::set-up-a-data-stream.asciidoc[]
17 changes: 9 additions & 8 deletions docs/reference/ilm/ilm-tutorial.asciidoc
@@ -282,14 +282,15 @@ DELETE /_index_template/timeseries_template
[[manage-time-series-data-without-data-streams]]
=== Manage time series data without data streams

Even though <<data-streams, data streams>> are a convenient way to scale
and manage time series data, they are designed to be append-only. We recognise there
might be use-cases where data needs to be updated or deleted in place and the
data streams don't support delete and update requests directly,
so the index APIs would need to be used directly on the data stream's backing indices.

In these cases, you can use an index alias to manage indices containing the time series data
and periodically roll over to a new index.
Even though <<data-streams, data streams>> are a convenient way to scale and manage time series
data, they are designed to be append-only. We recognise there might be use cases where data needs
to be updated or deleted in place. Data streams don't support update and delete requests directly,
so those requests must be sent to the data stream's backing indices instead. In these cases, we
still recommend using a data stream.

If you frequently send multiple documents using the same `_id` expecting last-write-wins, you can
use an index alias instead of a data stream to manage indices containing the time series data and
periodically roll over to a new index.
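To make "periodically roll over to a new index" concrete, here is a toy model of an alias with a write index. `ToyAlias` and the `timeseries` names are hypothetical; the naming step mirrors the rollover rule that a trailing `-<number>` in the current index name is incremented and zero-padded to six digits. Only the newest index receives writes, while the alias still points at all of them for searches.

```python
import re


class ToyAlias:
    """Hypothetical in-memory model of an index alias with a write index."""

    def __init__(self, name, first_index):
        self.name = name
        self.indices = [first_index]  # every index the alias points to
        self.write_index = first_index  # only this index receives writes

    def rollover(self):
        """Create the next index (trailing number + 1, six digits) and make it the write index."""
        match = re.fullmatch(r"(.*-)(\d+)", self.write_index)
        if match is None:
            raise ValueError("this sketch needs a write index name ending in -<number>")
        prefix, number = match.groups()
        new_index = f"{prefix}{int(number) + 1:06d}"
        self.indices.append(new_index)
        self.write_index = new_index
        return new_index
```

Starting from `timeseries-000001`, one rollover yields `timeseries-000002` as the new write index.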

To automate rollover and management of time series indices with {ilm-init} using an index
alias, you:
7 changes: 5 additions & 2 deletions docs/reference/ilm/set-up-lifecycle-policy.asciidoc
@@ -81,6 +81,8 @@ To use a policy that triggers the rollover action,
you need to configure the policy in the index template used to create each new index.
You specify the name of the policy and the alias used to reference the rolling indices.

TIP: An `index.lifecycle.rollover_alias` setting is only required if using {ilm} with an alias. It is unnecessary when using <<data-streams,Data Streams>>.
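The difference the tip describes shows up directly in the index template body. This hypothetical helper builds a composable index template that attaches an {ilm-init} policy; the pattern, policy and alias names are placeholders. When a rollover alias is given, it sets `index.lifecycle.rollover_alias`; otherwise it enables a data stream with `"data_stream": {}` and leaves that setting out.

```python
from typing import Optional


def index_template_body(pattern: str, policy: str, rollover_alias: Optional[str] = None) -> dict:
    """Build an index template body that attaches an ILM policy (hypothetical helper).

    With `rollover_alias`, the template is for plain indices behind an alias and sets
    `index.lifecycle.rollover_alias`; without it, the template enables a data stream.
    """
    settings = {"index.lifecycle.name": policy}
    body = {"index_patterns": [pattern], "template": {"settings": settings}}
    if rollover_alias is None:
        body["data_stream"] = {}  # the data stream itself is the rollover target
    else:
        settings["index.lifecycle.rollover_alias"] = rollover_alias
    return body
```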

You can use the {kib} Create template wizard to create a template. To access the
wizard, open the menu and go to *Stack Management > Index Management*. In the
*Index Templates* tab, click *Create template*.
@@ -128,8 +130,9 @@ DELETE _index_template/my_template
[[create-initial-index]]
==== Create an initial managed index

When you set up policies for your own rolling indices, you need to manually create the first index
managed by a policy and designate it as the write index.
When you set up policies for your own rolling indices, if you are not using the recommended
<<data-streams,data streams>>, you need to manually create the first index managed by a policy and
designate it as the write index.
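When you are not using a data stream, the first managed index is created with the alias attached and flagged as the write index. The sketch builds that request; `bootstrap_write_index` is a hypothetical helper, and the `timeseries` names are placeholders chosen so that later rollovers can increment the trailing number.

```python
import json


def bootstrap_write_index(alias: str, first_index: str) -> tuple:
    """Build (method, path, body) creating the first managed index as the alias's write index."""
    body = {"aliases": {alias: {"is_write_index": True}}}
    return "PUT", f"/{first_index}", json.dumps(body)
```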

IMPORTANT: When you enable {ilm} for {beats} or the {ls} {es} output plugin,
the necessary policies and configuration changes are applied automatically.