Minor improvements to translog docs (#28237)
The use of the phrase "translog" vs "transaction log" was inconsistent, and
it was apparently unclear that the translog was stored on every shard copy.
DaveCTurner committed Jan 19, 2018
1 parent 0dbe2d6 commit e7be982
Showing 1 changed file with 46 additions and 36 deletions.
82 changes: 46 additions & 36 deletions docs/reference/index-modules/translog.asciidoc
@@ -1,41 +1,44 @@
 [[index-modules-translog]]
 == Translog
 
-Changes to Lucene are only persisted to disk during a Lucene commit,
-which is a relatively heavy operation and so cannot be performed after every
-index or delete operation. Changes that happen after one commit and before another
-will be lost in the event of process exit or HW failure.
-
-To prevent this data loss, each shard has a _transaction log_ or write ahead
-log associated with it. Any index or delete operation is written to the
-translog after being processed by the internal Lucene index.
-
-In the event of a crash, recent transactions can be replayed from the
-transaction log when the shard recovers.
+Changes to Lucene are only persisted to disk during a Lucene commit, which is a
+relatively expensive operation and so cannot be performed after every index or
+delete operation. Changes that happen after one commit and before another will
+be removed from the index by Lucene in the event of process exit or hardware
+failure.
+
+Because Lucene commits are too expensive to perform on every individual change,
+each shard copy also has a _transaction log_ known as its _translog_ associated
+with it. All index and delete operations are written to the translog after
+being processed by the internal Lucene index but before they are acknowledged.
+In the event of a crash, recent transactions that have been acknowledged but
+not yet included in the last Lucene commit can instead be recovered from the
+translog when the shard recovers.
 
 An Elasticsearch flush is the process of performing a Lucene commit and
-starting a new translog. It is done automatically in the background in order
-to make sure the transaction log doesn't grow too large, which would make
+starting a new translog. Flushes are performed automatically in the background
+in order to make sure the translog doesn't grow too large, which would make
 replaying its operations take a considerable amount of time during recovery.
-It is also exposed through an API, though its rarely needed to be performed
-manually.
+The ability to perform a flush manually is also exposed through an API,
+although this is rarely needed.
 
 [float]
 === Translog settings
 
-The data in the transaction log is only persisted to disk when the translog is
+The data in the translog is only persisted to disk when the translog is
 ++fsync++ed and committed. In the event of hardware failure, any data written
 since the previous translog commit will be lost.
 
-By default, Elasticsearch ++fsync++s and commits the translog every 5 seconds if `index.translog.durability` is set
-to `async` or if set to `request` (default) at the end of every <<docs-index_,index>>, <<docs-delete,delete>>,
-<<docs-update,update>>, or <<docs-bulk,bulk>> request. In fact, Elasticsearch
-will only report success of an index, delete, update, or bulk request to the
-client after the transaction log has been successfully ++fsync++ed and committed
-on the primary and on every allocated replica.
+By default, Elasticsearch ++fsync++s and commits the translog every 5 seconds
+if `index.translog.durability` is set to `async` or if set to `request`
+(default) at the end of every <<docs-index_,index>>, <<docs-delete,delete>>,
+<<docs-update,update>>, or <<docs-bulk,bulk>> request. More precisely, if set
+to `request`, Elasticsearch will only report success of an index, delete,
+update, or bulk request to the client after the translog has been successfully
+++fsync++ed and committed on the primary and on every allocated replica.
 
-The following <<indices-update-settings,dynamically updatable>> per-index settings
-control the behaviour of the transaction log:
+The following <<indices-update-settings,dynamically updatable>> per-index
+settings control the behaviour of the translog:
 
 `index.translog.sync_interval`::

@@ -64,17 +67,20 @@ update, or bulk request. This setting accepts the following parameters:
 
 `index.translog.flush_threshold_size`::
 
-The translog stores all operations that are not yet safely persisted in Lucene (i.e., are
-not part of a lucene commit point). Although these operations are available for reads, they will
-need to be reindexed if the shard was to shutdown and has to be recovered. This settings controls
-the maximum total size of these operations, to prevent recoveries from taking too long. Once the
-maximum size has been reached a flush will happen, generating a new Lucene commit. Defaults to `512mb`.
+The translog stores all operations that are not yet safely persisted in Lucene
+(i.e., are not part of a Lucene commit point). Although these operations are
+available for reads, they will need to be reindexed if the shard was to
+shutdown and has to be recovered. This settings controls the maximum total size
+of these operations, to prevent recoveries from taking too long. Once the
+maximum size has been reached a flush will happen, generating a new Lucene
+commit point. Defaults to `512mb`.
 
 `index.translog.retention.size`::
 
-The total size of translog files to keep. Keeping more translog files increases the chance of performing
-an operation based sync when recovering replicas. If the translog files are not sufficient, replica recovery
-will fall back to a file based sync. Defaults to `512mb`
+The total size of translog files to keep. Keeping more translog files increases
+the chance of performing an operation based sync when recovering replicas. If
+the translog files are not sufficient, replica recovery will fall back to a
+file based sync. Defaults to `512mb`
 
 
 `index.translog.retention.age`::
@@ -86,10 +92,14 @@ The maximum duration for which translog files will be kept. Defaults to `12h`.
 [[corrupt-translog-truncation]]
 === What to do if the translog becomes corrupted?
 
-In some cases (a bad drive, user error) the translog can become corrupted. When
-this corruption is detected by Elasticsearch due to mismatching checksums,
-Elasticsearch will fail the shard and refuse to allocate that copy of the data
-to the node, recovering from a replica if available.
+In some cases (a bad drive, user error) the translog on a shard copy can become
+corrupted. When this corruption is detected by Elasticsearch due to mismatching
+checksums, Elasticsearch will fail that shard copy and refuse to use that copy
+of the data. If there are other copies of the shard available then
+Elasticsearch will automatically recover from one of them using the normal
+shard allocation and recovery mechanism. In particular, if the corrupt shard
+copy was the primary when the corruption was detected then one of its replicas
+will be promoted in its place.
 
 If there is no copy of the data from which Elasticsearch can recover
 successfully, a user may want to recover the data that is part of the shard at

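The revised opening paragraphs describe three steps: an operation is applied to the index and then written to the translog before it is acknowledged, a flush performs a commit and starts a new translog, and recovery replays whatever the last commit did not include. A minimal Python sketch of that sequence follows. It is a toy model and not Elasticsearch code: the `ShardCopy` class, its attributes, and its methods are hypothetical names chosen for illustration, and a plain dictionary stands in for the Lucene index.

```python
class ShardCopy:
    """Toy model of one shard copy (hypothetical, not the Elasticsearch implementation)."""

    def __init__(self):
        self.committed = {}  # state captured by the last simulated Lucene commit
        self.live = {}       # in-memory index state, lost on a crash
        self.translog = []   # operations acknowledged since the last commit

    def index(self, doc_id, source):
        self.live[doc_id] = source                        # processed by the index first
        self.translog.append(("index", doc_id, source))   # then logged
        return True                                       # only now acknowledged

    def delete(self, doc_id):
        self.live.pop(doc_id, None)
        self.translog.append(("delete", doc_id, None))
        return True

    def flush(self):
        # A flush is a commit followed by starting a new, empty translog.
        self.committed = dict(self.live)
        self.translog = []

    def crash_and_recover(self):
        # Uncommitted index state is gone; rebuild it from the last commit
        # and replay the acknowledged operations held in the translog.
        self.live = dict(self.committed)
        for op, doc_id, source in self.translog:
            if op == "index":
                self.live[doc_id] = source
            else:
                self.live.pop(doc_id, None)


def demo():
    shard = ShardCopy()
    shard.index("1", {"title": "a"})
    shard.flush()                     # document 1 is now in the commit
    shard.index("2", {"title": "b"})  # only in the translog
    shard.delete("1")                 # only in the translog
    shard.crash_and_recover()
    return shard.live
```

In this sketch, `demo()` ends with only document `2` present, because both post-flush operations are replayed on top of the committed state. The longer the translog grows between flushes, the more operations the replay loop has to process, which is the cost the flush threshold is meant to bound.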
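The diff describes the translog settings as dynamically updatable per-index settings. As a hedged illustration, the Python sketch below only builds the method, path, and JSON body for an update-settings request and does not send anything. The helper name `translog_settings_request` and the index name `my-index` are hypothetical; the setting names and the values `request`, `async`, and `512mb` come from the documentation text, while `5s` is an assumed spelling of the 5 second interval it mentions.

```python
import json


def translog_settings_request(index_name, durability="request",
                              sync_interval="5s", flush_threshold_size="512mb"):
    """Build (method, path, body) for an index settings update (illustrative helper)."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    body = {
        "index": {
            "translog": {
                "durability": durability,
                "sync_interval": sync_interval,
                "flush_threshold_size": flush_threshold_size,
            }
        }
    }
    # The update settings endpoint takes a PUT on /<index>/_settings.
    return "PUT", f"/{index_name}/_settings", json.dumps(body, sort_keys=True)


method, path, body = translog_settings_request("my-index", durability="async")
print(method, path)
print(body)
```

Choosing `async` here trades durability for throughput: per the text above, writes made since the previous translog commit can be lost on hardware failure, whereas `request` delays the acknowledgement until the translog has been fsynced on the primary and on every allocated replica.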