Shrink an index from a snapshot #73500

dakrone · 2021-05-27T19:47:31Z

In order to help address DTS costs, and also reduce the complexity of ILM's shrink operation, we should investigate adding the ability to shrink an index from a snapshot rather than from local disk.

It would be nice if, instead of requiring the shards to be colocated, we could take a snapshot of the index and then perform a restore and shrink at the same time. This would not only alleviate #63519 but would also be much less prone to failures, such as the situation where not all the shards can fit on a single node.

In order to accomplish this, we would need to introduce the Default Repository (#66040), so that a repository does not need to be specified for the shrink request.

Note that this diagram shows the combination of shrink and forcemerge as described in #73499, but this is not a prerequisite, as the two are independent.

One extremely beneficial side effect of implementing shrink like this is that shrink would no longer need to perform the slightly error-prone step of identifying a single node to move the shards to prior to the shrink (a move which itself can cost users money depending on where their data resides). It would also remove some code complexity on the ES side for handling shrink behavior during cluster/node shutdowns.
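For readers unfamiliar with the existing local-disk shrink that this proposal would replace, the colocation step described above can be sketched as the two REST calls below. This is a minimal illustration, not ILM's actual implementation: the helper name `build_shrink_requests`, the index names, and the node name are all hypothetical, and the sketch only builds the request bodies rather than sending them to a cluster.

```python
import json


def build_shrink_requests(source, target, node_name, source_shards, target_shards):
    """Return the (method, path, body) calls for a local-disk shrink.

    Hypothetical helper for illustration only; nothing is sent to a cluster.
    """
    # The target shard count must be a factor of the source shard count.
    if target_shards < 1 or source_shards % target_shards != 0:
        raise ValueError("target shard count must be a factor of the source shard count")

    # Step 1: force a copy of every shard onto one node and block writes.
    # This is the relocation that can incur data transfer costs.
    prepare = (
        "PUT",
        f"/{source}/_settings",
        {
            "settings": {
                "index.routing.allocation.require._name": node_name,
                "index.blocks.write": True,
            }
        },
    )

    # Step 2: shrink into the target index, clearing the settings
    # that would otherwise be copied over from the source index.
    shrink = (
        "POST",
        f"/{source}/_shrink/{target}",
        {
            "settings": {
                "index.number_of_shards": target_shards,
                "index.routing.allocation.require._name": None,
                "index.blocks.write": None,
            }
        },
    )
    return [prepare, shrink]


if __name__ == "__main__":
    # Assumed example values: a 4-shard index shrunk to 1 shard on "node-1".
    for method, path, body in build_shrink_requests(
        "logs-000001", "shrink-logs-000001", "node-1", 4, 1
    ):
        print(method, path)
        print(json.dumps(body, indent=2))
```

A snapshot-based shrink would make the first call unnecessary, since the shard data would come from the repository rather than from a single chosen node.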

elasticmachine · 2021-05-27T19:47:33Z

Pinging @elastic/es-distributed (Team:Distributed)

tlrx · 2022-08-23T14:12:21Z

We discussed this as a team today, and we think that snapshot-based recoveries have addressed most of the DTS costs, while the increasing usage of data streams has reduced the need to shrink indices. We also haven't seen many shrink issues that would justify the significant effort required to build this, so I'm going to close this.

dakrone added the >enhancement and :Distributed/Distributed labels May 27, 2021

elasticmachine added the Team:Distributed label May 27, 2021

dakrone mentioned this issue May 27, 2021

Reduce DTS costs for cross zone data transfer within Elasticsearch #73501

Open

DaveCTurner added the :Distributed/Recovery label and removed the :Distributed/Distributed label Jul 28, 2022

tlrx added team-discuss and removed team-discuss labels Aug 22, 2022

tlrx closed this as completed Aug 23, 2022
