
Shrink an index from a snapshot #73500

Closed
dakrone opened this issue May 27, 2021 · 2 comments
Labels
:Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement Team:Distributed Meta label for distributed team

Comments

@dakrone
Member

dakrone commented May 27, 2021

To help address DTS costs, and to reduce the complexity of ILM's shrink operation, we should investigate adding the ability to shrink an index from a snapshot rather than from local disk.

Instead of requiring the shards to be colocated on one node, we could take a snapshot of the index and then restore and shrink it in a single step. This would not only alleviate #63519 but would also make shrink much less error-prone in situations such as when not all of the shards fit on a single node.

To accomplish this, we would need to introduce the Default Repository (#66040) so that the shrink request does not need to specify a repository.

Note that this diagram shows the combination of shrink and forcemerge as described in #73499, but this is not a prerequisite, as the two are independent.

[Diagram: shrink from a snapshot, combined with force merge as described in #73499]

One extremely beneficial side effect of implementing shrink this way is that it would no longer need the slightly error-prone step of picking a single node and moving all of the shards to it before the shrink (a move that can itself cost users money, depending on where their data resides). It would also remove some code complexity on the ES side for handling shrink during cluster and node shutdowns.

@dakrone dakrone added >enhancement :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels May 27, 2021
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label May 27, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Jul 28, 2022
@tlrx
Member

tlrx commented Aug 23, 2022

We discussed this as a team today. We think that snapshot-based recoveries have eliminated most of the DTS costs, while the increasing use of data streams has reduced the need to shrink indices. We also haven't seen enough shrink-related issues to justify the significant effort required to build this, so I'm going to close it.

@tlrx tlrx closed this as completed Aug 23, 2022