Provide an option to only assign shards to nodes that already have them as part of allocation #9425

Closed
ppf2 opened this issue Jan 26, 2015 · 2 comments
Labels: discuss, :Distributed/Distributed

Comments

ppf2 (Member) commented Jan 26, 2015

Consider the following repro:

[screenshots of the cluster state during the repro omitted]

  • node 2 has [0], [1], [4] as its original allocation.
  • cluster.routing.allocation.node_concurrent_recoveries is set to 1 (on purpose for the reproduction; the settings involved are sketched after this list).
  • allocation is disabled, and then node 2 is stopped.
  • node 2 is started back up and then allocation is enabled again.
  • [0] and [1] are allocated back to node 2 successfully once allocation is re-enabled (after its restart).
  • [4] ends up on node 1 (instead of node 2) because node 2 already has one target recovery outstanding, so the allocator picks another node for [4] while node 2 is busy with that recovery; node 1 has one source recovery in flight but is still free to act as the recovery target for [4].
  • Finally, rebalancing kicks in and moves [2] from node 1 to node 2, so node 2 ends up with [0], [1], and [2] instead of [0], [1], [4].
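
For reference, a minimal sketch of the settings involved in the repro above, written against the cluster settings API (the use of transient settings and the exact request bodies are assumptions for illustration, not taken from the screenshots):

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 1,
    "cluster.routing.allocation.enable": "none"
  }
}
```

After node 2 is restarted, allocation is re-enabled:

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```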

It would be helpful to provide an additional option for cluster.routing.allocation.enable so that it only assigns shards to nodes that already hold a copy of them, preventing unnecessary allocation of a shard to a different node during rolling restarts. While increasing cluster.routing.allocation.node_concurrent_recoveries (from the default of 2) is a potential workaround for small deployments, it is not viable for deployments with a large number of shards per node because of the potential network and I/O impact. For example, a new value (e.g. "existing") could be added to cluster.routing.allocation.enable that also works in conjunction with existing values like new_primaries (e.g. "existing,new_primaries").
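
A hypothetical request using the proposed value might look like the following; "existing" is not a real value of cluster.routing.allocation.enable and is shown only to illustrate the idea of combining it with new_primaries:

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "existing,new_primaries"
  }
}
```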

martijnvg (Member) commented

@ppf2 I think delayed shard allocation already helps here? And #12421 would do roughly what is desired, in an automatic manner?
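
For context, delayed shard allocation is controlled by the index.unassigned.node_left.delayed_timeout index setting; a sketch of raising it across all indices ahead of a planned restart (the 5m value is only an example) might look like:

```
PUT /_all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}
```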

clintongormley commented

I agree with @martijnvg. Adding hard rules will break allocation in ways that could lose data. Closing in favour of #12421 and #11438.

lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018