Allow for 'grace period expiration' before shard reallocation? #3569
Comments
I think you can simply disable allocation via the cluster settings API.
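For context, disabling allocation around a planned restart is typically done through the cluster update settings API. A minimal sketch, assuming the `cluster.routing.allocation.enable` setting documented in the Elasticsearch reference (host and port are placeholders for your cluster):

```shell
# Temporarily disable shard allocation before a planned node restart
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}
'

# ... restart or reboot the node ...

# Re-enable allocation once the node has rejoined
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
'
```

This covers planned maintenance, but, as the comments below note, it does not help with unplanned blips, since the setting has to be applied before the node drops out.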
Not really -- there are cases where we may see a little blip in the network that triggers shard reallocation. I think it's a pretty worthwhile feature to be able to say "wait 5m for things to settle before re-allocating shards". This is also useful for cases where you may be bringing up more than one additional node in terms of capacity.
Was curious about this too; I'd like to see ES give me a grace period of something like 5 minutes before deciding to rebalance shards when a node goes offline. Most likely that node is coming back up (e.g., due to a maintenance reboot of the service or host), and I'd prefer the cluster remain in a degraded state in the hope that the node comes back online, then consider it dead after a certain timeout and rebalance as needed.
If a shard disappears briefly then returns, it needs to recover from the primary shard in case anything has changed. Currently, that is likely to mean copying lots of segments over from the primary (as the primary and replica will probably have diverged). Making this fast will not be possible until #6069 is implemented. We can revisit this issue once #6069 is in.
Would love to see a configurable timeout for this as well. While we can use disable_allocation for planned events, we've had cases where a network blip caused indices to begin reshuffling.
+1, we would really love to see a configurable timeout as well, if not the ability to disable this behaviour completely. In an unplanned outage we would prefer to be able to configure the cluster not to reassign and shuffle replicas, as this leads to a lot of data moving. As far as I know, no such setting exists.
+1, this would be really helpful for us. We have a massive amount of data in our ELK cluster, and since we still have one replica available when a node drops out, there is no point in duplicating that data one more time.
Fixed via #11712 |
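For reference, the linked fix introduced a per-index delay before replicas on a departed node are reallocated. A hedged sketch of configuring it, assuming the `index.unassigned.node_left.delayed_timeout` setting described in the Elasticsearch reference docs (the `_all` target and host/port are illustrative):

```shell
# Wait 5 minutes after a node leaves before reallocating its shards,
# applied here to all indices; a restarting node that rejoins within
# the window keeps its shards without a full recovery from the primary.
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}
'
```

This is exactly the "grace period" behaviour requested in this issue: short blips and quick restarts no longer trigger an immediate, IO-intensive reshuffle.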
It would be really useful to allow for a 'grace period' between when ES notices that a particular node has gone down and when shard reallocation begins. There are times when we might want to do a quick restart of an ES node, or take one down for a full reboot, and we don't want a reallocation of shards because that's a very IO-intensive operation. In our case, we also use the Zookeeper plugin, and a shard reallocation is triggered by a short communication break between the ES nodes and Zookeeper.