
Make cluster recovery near instantaneous if all shards are present and accounted for #6069

Closed
geekpete opened this issue May 6, 2014 · 16 comments

geekpete (Member) commented May 6, 2014

When restarting a cluster from a green state, each shard appears to undergo some form of checksum verification before being brought back online.

Is there a way to journal writes so that recovery is much faster, the way the XFS filesystem does it?

Only the data that was being written at the time of the outage or shutdown would need to be reviewed, so just the in-progress writes get checked.

For a clean shutdown, maybe a full cluster restart command could tell all nodes to shut down in a clean state and then power off, allowing near-instantaneous recovery on startup: for example, stop allocation, flush all translogs, then shut down, etc.
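Something like the following, as a rough sketch against the Elasticsearch REST API (the cluster-settings and flush endpoints are real; the host and the surrounding orchestration are assumptions):

```python
# Sketch of the proposed clean-restart sequence, driven through the
# Elasticsearch REST API. Assumes a node reachable at localhost:9200.
import requests

ES = "http://localhost:9200"

# 1. Stop the cluster from reallocating shards while nodes go down.
requests.put(f"{ES}/_cluster/settings",
             json={"transient": {"cluster.routing.allocation.enable": "none"}})

# 2. Flush all translogs so every shard is on disk in a clean state.
requests.post(f"{ES}/_flush")

# 3. Shut the nodes down here; after they come back, re-enable allocation.
requests.put(f"{ES}/_cluster/settings",
             json={"transient": {"cluster.routing.allocation.enable": "all"}})
```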

It would take a lot of the pain out of cluster restarts.

Just an idea

nik9000 (Member) commented Jul 17, 2014

@javanna, when I met you in Germany we talked about doing something about the slow recovery times. I'm wondering if there is anything I can do to help with that.

s1monw self-assigned this Jul 17, 2014

s1monw (Contributor) commented Jul 17, 2014

@nik9000 we have improvements in the pipeline for this. I can't promise when we will start working on them or when they will land, but what we essentially plan is to work out the algorithmic parts needed to reduce the risk of a full recovery from the primary shard even when replicas are out of sync by just a handful of documents. I will try to update this issue once I have news. Thanks for pinging!
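In concrete terms, the idea is operation-based catch-up: if a replica tracks how far it has applied the primary's operation log, recovery can replay just the missing tail instead of copying every segment file. A hypothetical sketch (all names, like `replica_checkpoint`, are illustrative, not Elasticsearch internals):

```python
# Hypothetical sketch of operation-based catch-up recovery: a replica
# that is only a handful of documents behind replays the missing tail
# of the primary's operation log instead of copying all segment files.

def recover_replica(primary_ops, replica_checkpoint, apply_op):
    """primary_ops: list of (seq_no, op) pairs sorted by seq_no.
    replica_checkpoint: highest seq_no the replica has already applied."""
    missing = [op for seq_no, op in primary_ops if seq_no > replica_checkpoint]
    for op in missing:
        apply_op(op)  # replay only the small diff
    return len(missing)  # 0 means the shard was already in sync

# Example: the replica is two operations behind and replays ops 3 and 4.
ops = [(1, "index A"), (2, "index B"), (3, "delete A"), (4, "index C")]
print(recover_replica(ops, replica_checkpoint=2, apply_op=print))  # -> 2
```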

nik9000 (Member) commented Jul 17, 2014

Thanks for the reply! I'm happy to help work on it but I imagine it'd be faster to just have someone familiar with your ideas do it.

nik9000 (Member) commented Jul 29, 2014

I'm feeling this pain again today while doing a rolling restart to pick up a plugin, and I'll feel it again in two weeks when I'm ready to upgrade to 1.3.1. Because the restart process is so slow, I try to batch changes that get picked up by the restart, but that isn't great from a "change one thing at a time" perspective.

nik9000 (Member) commented Aug 23, 2014

I figure I should poke this issue every time I do a full-day cluster restart. Poke. I'm happy to work on this if someone who has thought more about it can share their ideas. At this point, sinking a couple of weeks into speeding up cluster restarts would save me time in the long run.

clintongormley commented

Hi @nik9000. This improvement depends on the addition of "sequence numbers" (a feature that will enable a number of other improvements). We are currently experimenting with various approaches, but rest assured, this issue is not being ignored.

nik9000 (Member) commented Nov 12, 2014

@clintongormley since you poked me last night about outstanding work I planned to do, can I poke this one? I'd love to have this. I'm in the middle of a rolling cluster restart that is taking two days. It's thankfully quite boring, but it still requires some degree of babysitting.

clintongormley commented

@nik9000 we are working on the design for this one. It is in the top 3 on our list, but it is obviously complicated and not guaranteed. We'll update the issue as soon as we have more news.

geekpete (Member, Author) commented

This near-instantaneous recovery idea might also apply when a node drops out of the cluster but rejoins with all its on-disk data still intact.

Instead of throwing all that data away, if it could be salvaged efficiently so that only the outdated differences need to be transmitted, it would save quite a lot of data transfer on large clusters and make failures recoverable in much shorter time periods.
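One way to picture the salvage step: compare per-file checksums of the segments already on the rejoining node against the primary's, and transfer only the files that are missing or differ. A hypothetical sketch (the function and data shapes are illustrative, not Elasticsearch internals):

```python
# Hypothetical sketch of segment reuse on rejoin: instead of discarding
# the rejoining node's on-disk data, compare per-file checksums against
# the primary and transfer only the files that actually differ.

def files_to_copy(primary_files, local_files):
    """Both arguments map file name -> checksum."""
    return [
        name for name, checksum in primary_files.items()
        if local_files.get(name) != checksum  # missing or outdated
    ]

primary = {"_0.cfs": "a1", "_1.cfs": "b2", "_2.cfs": "c3"}
local   = {"_0.cfs": "a1", "_1.cfs": "b2"}   # node was down while _2 was written
print(files_to_copy(primary, local))          # -> ['_2.cfs']
```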

bleskes (Contributor) commented Nov 20, 2014

@geekpete yeah - the plan is to help with that as well, at least when the downtime is planned. When it isn't planned, things get slightly trickier, as ES will start replicating as soon as a node goes down - there is no way for it to know how long the downtime will last.
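For reference, later Elasticsearch releases added a knob for exactly this: a per-index delay before shards from a departed node are reallocated. A minimal sketch, assuming a node at localhost:9200 (this setting did not yet exist at the time of this comment):

```python
# Sketch: delay shard reallocation after a node leaves, so a quick
# planned restart does not trigger a full re-replication. The
# delayed_timeout setting was added in later Elasticsearch releases.
import requests

requests.put("http://localhost:9200/_all/_settings",
             json={"index.unassigned.node_left.delayed_timeout": "5m"})
```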

bobrik (Contributor) commented Dec 1, 2014

This is probably related to #8725

connieyang commented

We have an Elasticsearch cluster (as part of our ELK stack) in production and have experienced long waits (6+ hours) for the cluster to turn green during a rolling restart. When will a fix for this be ready?

bleskes (Contributor) commented Jan 20, 2015

@connieyang sorry to hear about the pain. We are actively working on it. Sadly, I can't promise an ETA at the moment.

cfeio commented Feb 17, 2015

+1, we are also experiencing pain with cluster recovery due to our large cluster size (6 TB). It takes hours to recover even after a planned maintenance restart. This feature would be a huge improvement!

shyem commented Feb 19, 2015

+1, same here. With daily indices, most of them are unchanged, yet it takes forever to recover.

clintongormley commented

Closed by #11336
