A write alias targeting multiple indices prevents node startup #56186

Closed
DaveCTurner opened this issue May 5, 2020 · 4 comments
Labels
>bug, :Data Management/Indices APIs, Team:Data Management

Comments

@DaveCTurner
Contributor

In 6.x (and earlier) it is possible for a node to fail to start because its on-disk cluster state marks multiple indices as the target of writes for an alias:

[WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [REDACTED] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: alias [REDACTED] has more than one write index [REDACTED,REDACTED]

This is fundamentally because each node builds its own copy of the cluster state greedily from all the index metadata it can find on disk, but there is no guarantee that this metadata is consistent. For instance, if the node is shut down while persisting a cluster state, it may have updated only some of the index metadata on disk. Perhaps more commonly, when all shards of an index move away from a master-ineligible node, that node stops updating the corresponding index metadata but does not delete it immediately, so the copy on disk may contain very stale alias information (with thanks to @henningandersen for noticing that).
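To make the failure mode concrete, here is a minimal Java sketch. It assumes a much simplified model in which each index's on-disk metadata is just a map from alias name to its `is_write_index` flag. The class and method names (`GreedyStateBuilder`, `buildAndValidate`) and the index names are hypothetical and do not correspond to Elasticsearch's real metadata classes; the exception message is modeled on the log line above. A node that kept a stale copy of `logs-000001` from before a rollover, alongside a fresh copy of `logs-000002`, ends up with two write indices for the alias `logs` and fails validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model (not Elasticsearch code): per-index metadata
// as found on disk is index name -> (alias name -> is_write_index flag).
final class GreedyStateBuilder {

    /** Merges every index's on-disk alias metadata, then validates the result. */
    static Map<String, List<String>> buildAndValidate(Map<String, Map<String, Boolean>> onDisk) {
        // alias name -> indices that claim to be its write index
        Map<String, List<String>> writeIndices = new TreeMap<>();
        // Greedy merge: each index's metadata is trusted on its own, however stale it is.
        for (Map.Entry<String, Map<String, Boolean>> index : new TreeMap<>(onDisk).entrySet()) {
            for (Map.Entry<String, Boolean> alias : index.getValue().entrySet()) {
                if (Boolean.TRUE.equals(alias.getValue())) {
                    writeIndices.computeIfAbsent(alias.getKey(), k -> new ArrayList<>()).add(index.getKey());
                }
            }
        }
        // Validation modeled on the startup failure quoted above.
        for (Map.Entry<String, List<String>> e : writeIndices.entrySet()) {
            if (e.getValue().size() > 1) {
                throw new IllegalStateException("alias [" + e.getKey() + "] has more than one write index ["
                        + String.join(",", e.getValue()) + "]");
            }
        }
        return writeIndices;
    }

    public static void main(String[] args) {
        // logs-000001 was rolled over, but this node kept a stale copy that still says is_write_index=true.
        Map<String, Map<String, Boolean>> onDisk = Map.of(
                "logs-000001", Map.of("logs", true),
                "logs-000002", Map.of("logs", true));
        try {
            buildAndValidate(onDisk);
        } catch (IllegalStateException e) {
            // prints: alias [logs] has more than one write index [logs-000001,logs-000002]
            System.out.println(e.getMessage());
        }
    }
}
```

If the stale copy had instead been updated to `is_write_index=false`, the same merge would produce exactly one write index and validation would pass.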

7.x (and later) are not directly affected by this problem since #32006 ensures that cluster states are written atomically so we always see a consistent set of index metadata, although a 7.x node can still encounter this broken state during an upgrade from 6.x.

One possible fix is to permit a write alias to target multiple indices but to reject any indexing through that alias until the ambiguity is resolved. I'm open to other ideas.
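A minimal sketch of that proposal, again under a hypothetical simplified model rather than Elasticsearch's real API: the state may record several write indices for one alias, so building it never throws, and the check moves to the point where a write resolves the alias. `LenientWriteAlias` and `resolveWriteIndex` are invented names for illustration.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Elasticsearch code) of a lenient write alias:
// ambiguity is tolerated in the state but refused at write time.
final class LenientWriteAlias {

    /** alias name -> indices flagged as its write index (may contain more than one). */
    private final Map<String, List<String>> writeIndices;

    LenientWriteAlias(Map<String, List<String>> writeIndices) {
        // Accepting an ambiguous alias here is what lets the node start up.
        this.writeIndices = Map.copyOf(writeIndices);
    }

    /** Returns the single write index for {@code alias}, or rejects the write. */
    String resolveWriteIndex(String alias) {
        List<String> targets = writeIndices.getOrDefault(alias, List.of());
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("no write index is defined for alias [" + alias + "]");
        }
        if (targets.size() > 1) {
            throw new IllegalStateException("alias [" + alias + "] has more than one write index "
                    + targets + "; indexing is rejected until exactly one remains");
        }
        return targets.get(0);
    }
}
```

In this sketch only writes through the ambiguous alias are refused; other aliases still resolve to their single write index as usual.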

@DaveCTurner DaveCTurner added the >bug and :Data Management/Indices APIs labels May 5, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Indices APIs)

@elasticmachine elasticmachine added the Team:Data Management label May 5, 2020
@danhermann
Contributor

We discussed this and thought that adding a system property to bypass the cluster state validation logic that prevents a write alias from targeting multiple write indices would probably be the most expedient way of addressing this because:

  • though this state is rarely encountered, it is hard to recover the cluster to a functional state once it occurs
  • the cluster state validation logic is still desirable especially because this state should not occur in 7.x clusters

Indexing through the write alias should still be prevented, but that is much more easily fixed by updating the alias to have a single write index.
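The approach described above can be sketched as follows. This is a hypothetical model, not Elasticsearch code: the thread does not name the actual system property, so `example.allow_multiple_write_indices` is invented here, as are the class and method names. Validation still rejects multiple write indices by default and is skipped only when the property is set to `true`. `keepSingleWriteIndex` models the repair step of updating the alias so that exactly one index is marked as the write index.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the property name below is invented for illustration;
// the thread does not name the real system property.
final class WriteIndexValidation {

    static final String BYPASS_PROPERTY = "example.allow_multiple_write_indices";

    /** Throws on an alias with several write indices unless the bypass property is "true". */
    static void validate(String alias, List<String> writeIndices) {
        // Boolean.getBoolean is true only if the system property exists and equals "true" (ignoring case).
        if (writeIndices.size() > 1 && !Boolean.getBoolean(BYPASS_PROPERTY)) {
            throw new IllegalStateException("alias [" + alias + "] has more than one write index ["
                    + String.join(",", writeIndices) + "]");
        }
    }

    /** Repair step: returns a copy of the flags in which only {@code chosen} is the write index. */
    static Map<String, Boolean> keepSingleWriteIndex(Map<String, Boolean> isWriteIndexByIndex, String chosen) {
        if (!isWriteIndexByIndex.containsKey(chosen)) {
            throw new IllegalArgumentException("index [" + chosen + "] is not a target of the alias");
        }
        Map<String, Boolean> repaired = new LinkedHashMap<>();
        for (String index : isWriteIndexByIndex.keySet()) {
            repaired.put(index, index.equals(chosen));
        }
        return repaired;
    }
}
```

In a real cluster the repair would be an update to the alias itself rather than a helper like this; the sketch only shows the intended end state, in which exactly one index carries the write flag and validation passes again without the bypass.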

@Samanthapuri

We have encountered the same issue in our Elasticsearch 6.x cluster: one node of our three-node cluster is unable to start.

Can you please help us solve this and bring the node back up?

@DaveCTurner
Contributor Author

All versions affected by this bug are now past EOL so there is nothing to be done here any more. I am therefore closing this.


4 participants