Add command-line tool support for Lucene-based metadata storage #50179

ywelsch · 2019-12-13T13:51:25Z

Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage.

Relates #48701

elasticmachine · 2019-12-13T13:51:28Z

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

ywelsch · 2019-12-16T16:45:48Z

@elasticmachine run elasticsearch-ci/2

henningandersen

LGTM.

I left a few minor comments to consider. I assume the two important TODOs (setLastAcceptedTerm and dangling indices) will be tackled in follow-ups.

henningandersen · 2019-12-16T21:36:23Z

server/src/main/java/org/elasticsearch/env/NodeRepurposeCommand.java

+        final MetaData metaData = loadClusterState(terminal, env, psf).metaData();
+
+        final Set<Path> indexPaths = uniqueParentPaths(shardDataPaths);
+        final Set<String> indexUUIDs = Sets.union(indexUUIDsFor(indexPaths),


I think this will output all indices rather than the affected indices? Since we only delete the shardDataPaths, I see no reason to list other indices as affected.

fixed in fd3dd91

henningandersen · 2019-12-17T09:31:24Z

server/src/main/java/org/elasticsearch/cluster/coordination/ElasticsearchNodeCommand.java


 public abstract class ElasticsearchNodeCommand extends EnvironmentAwareCommand {
-    private static final Logger logger = LogManager.getLogger(ElasticsearchNodeCommand.class);
+    protected static final Logger logger = LogManager.getLogger(ElasticsearchNodeCommand.class);


I think this can still be private?

adapted in 0c8bb67

henningandersen · 2019-12-17T10:11:41Z

server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java

+                    final Tuple<Manifest, MetaData> legacyState = metaStateService.loadFullState();
+                    if (legacyState.v1().isEmpty() == false) {
+                        onDiskState = new LucenePersistedStateFactory.OnDiskState(lucenePersistedStateFactory.getNodeId(),
+                            null, legacyState.v1().getCurrentTerm(),


Perhaps we could remove the dataPath from OnDiskState in a follow-up to avoid the null here?

adapted in 0c8bb67

henningandersen · 2019-12-17T10:24:31Z

server/src/main/java/org/elasticsearch/gateway/LucenePersistedStateFactory.java

-        final MetaData metaData;
+    public static class OnDiskState {
+        public final String nodeId;
+        public final Path dataPath;


I think we should leave this one private since it is only used internally here to provide logging. Alternatively remove it from here and control the best path separately in the loop below.

adapted in 0c8bb67

henningandersen · 2019-12-17T10:32:29Z

server/src/main/java/org/elasticsearch/gateway/LucenePersistedStateFactory.java

-    public LucenePersistedStateFactory(NodeEnvironment nodeEnvironment, NamedXContentRegistry namedXContentRegistry, BigArrays bigArrays,
-                                       LegacyLoader legacyLoader) {
-        this.nodeEnvironment = nodeEnvironment;
+    public LucenePersistedStateFactory(Path[] dataPaths, String nodeId, boolean preserveUnknownCustoms,


nit: Might be an old C++ habit, but I prefer to have the preserveUnknownCustoms parameter last, to make it clear that the other constructor is identical to this except using a default value for the preserveUnknownCustoms parameter.

adapted in cffdd8a.
The way I had it, the parameter list had the same suffix. Now it does not have same prefix nor suffix...

DaveCTurner

I do not think we should expose the internals of LucenePersistedStateFactory as much as we are proposing here. The internal details and the state-loading logic are once again spilling into GatewayMetaState and elsewhere. I would prefer to keep things more isolated and add public methods to LucenePersistedStateFactory as needed to support the non-standard ways that the node tool accesses the on-disk state.

server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java

ywelsch

I do not think we should expose the internals of LucenePersistedStateFactory as much as we are proposing here.

I've hidden more things away in 0c8bb67. I don't want to go back to LegacyLoader and introducing more callbacks to LucenePersistedStateFactory (e.g. to adapt cluster state). I think that as it stands this class has a minimal interface, and does not need to concern itself with legacy stuff.

I would prefer to keep things more isolated and add public methods to LucenePersistedStateFactory as needed to support the non-standard ways that the node tool accesses the on-disk state.

That was my starting point, but after adding more bells and whistles to LucenePersistedStateFactory that were essentially irrelevant for what this class was trying to achieve, I didn't like it and got to the current solution.

I find the current minimal Writer/OnDiskState interfaces a good compromise between having a closed interface and unnecessary parameterization of the class.

DaveCTurner

LGTM thanks for the re-organisation.

Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding. This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch: * Introduce Lucene-based metadata persistence (#48733) This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates #48701 * Remove per-index metadata without assigned shards (#49234) Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. However there is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates #48701 * Use Lucene exclusively for metadata storage (#50144) This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. Relates #48701 * Add command-line tool support for Lucene-based metadata storage (#50179) Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates #48701 * Use single directory for metadata (#50639) Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden to the users. Co-Authored-By: David Turner <david.turner@elastic.co> * Add async dangling indices support (#50642) Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates #48701 * Fold node metadata into new node storage (#50741) Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source. * Write CS asynchronously on data-only nodes (#50782) Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a bit of bootstrap validation and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non existing configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates #48701 * Remove persistent cluster settings tool (#50694) Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on disk cluster state in case where it contains incompatible settings that prevent the cluster from forming. Relates #48701 * Make cluster state writer resilient to disk issues (#50805) Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701 * Omit writing global metadata if no change (#50901) Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates #48701 * DanglingIndicesIT should ensure node removed first (#50896) These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster. Co-authored-by: David Turner <david.turner@elastic.co>

Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding. This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch: * Introduce Lucene-based metadata persistence (elastic#48733) This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates elastic#48701 * Remove per-index metadata without assigned shards (elastic#49234) Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. However there is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates elastic#48701 * Use Lucene exclusively for metadata storage (elastic#50144) This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. Relates elastic#48701 * Add command-line tool support for Lucene-based metadata storage (elastic#50179) Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates elastic#48701 * Use single directory for metadata (elastic#50639) Earlier PRs for elastic#48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden to the users. Co-Authored-By: David Turner <david.turner@elastic.co> * Add async dangling indices support (elastic#50642) Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates elastic#48701 * Fold node metadata into new node storage (elastic#50741) Moves node metadata to uses the new storage mechanism (see elastic#48701) as the authoritative source. * Write CS asynchronously on data-only nodes (elastic#50782) Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a bit of bootstrap validation and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non existing configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates elastic#48701 * Remove persistent cluster settings tool (elastic#50694) Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on disk cluster state in case where it contains incompatible settings that prevent the cluster from forming. Relates elastic#48701 * Make cluster state writer resilient to disk issues (elastic#50805) Adds handling to make the cluster state writer resilient to disk issues. Relates to elastic#48701 * Omit writing global metadata if no change (elastic#50901) Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates elastic#48701 * DanglingIndicesIT should ensure node removed first (elastic#50896) These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster. Co-authored-by: David Turner <david.turner@elastic.co>

* Move metadata storage to Lucene (#50907) Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding. This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch: * Introduce Lucene-based metadata persistence (#48733) This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates #48701 * Remove per-index metadata without assigned shards (#49234) Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. However there is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates #48701 * Use Lucene exclusively for metadata storage (#50144) This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. Relates #48701 * Add command-line tool support for Lucene-based metadata storage (#50179) Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates #48701 * Use single directory for metadata (#50639) Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden to the users. Co-Authored-By: David Turner <david.turner@elastic.co> * Add async dangling indices support (#50642) Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates #48701 * Fold node metadata into new node storage (#50741) Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source. * Write CS asynchronously on data-only nodes (#50782) Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a bit of bootstrap validation and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non existing configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates #48701 * Remove persistent cluster settings tool (#50694) Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on disk cluster state in case where it contains incompatible settings that prevent the cluster from forming. Relates #48701 * Make cluster state writer resilient to disk issues (#50805) Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701 * Omit writing global metadata if no change (#50901) Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates #48701 * DanglingIndicesIT should ensure node removed first (#50896) These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster. Co-authored-by: David Turner <david.turner@elastic.co> * fix tests Co-authored-by: David Turner <david.turner@elastic.co>

Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding. This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch: * Introduce Lucene-based metadata persistence (elastic#48733) This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates elastic#48701 * Remove per-index metadata without assigned shards (elastic#49234) Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. However there is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates elastic#48701 * Use Lucene exclusively for metadata storage (elastic#50144) This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. Relates elastic#48701 * Add command-line tool support for Lucene-based metadata storage (elastic#50179) Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates elastic#48701 * Use single directory for metadata (elastic#50639) Earlier PRs for elastic#48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden to the users. Co-Authored-By: David Turner <david.turner@elastic.co> * Add async dangling indices support (elastic#50642) Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates elastic#48701 * Fold node metadata into new node storage (elastic#50741) Moves node metadata to uses the new storage mechanism (see elastic#48701) as the authoritative source. * Write CS asynchronously on data-only nodes (elastic#50782) Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a bit of bootstrap validation and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non existing configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates elastic#48701 * Remove persistent cluster settings tool (elastic#50694) Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on disk cluster state in case where it contains incompatible settings that prevent the cluster from forming. Relates elastic#48701 * Make cluster state writer resilient to disk issues (elastic#50805) Adds handling to make the cluster state writer resilient to disk issues. Relates to elastic#48701 * Omit writing global metadata if no change (elastic#50901) Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates elastic#48701 * DanglingIndicesIT should ensure node removed first (elastic#50896) These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster. Co-authored-by: David Turner <david.turner@elastic.co>

elastic/elasticsearch#50179

Add command-line tool support for Lucene-based metadata storage

a083305

ywelsch added >enhancement :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Dec 13, 2019

ywelsch mentioned this pull request Dec 13, 2019

Support node tools for Lucene-based cluster storage #50070

Closed

ywelsch requested a review from DaveCTurner December 13, 2019 14:44

ywelsch marked this pull request as ready for review December 13, 2019 14:44

ywelsch requested a review from henningandersen December 13, 2019 14:44

Randomize tests to work without cluster state

d1b5c62

henningandersen approved these changes Dec 17, 2019

View reviewed changes

DaveCTurner requested changes Dec 17, 2019

View reviewed changes

server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java Show resolved Hide resolved

ywelsch added 3 commits December 17, 2019 14:51

reorder params

cffdd8a

Hide more stuff

0c8bb67

Not all indices

fd3dd91

ywelsch commented Dec 17, 2019

View reviewed changes

ywelsch added 5 commits December 17, 2019 17:59

not

c400cc4

Merge branch 'reduce-metadata-writes-master' into lucene-meta-tools

a3ee1e1

Rename LucenePersistedStateFactory and adapt tests

698571c

checkstyle

d58866c

remove dead code

45e3229

ywelsch requested a review from DaveCTurner December 18, 2019 16:34

DaveCTurner approved these changes Dec 19, 2019

View reviewed changes

ywelsch merged commit c6e1ea8 into elastic:reduce-metadata-writes-master Dec 19, 2019

DaveCTurner mentioned this pull request Dec 19, 2019

Reduce number of writes needed for metadata updates #48701

Closed

14 tasks

ywelsch mentioned this pull request Jan 13, 2020

Move metadata storage to Lucene #50928

Merged

mkleen mentioned this pull request Apr 16, 2021

Move metadata storage to Lucene crate/crate#11270

Merged

5 tasks

mkleen added a commit to crate/crate that referenced this pull request Apr 19, 2021

WIP: bp: Add command-line tool support for Lucene-based metadata storage

8b7700a

elastic/elasticsearch#50179

mkleen added a commit to crate/crate that referenced this pull request Apr 20, 2021

WIP: bp: Add command-line tool support for Lucene-based metadata storage

283db17

elastic/elasticsearch#50179

mkleen added a commit to crate/crate that referenced this pull request Apr 26, 2021

WIP: bp: Add command-line tool support for Lucene-based metadata storage

38a779d

elastic/elasticsearch#50179

mkleen added a commit to crate/crate that referenced this pull request Apr 26, 2021

bp: Add command-line tool support for Lucene-based metadata storage

26cb89a

elastic/elasticsearch#50179

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add command-line tool support for Lucene-based metadata storage #50179

Add command-line tool support for Lucene-based metadata storage #50179

ywelsch commented Dec 13, 2019

elasticmachine commented Dec 13, 2019

ywelsch commented Dec 16, 2019

henningandersen left a comment

henningandersen Dec 16, 2019

ywelsch Dec 17, 2019

henningandersen Dec 17, 2019

ywelsch Dec 17, 2019

henningandersen Dec 17, 2019

ywelsch Dec 17, 2019

henningandersen Dec 17, 2019

ywelsch Dec 17, 2019

henningandersen Dec 17, 2019

ywelsch Dec 17, 2019

DaveCTurner left a comment

ywelsch left a comment

DaveCTurner left a comment

Add command-line tool support for Lucene-based metadata storage #50179

Add command-line tool support for Lucene-based metadata storage #50179

Conversation

ywelsch commented Dec 13, 2019

elasticmachine commented Dec 13, 2019

ywelsch commented Dec 16, 2019

henningandersen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DaveCTurner left a comment

Choose a reason for hiding this comment

ywelsch left a comment

Choose a reason for hiding this comment

DaveCTurner left a comment

Choose a reason for hiding this comment