Create action to migrate the contents of one index to a new index #20024

Closed
wants to merge 18 commits into from

nik9000
Member

@nik9000 nik9000 commented Aug 17, 2016

The standard way to change an index's mapping is to create a new index with the
new mapping, _reindex the documents into the new index, flip the alias from
the old index to the new index, and then remove the old index. Traditionally
this sort of thing has been left as an exercise for those implementing an
application against Elasticsearch but I think now is the time to implement this
in Elasticsearch because:

  1. Kibana, Watcher and Security need to run this process as part of upgrading to 5.0.
  2. Elasticsearch 5.0 now has the .tasks index for storing the results of
    long-running tasks. While we were fairly careful in designing its mappings,
    I'm under no illusion that we got it right on the first try. That just isn't the
    way software works. We're going to want to run this on .tasks one day.
  3. Logstash is considering storing configuration in an Elasticsearch index and
    handling upgrades to the format of the data is a concern for Logstash's
    engineers.

In all of these cases the indexes are implementation details of their
application so we'd like to automatically upgrade them on startup rather than
provide upgrade scripts. That means that the application will want to migrate
its data every time it starts up so a user only has to get involved if the data
migration fails.

Three of the five applications that will need to do this migration live inside
Elasticsearch (Watcher and Security are plugins, .tasks is in core
Elasticsearch). So it looks like the right place to implement this is in core
Elasticsearch. The other advantage of implementing it there is that it can be
used by the widest range of users.

This PR intends to build an action into core Elasticsearch that:

  1. Responds quickly with 200 OK when the index is in the desired state
    already.
  2. Waits on concurrent invocations of the same request. This is especially
    important in "masterless" systems like Logstash so they can invoke this API on
    startup and not have to worry about one node "winning". They all get the same
    response.
  3. Notices if previous executions of this request didn't complete properly and
    responds with that information rather than some cryptic failure message.
  4. Performs the create index, migrate documents, flip alias, delete source
    index steps.
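
For comparison, here is a rough sketch of the manual sequence that step 4 would wrap, written as a list of REST requests against the existing create index, _reindex, _refresh, _aliases, and delete index APIs. The function name and the empty settings and mappings are placeholders for illustration, and the sketch assumes the alias currently points at the source index. It is not the code in this PR.

```python
import json


def manual_migration_requests(source, dest, alias, settings, mappings, script=None):
    """Return the (method, path, body) requests for the manual process, in order.

    Hypothetical helper: it only builds the requests, it does not send them.
    """
    reindex_body = {"source": {"index": source}, "dest": {"index": dest}}
    if script is not None:
        reindex_body["script"] = script
    return [
        # 1. Create the destination index with the new settings and mappings.
        ("PUT", f"/{dest}", {"settings": settings, "mappings": mappings}),
        # 2. Copy (and optionally transform) every document.
        ("POST", "/_reindex", reindex_body),
        # 3. Make the copied documents visible to search.
        ("POST", f"/{dest}/_refresh", None),
        # 4. Flip the alias in a single _aliases call.
        #    Assumes the alias currently points at the source index.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": source, "alias": alias}},
            {"add": {"index": dest, "alias": alias}},
        ]}),
        # 5. Remove the old index.
        ("DELETE", f"/{source}", None),
    ]


if __name__ == "__main__":
    # Placeholder settings/mappings; a real caller would supply its own.
    script = {"lang": "painless", "inline": "ctx._source.thing = 2"}
    for method, path, body in manual_migration_requests(
            "index_1", "index_2", "index", {}, {}, script):
        print(method, path, json.dumps(body) if body is not None else "")
```

Every application that owns an index has to get this ordering and its failure handling right on its own today, which is the duplication the proposed action is meant to remove.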

The PR exposes the action through an HTTP request that looks like:

POST /index_1/_migrate/index_2
{
  "settings": {...},
  "mapping": {...},
  "aliases": ["index"],
  "script": {
    "lang": "painless",
    "inline": "ctx._source.thing = 2"
  }
}

In this example index_1 is the source index and index_2 is the destination
index. Unlike a normal create index command, the aliases section is required.
This is how _migrate knows that the process is complete, and it is good
practice anyway. The alias is added to the destination index only after all the docs
in the source index have been migrated to the destination index and the destination
index has been _refreshed so they are visible.
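
Because the alias is only added at the end, its presence on the destination index can serve as the completion marker. The sketch below shows one way such a check might classify the cluster state. The function name, its inputs, and the state names are all hypothetical and are not taken from this PR.

```python
def migration_state(source_exists, dest_exists, alias_indices, dest):
    """Classify where a migration stands (hypothetical state names).

    alias_indices is the set of index names the alias currently points at.
    """
    if not dest_exists:
        # Nothing has been created yet.
        return "not_started" if source_exists else "missing_source"
    if dest not in alias_indices:
        # Destination exists but the alias was never flipped: a previous
        # run is either still in progress or did not complete.
        return "incomplete"
    # The alias is on the destination, so the documents were migrated.
    return "source_delete_pending" if source_exists else "done"


if __name__ == "__main__":
    print(migration_state(False, True, {"index_2"}, "index_2"))
    print(migration_state(True, True, {"index_1"}, "index_2"))
```

A check of this kind is what would let the action answer quickly when the index is already in the desired state and report an unfinished earlier run instead of failing cryptically.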

Like _reindex, _delete_by_query, and _update_by_query, these requests
are "big" in that they do many things and we expect them to take a long time if
they operate on a large number of documents. This can't be helped, so we want to
make sure that this request integrates well with the task management API. That
means that it should be "cancellable": true and its status should be very
expressive: it should return the phase of the operation currently being performed and,
if that phase is reindex, the details of the reindex's status.
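
As an illustration of what an expressive status could look like, the sketch below models a status object that always reports the current phase and nests reindex progress only while the reindex phase is running. The class names, phase names, and field selection are assumptions made for this example. The counters are loosely modeled on the kind of counts _reindex reports, not copied from this PR.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ReindexProgress:
    # Hypothetical subset of reindex-style counters.
    total: int
    created: int
    updated: int
    batches: int


@dataclass
class MigrateStatus:
    # Hypothetical phase names, e.g. "create_index", "reindex",
    # "flip_alias", "delete_source".
    phase: str
    reindex: Optional[ReindexProgress] = None

    def to_dict(self):
        """Render the status as it might appear in a task listing."""
        body = {"phase": self.phase}
        # Only include reindex details while that phase is running.
        if self.phase == "reindex" and self.reindex is not None:
            body["reindex"] = asdict(self.reindex)
        return body


if __name__ == "__main__":
    status = MigrateStatus("reindex", ReindexProgress(1000, 250, 0, 1))
    print(status.to_dict())
```

Keeping the reindex details nested under the phase means a client polling the task management API can show coarse progress for the short phases and document counts for the long one.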

We try to limit the number of "big" operations in core Elasticsearch because
every one of them feels like a new trap we are setting for unsuspecting users.
We will need to warn users that this can take some time and put some load on
the cluster. For the applications listed at the top of this description we don't
expect this to be a problem though. A Security index with a million documents
is huge but not a ton of work for reindex. We just have to make very sure
that it is obvious to users that doing this against an index with a
hundred million documents is going to take a long time.

@nik9000 nik9000 added discuss WIP :Data Management/Indices APIs APIs to create and manage indices and templates v5.0.0-beta1 labels Aug 17, 2016
@nik9000 nik9000 changed the title Index migrate Create action to migrate the contents of one index to a new index Aug 17, 2016
@nik9000
Member Author

nik9000 commented Aug 17, 2016

This is currently a very rough WIP. I'd mostly like to get feedback on the general direction before I go too deep down a rabbit hole.

`#equals` isn't quite right, so we make something better. And this time
we test it.
You can't reuse requests in different threads or they'll be modified
by different threads without any proper synchronization. And we check
that the request isn't modified in unexpected ways.
@nik9000
Member Author

nik9000 commented Sep 13, 2016

Sorry for leaving this open for so long. A few of us talked verbally and, while this operation would be useful for some folks, it really wouldn't be useful for upgrading indexes on startup. The reasoning is that upgrading an index requires that the cluster be stable for the duration of the upgrade and cluster startup is the time when the cluster is at its most unstable.

Labels
:Data Management/Indices APIs APIs to create and manage indices and templates discuss WIP