[ML] Deleting open job's results index causes recreation with write alias name #57645

Closed
droberts195 opened this issue Jun 4, 2020 · 4 comments
Labels
:ml Machine learning

Comments

@droberts195
Contributor

Deleting the results index of an anomaly detection job while the job is running is neither expected nor supported. However, if it is done anyway, this is what happens:

  • Because results are written via the write alias, and deleting an index also removes the aliases that pointed to it, the next results write auto-creates a concrete index named after the write alias
  • Because results are read via the read alias, which does not point at that auto-created index, subsequent reads of results for the job find no results
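Both bullets can be reproduced with a toy, in-memory model of index and alias resolution. This is a minimal sketch, not Elasticsearch code: the class ToyCluster and the names results-000001, results-read and results-write are hypothetical, and the model assumes a write alias resolves to exactly one index and that reads of a missing target are lenient (return no hits rather than an error).

```python
class ToyCluster:
    """Toy model of indices and aliases; a simplification, not real Elasticsearch."""

    def __init__(self):
        self.indices = {}  # concrete index name -> list of documents
        self.aliases = {}  # alias name -> set of concrete index names

    def create_index(self, name, aliases=()):
        self.indices[name] = []
        for alias in aliases:
            self.aliases.setdefault(alias, set()).add(name)

    def delete_index(self, name):
        del self.indices[name]
        # Deleting an index also detaches it from every alias; an alias that
        # no longer points at any index stops existing.
        for alias in list(self.aliases):
            self.aliases[alias].discard(name)
            if not self.aliases[alias]:
                del self.aliases[alias]

    def index_doc(self, target, doc):
        if target in self.aliases:
            # Simplification: a write alias resolves to exactly one index.
            concrete = next(iter(self.aliases[target]))
        else:
            # Unknown name: auto-create a concrete index with that exact name.
            concrete = target
            self.indices.setdefault(concrete, [])
        self.indices[concrete].append(doc)
        return concrete  # like the "_index" field of an index response

    def search(self, target):
        if target in self.aliases:
            names = self.aliases[target]
        elif target in self.indices:
            names = {target}
        else:
            names = set()  # lenient read: a missing target yields no hits
        return [doc for name in names for doc in self.indices[name]]


cluster = ToyCluster()
cluster.create_index("results-000001", aliases=["results-read", "results-write"])
cluster.index_doc("results-write", {"score": 1})
cluster.delete_index("results-000001")  # the unsupported deletion
landed_in = cluster.index_doc("results-write", {"score": 2})
print(landed_in)                        # results-write
print(cluster.search("results-read"))   # []
```

The second write silently lands in a new concrete index called results-write, and the read alias no longer sees any results, which is exactly the confusing state described above.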

So, if a job appears to be running but no results for it exist, it is worth checking whether the results index was deleted. The simplest place to look is the output of _cat/indices: the concrete index named after the job's write alias will stand out from the other ML indices listed there.
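That check can be scripted. A minimal sketch, assuming the text of GET _cat/indices?h=index (one index name per line) and assuming ML results write aliases follow the pattern .ml-anomalies-.write-<job_id>; the pattern, the function name find_write_alias_indices and the sample output are illustrative assumptions, not taken from the issue.

```python
import re
from typing import List

# Assumption: ML results write aliases are named ".ml-anomalies-.write-<job_id>".
# A *concrete* index with that name is the symptom to look for.
WRITE_ALIAS_AS_INDEX = re.compile(r"^\.ml-anomalies-\.write-(?P<job_id>\S+)$")


def find_write_alias_indices(cat_output: str) -> List[str]:
    """Return job IDs whose write alias name shows up as a concrete index.

    Expects the text of GET _cat/indices?h=index, i.e. one index name per line.
    _cat/indices lists indices, not aliases, so any match is a real index.
    """
    job_ids = []
    for line in cat_output.splitlines():
        match = WRITE_ALIAS_AS_INDEX.match(line.strip())
        if match:
            job_ids.append(match.group("job_id"))
    return job_ids


# Illustrative output only; real index names will differ.
sample = """\
.ml-anomalies-shared
.ml-config
.ml-anomalies-.write-my-job
"""
print(find_write_alias_indices(sample))  # ['my-job']
```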

The next question is whether we could do anything to fail fast in this situation.

One idea that has been previously suggested is to add a ?alias_required argument to index requests that would fail the request if the write was not being made via an alias. This would be very helpful for ML.

Another possibility, which requires no core changes, is to check the responses immediately after indexing anomaly results. The response to an index or bulk request says which concrete index each document was indexed into, even if the request specified an alias. Since we know we are supposed to be indexing via an alias, we could fail the job if the index named in the response is identical to the name we supplied in the index or bulk request.
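The response check might look roughly like this. It assumes the usual bulk response shape, where each entry in items is keyed by its action and carries the concrete _index the document was written to; check_bulk_response, ResultsIndexMissingError and the alias name are hypothetical, and a real implementation would live in the Java ML code rather than in Python.

```python
from typing import Any, Dict


class ResultsIndexMissingError(RuntimeError):
    """Hypothetical error: a write bypassed the alias and created a concrete index."""


def check_bulk_response(requested_target: str, bulk_response: Dict[str, Any]) -> None:
    """Raise if any item was stored in an index named exactly like the request target.

    requested_target is the write alias the request was sent to. When the alias
    exists, each item's "_index" is the concrete index behind it, so it differs
    from the alias name. Equality means the name was auto-created as an index.
    """
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action: "index", "create", "update" or "delete".
        for action, result in item.items():
            if result.get("_index") == requested_target:
                raise ResultsIndexMissingError(
                    f"{action} request for {requested_target!r} was stored in a "
                    f"concrete index of the same name instead of via the alias"
                )


alias = ".ml-anomalies-.write-my-job"  # hypothetical write alias name
healthy = {"errors": False, "items": [{"index": {"_index": ".ml-anomalies-shared", "status": 201}}]}
broken = {"errors": False, "items": [{"index": {"_index": alias, "status": 201}}]}

check_bulk_response(alias, healthy)  # passes silently
try:
    check_bulk_response(alias, broken)
except ResultsIndexMissingError as err:
    print("fail the job:", err)
```

Note that the broken response reports no errors at all, which is why the check has to compare index names rather than rely on the errors flag.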

Another possibility would be to intercept delete index requests using a filter client and reject any for indices whose names begin with .ml, unless the request came from the _xpack user. This could, however, be dangerous, as it would make it hard to recover from corruption caused by future bugs, so this is probably the least desirable of the three options.
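For comparison, the core of such a filter is a small predicate. This sketch assumes the index names have already been resolved from any wildcard expressions and that the authenticated user name is available; should_reject_delete is a hypothetical name. It also makes the drawback concrete: an administrator cleaning up after a bug would be refused just like an accidental deletion.

```python
from typing import Iterable

INTERNAL_USER = "_xpack"  # the internal user named in the issue


def should_reject_delete(index_names: Iterable[str], username: str) -> bool:
    """Hypothetical filter check: block deletion of .ml* indices by anyone else.

    index_names must already be resolved to concrete names; a raw pattern
    such as "*" does not start with ".ml" and would slip past this check.
    """
    if username == INTERNAL_USER:
        return False
    return any(name.startswith(".ml") for name in index_names)


print(should_reject_delete([".ml-anomalies-shared"], "elastic"))  # True
print(should_reject_delete([".ml-anomalies-shared"], "_xpack"))   # False
print(should_reject_delete(["logs-2020.06.04"], "elastic"))       # False
```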

droberts195 added the :ml Machine learning label Jun 4, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@droberts195
Contributor Author

droberts195 commented Jun 9, 2020

I just realised this is closely related to #55267, which suggests an alternative solution.

@droberts195
Contributor Author

#58917 provides the building block to fix this problem. Now we can move on to adding the require_alias=true flag to all our state and results writes.
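With that flag available, each state or results write can target the write alias and ask Elasticsearch to refuse the request if the name is not an alias. The sketch below only builds the HTTP path and newline-delimited body for a bulk request using the require_alias=true URL parameter; it sends nothing, and build_bulk_request and the alias name in the usage line are hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import quote, urlencode


def build_bulk_request(write_alias: str, docs: List[Dict]) -> Tuple[str, str]:
    """Return (path, body) for a bulk index request that must go through an alias.

    With require_alias=true, Elasticsearch rejects the write if write_alias is
    not an alias, instead of auto-creating a concrete index with that name.
    """
    path = f"/{quote(write_alias, safe='')}/_bulk?" + urlencode({"require_alias": "true"})
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line; target comes from the path
        lines.append(json.dumps(doc))            # source line
    # Bulk bodies are newline-delimited JSON and must end with a newline.
    body = "\n".join(lines) + "\n"
    return path, body


path, body = build_bulk_request("results-write", [{"score": 1}, {"score": 2}])
print(path)  # /results-write/_bulk?require_alias=true
```

Compared with inspecting responses after the fact, this makes the write itself fail, so no stray concrete index named after the alias is created in the first place.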

@droberts195
Contributor Author

Fixed by #60315
