[ML] Job fail to start with "Invalid alias name [.ml-state-write] ..." #58482
Comments
Pinging @elastic/ml-core (:ml)
We have discussed this issue and decided to improve the log message. For the permanent fix we follow up on #55267.
.ml-state-write is supposed to be an index alias, but by accident it can become an index. If .ml-state-write is a concrete index instead of an alias, ML stops working. This change improves error handling by setting the job to failed and properly logging and auditing the problem. The user still has to fix the problem manually, but this change should lead to a quicker resolution. fixes #58482
Maybe we should also consider the ILM behavior.
To work around this, we need to release the ILM settings from the index,
Thoughts?
Yes, you are correct this is a problem. After deleting the concrete
Thanks @droberts195 for the comment! All clear now!
Affected version: 7.7 -
Problem
.ml-state-write is supposed to be an index alias, however by accident it can become an index. If .ml-state-write is a concrete index instead of an alias, starting a job can fail due to index rollover introduced in #52356.
The reason for .ml-state-write being an index instead of an alias is explained in #57645.
From 7.9 the job fails with: Detected a problem with the internal machine learning data: the state index alias ... exists as index but must be an alias.
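To tell which of the two cases a cluster is in, one option is to look at the response of `GET /.ml-state-write`. The Elasticsearch get-index API keys its response by concrete index name, so the name shows up as a top-level key only when it is a real index. The sketch below is a hypothetical helper (the name `classify_state_write` and the abbreviated sample responses are assumptions for illustration, not part of the ML plugin):

```python
def classify_state_write(name, get_index_response):
    """Classify `name` from an already-parsed GET /<name> response body.

    Returns "index" if `name` is a concrete index, "alias" if it is an
    alias on some other index, and "missing" otherwise.
    """
    # A concrete index appears as a top-level key of the response.
    if name in get_index_response:
        return "index"
    # An alias appears under the "aliases" entry of its backing index.
    for index_body in get_index_response.values():
        if name in index_body.get("aliases", {}):
            return "alias"
    return "missing"


if __name__ == "__main__":
    # Hand-written, abbreviated sample responses (assumed shapes).
    broken = {".ml-state-write": {"aliases": {}}}
    healthy = {".ml-state": {"aliases": {".ml-state-write": {}}}}
    print(classify_state_write(".ml-state-write", broken))
    print(classify_state_write(".ml-state-write", healthy))
```

A result of "index" corresponds to the broken state described in this issue.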
Mitigation
.ml-state-write
Reindex .ml-state-write to .ml-state:
After the successful reindex, delete the old index and create an alias:
Now you should be able to start the jobs.
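The original request snippets for these steps are not preserved here, so the following is only a sketch of what the three calls could look like, using the standard Elasticsearch `_reindex`, delete-index, and `_aliases` endpoints. The helper name `mitigation_requests` is hypothetical, and the code only builds and prints the requests; it does not talk to a cluster.

```python
import json


def mitigation_requests(bad_index=".ml-state-write", target_index=".ml-state"):
    """Return (method, path, body) triples for the mitigation, in order."""
    return [
        # 1. Copy the documents out of the accidental concrete index.
        ("POST", "/_reindex",
         {"source": {"index": bad_index}, "dest": {"index": target_index}}),
        # 2. Delete the concrete index; the alias cannot be created while
        #    an index with the same name exists.
        ("DELETE", "/" + bad_index, None),
        # 3. Recreate the name as an alias pointing at the target index.
        ("POST", "/_aliases",
         {"actions": [{"add": {"index": target_index, "alias": bad_index}}]}),
    ]


if __name__ == "__main__":
    for method, path, body in mitigation_requests():
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

The order matters: run the delete only after the reindex has completed successfully, otherwise the job state is lost.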
Solution
The issues #57645 and #55267 discuss solutions for preventing the .ml-state-write index from being created. This will solve the root cause of this issue.
For users that already have an .ml-state-write index by mistake, this won't help. Because reindexing is an expensive operation, doing it automatically in the background is not an option.
2 possible improvements I can think of:
A: improve log message
The current log message isn't very descriptive and does not help users find a solution quickly. We can improve the message (concrete wording to be discussed): "Expected [.ml-state-write] to be an alias but it is an index, can't start the job. Please reindex [.ml-state-write] to [.ml-state]". It's not possible to write full instructions in a log message, but since the message is quoted in this issue, users searching for it should find the instructions here.
B: do not use ILM if .ml-state-write is an index
We could be lenient and simply fall back to the old non-ILM way. We added ILM for a reason, which makes this solution questionable; however, we are talking about 7.x. For upgrading to 8.0 we can require using an update tool and reindexing as part of migrating to 8, so eventually the state index will be managed. This solution requires that an .ml-state-write index does not cause problems in other parts of the code.