[ML] Model state docs are orphaned in .ml-state after job is deleted #30551

Closed
richcollier opened this issue May 12, 2018 · 4 comments
Labels
>bug :ml Machine learning

Comments

@richcollier
Contributor

First reported on the discuss forum: https://discuss.elastic.co/t/ml-state-is-too-big/131561

I looked at my own setup and also noticed a few model state documents that belong to jobs that no longer exist in my system.

image

There is no current job in my system called test_kpi - although I'm sure at one time there was and it was deleted.

image

I'm not sure which version the user on Discuss is running, but I'm currently using v6.2.0.

@richcollier richcollier added the :ml Machine learning label May 12, 2018
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@dimitris-athanasiou
Contributor

@richcollier Can you see any model_snapshot documents in the results index with job_id set to test_kpi?

@richcollier
Contributor Author

@dimitris-athanasiou - there are no documents in .ml-anomalies for job_id:test_kpi

@dimitris-athanasiou
Contributor

dimitris-athanasiou commented May 15, 2018

We found the cause: a bug introduced in version 6.1. When we persist the model state, we write the state documents to the .ml-state index and a model_snapshot document to the results index. Later, in order to delete the state documents, we need the model snapshot document. Due to the bug, during background periodic persistence the state documents were persisted but the model snapshot document was put in a buffer. If the job was deleted from the UI before the buffer was flushed, the snapshot document would never be indexed, so the state docs were left behind after the job was deleted.
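
As a rough illustration of that sequence, here is a toy in-memory model in Python. It is not Elasticsearch code: `ToyStore`, its methods, and the `snap-1` snapshot ID are hypothetical names made up for this sketch, and the only thing it tries to show is why deleting a job before the buffer is flushed can leave a state document behind.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """In-memory stand-in for the two indices (hypothetical, not the real code)."""

    state_index: set = field(default_factory=set)    # plays the role of .ml-state
    results_index: set = field(default_factory=set)  # plays the role of the results index
    snapshot_buffer: list = field(default_factory=list)

    def persist_periodically(self, job_id, snapshot_id):
        # The state document is written straight away ...
        self.state_index.add((job_id, snapshot_id))
        # ... but the model snapshot document only reaches a buffer.
        self.snapshot_buffer.append((job_id, snapshot_id))

    def flush(self):
        # Index everything that was waiting in the buffer.
        self.results_index.update(self.snapshot_buffer)
        self.snapshot_buffer.clear()

    def delete_job(self, job_id):
        # State documents are located only through indexed snapshot documents.
        snapshots = {s for s in self.results_index if s[0] == job_id}
        self.state_index -= snapshots
        self.results_index -= snapshots
        # Buffered snapshot documents for the deleted job are never indexed.
        self.snapshot_buffer = [s for s in self.snapshot_buffer if s[0] != job_id]


def orphans_after_delete(flush_first):
    """Return the state documents left behind after deleting the job."""
    store = ToyStore()
    store.persist_periodically("test_kpi", "snap-1")
    if flush_first:
        store.flush()
    store.delete_job("test_kpi")
    return store.state_index
```

In this toy model, `orphans_after_delete(True)` leaves nothing behind, while `orphans_after_delete(False)` leaves the `test_kpi` state entry in place, which matches the orphaned documents reported above.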

The above bug is resolved in 6.3.0 (and 6.2.5 if that version is ever released). However, in order to ensure those documents are deleted and to prevent such cases in the future, I will work on enhancing the daily maintenance service to look for left-behind state docs and clean them up.
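
A minimal sketch of what such a cleanup check could look like, in Python rather than the Java used by Elasticsearch. It assumes state document IDs start with the job ID followed by `_model_state_`, `_categorizer_state#`, or `_quantiles`; that layout is an assumption of this sketch and should be verified against the version in use. The function names are hypothetical, and the sketch only selects candidate IDs; it does not talk to a cluster or delete anything.

```python
# Assumed ID layouts (not confirmed in this thread):
#   <job_id>_model_state_<snapshot_id>#<n>
#   <job_id>_categorizer_state#<n>
#   <job_id>_quantiles
STATE_ID_MARKERS = ("_model_state_", "_categorizer_state#")
QUANTILES_SUFFIX = "_quantiles"


def extract_job_id(doc_id):
    """Return the job ID encoded in a state document ID, or None if unrecognised."""
    if doc_id.endswith(QUANTILES_SUFFIX):
        return doc_id[: -len(QUANTILES_SUFFIX)]
    for marker in STATE_ID_MARKERS:
        pos = doc_id.rfind(marker)
        if pos > 0:
            return doc_id[:pos]
    return None


def find_orphaned_state_docs(state_doc_ids, current_job_ids):
    """List state document IDs whose job is not among the current jobs."""
    current = set(current_job_ids)
    orphans = []
    for doc_id in state_doc_ids:
        job_id = extract_job_id(doc_id)
        # IDs that cannot be parsed are left alone rather than treated as orphans.
        if job_id is not None and job_id not in current:
            orphans.append(doc_id)
    return orphans
```

For example, given the IDs `test_kpi_quantiles` and `live_job_categorizer_state#1` with only `live_job` as a current job, the function would return just `test_kpi_quantiles`. Skipping unparseable IDs is a deliberately conservative choice so that the sketch never flags a document it does not understand.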

dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this issue May 17, 2018
It is possible for model state documents to be
left behind in the state index. This may be
because of bugs or uncontrollable scenarios.
In any case, those documents may take up quite
some disk space when they add up. This commit
adds a step in the expired data deletion that
is part of the daily maintenance service. The
new step searches for state documents that
do not belong to any of the current jobs and
deletes them.

Closes elastic#30551
dimitris-athanasiou added a commit that referenced this issue May 17, 2018
It is possible for state documents to be
left behind in the state index. This may be
because of bugs or uncontrollable scenarios.
In any case, those documents may take up quite
some disk space when they add up. This commit
adds a step in the expired data deletion that
is part of the daily maintenance service. The
new step searches for state documents that
do not belong to any of the current jobs and
deletes them.

Closes #30551
ywelsch pushed a commit to ywelsch/elasticsearch that referenced this issue May 23, 2018
It is possible for state documents to be
left behind in the state index. This may be
because of bugs or uncontrollable scenarios.
In any case, those documents may take up quite
some disk space when they add up. This commit
adds a step in the expired data deletion that
is part of the daily maintenance service. The
new step searches for state documents that
do not belong to any of the current jobs and
deletes them.

Closes elastic#30551