Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Persist progress when setting DFA task to failed #61782

Conversation

dimitris-athanasiou
Copy link
Contributor

When an error occurs and we set the task to failed via
the DataFrameAnalyticsTask.setFailed method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.

This commit solves this bug by persisting the progress before
setting the task to failed.

When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.

This commit solves this bug by persisting the progress before
setting the task to failed.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@dimitris-athanasiou dimitris-athanasiou merged commit 2ba4e15 into elastic:master Sep 1, 2020
@dimitris-athanasiou dimitris-athanasiou deleted the persist-progress-on-failure branch September 1, 2020 13:42
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Sep 1, 2020
…1782)

When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.

This commit solves this bug by persisting the progress before
setting the task to failed.

Backport of elastic#61782
dimitris-athanasiou added a commit that referenced this pull request Sep 1, 2020
…61792)

When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.

This commit solves this bug by persisting the progress before
setting the task to failed.

Backport of #61782
dimitris-athanasiou added a commit that referenced this pull request Sep 2, 2020
…61797)

When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.

This commit solves this bug by persisting the progress before
setting the task to failed.

Backport of #61782
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Sep 2, 2020
This fixes a bug introduced by elastic#61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes elastic#61852
dimitris-athanasiou added a commit that referenced this pull request Sep 2, 2020
…61868)

This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes #61852
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Sep 2, 2020
…ess (elastic#61868)

This fixes a bug introduced by elastic#61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes elastic#61852

Backport of elastic#61868
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Sep 2, 2020
…ess (elastic#61868)

This fixes a bug introduced by elastic#61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes elastic#61852

Backport of elastic#61868
dimitris-athanasiou added a commit that referenced this pull request Sep 2, 2020
…ess (#61868) (#61875)

This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes #61852

Backport of #61868
dimitris-athanasiou added a commit that referenced this pull request Sep 2, 2020
…ess (#61868) (#61876)

This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes #61852

Backport of #61868
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants