Add STOPPED to the failure cases for Sagemaker Training Jobs #42423

Merged (2 commits) on Sep 24, 2024

ferruzzi (Contributor)

A user pointed out that Airflow will mark a failed Sagemaker Training Job as a successful task. On investigation, Sagemaker treats "Stopped" as a failure state for this endpoint, not as a successful terminal state. [Sagemaker Docs here]

To reproduce:

Run a DAG with a SageMakerTrainingOperator task. Once training starts, open the AWS console and stop the job. Airflow detects that the job has ended but marks the task as successful.

After this change, those steps will show the task marked as failed as expected.
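The intended behavior can be sketched as a polling loop that treats "Stopped" as a failure rather than a success. This is a minimal illustration under stated assumptions, not the Airflow provider's actual code: the state sets, `wait_for_training_job`, and `TrainingJobFailedError` are hypothetical names, and only the status strings (InProgress, Completed, Failed, Stopping, Stopped) come from the SageMaker DescribeTrainingJob API. The `describe` callable is injected so the loop can be exercised without AWS.

```python
import time
from typing import Callable

# TrainingJobStatus values returned by SageMaker's DescribeTrainingJob API.
# The grouping below is a hypothetical sketch, not Airflow's implementation.
NON_TERMINAL_STATES = {"InProgress", "Stopping"}
SUCCESS_STATES = {"Completed"}
# "Stopped" sits with "Failed": a stopped job never finished training.
FAILURE_STATES = {"Failed", "Stopped"}


class TrainingJobFailedError(Exception):
    """Hypothetical error raised when a training job ends in a failure state."""


def wait_for_training_job(describe: Callable[[], dict], poll_interval: float = 0.0) -> str:
    """Poll `describe` until the job reaches a terminal state.

    `describe` stands in for something like
    boto3.client("sagemaker").describe_training_job(TrainingJobName=...)
    and must return a dict with a "TrainingJobStatus" key.
    """
    while True:
        status = describe()["TrainingJobStatus"]
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            # Before the fix described above, a stopped job was reported as success.
            raise TrainingJobFailedError(f"Training job ended with status {status!r}")
        if status not in NON_TERMINAL_STATES:
            raise ValueError(f"Unexpected training job status {status!r}")
        time.sleep(poll_interval)


# Usage: simulate a job that is stopped from the console mid-training.
simulated = iter(["InProgress", "Stopping", "Stopped"])
try:
    wait_for_training_job(lambda: {"TrainingJobStatus": next(simulated)})
except TrainingJobFailedError as err:
    print(err)  # Training job ended with status 'Stopped'
```

Keeping "Stopping" in the non-terminal set means the loop keeps polling until the job settles, so the final verdict is based on the terminal "Stopped" status.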

boring-cyborg bot added the area:providers, area:UI, and provider:amazon-aws labels on Sep 23, 2024
ferruzzi force-pushed the ferruzzi/sagemaker-training-failure branch from c73d0d3 to 51de3b1 on September 23, 2024, 20:56
jscheffl (Contributor)

Oh, I have so little knowledge about AWS and Sagemaker that I'm afraid to judge and approve this thing :-(

ferruzzi (Contributor Author)

It's a very large ball of wax, that's for sure. I'm not entirely sure why you were auto-added to the reviewers list; I didn't intentionally tag you.

jscheffl removed their request for review on September 23, 2024, 21:13
ferruzzi merged commit ab3429c into apache:main on Sep 24, 2024 (54 checks passed)
ferruzzi deleted the ferruzzi/sagemaker-training-failure branch on September 24, 2024, 22:07
4 participants