
rds: UPDATE_ROLLBACK_FAILED when deploy while "storage change is being optimized" #29916

Closed
anentropic opened this issue Apr 21, 2024 · 6 comments
Labels
@aws-cdk/aws-rds Related to Amazon Relational Database bug This issue is a bug. effort/medium Medium work item – several days of effort needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. p2

Comments

@anentropic

Describe the bug

I tried to deploy a change to the allocated storage of my RDS instance, but it turned out the DB was already in the process of autoscaling its storage.

This resulted in the following error:

11:19:27 | UPDATE_FAILED        | AWS::RDS::DBInstance                        | DatabaseMySQLADA28B0A
Resource handler returned message: "You can't currently modify the storage of this DB instance because the previous storage change is being optimized. (Service: Rds, Status Code: 400, Request ID: 47f80e3c-545e-4094-b31f-47267d87956e)" (RequestToken: d0baaf92-da6d-b0f9-258a-131f903fc67f, HandlerErrorCode: InvalidRequest)

and

UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [DatabaseMySQLRotationSingleUser27AA2177, DatabaseMySQLADA28B0A]. )
    at FullCloudFormationDeployment.monitorDeployment (/Users/anentropic/.nvm/versions/node/v18.18.0/lib/node_modules/aws-cdk/lib/index.js:430:10615)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.deployStack2 [as deployStack] (/Users/anentropic/.nvm/versions/node/v18.18.0/lib/node_modules/aws-cdk/lib/index.js:433:200503)
    at async /Users/anentropic/.nvm/versions/node/v18.18.0/lib/node_modules/aws-cdk/lib/index.js:433:18134
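The failed logical IDs are listed in the error above. To collect them programmatically, for example to decide what (if anything) to pass as `ResourcesToSkip` when continuing the rollback, one option is to scan the stack events. This is a sketch under assumptions: it uses the boto3 CloudFormation client's `describe_stack_events` (which returns events newest first) and treats a resource as failed when its most recent event is `UPDATE_FAILED`. `failed_logical_ids` is a hypothetical helper, and the client is passed in so the logic can be tried without AWS credentials.

```python
from typing import Any, Dict, List


def failed_logical_ids(cfn_client: Any, stack_name: str) -> List[str]:
    """Return logical IDs whose most recent stack event is UPDATE_FAILED.

    Relies on describe_stack_events returning events newest first, so the
    first event seen for each resource is treated as its latest status.
    """
    latest: Dict[str, str] = {}
    kwargs: Dict[str, str] = {"StackName": stack_name}
    while True:
        page = cfn_client.describe_stack_events(**kwargs)
        for event in page["StackEvents"]:
            # Stack-level events use the stack name as their logical ID; skip them.
            if event["LogicalResourceId"] == event.get("StackName"):
                continue
            latest.setdefault(event["LogicalResourceId"], event["ResourceStatus"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # follow pagination to older events
    return sorted(lid for lid, status in latest.items() if status == "UPDATE_FAILED")


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="eu-west-1")
#   failed_logical_ids(cfn, "ifm-ssa-reimport-dev-eu")
```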

Expected Behavior

It's OK for the deploy to fail, but it is unacceptable to end up in the UPDATE_ROLLBACK_FAILED state.

If the deploy has to fail, it should be possible to retry the deployment after the DB has finished its storage task.

(I tried to deploy again later, but I just got: Error [ValidationError]: Stack:arn:aws:cloudformation:eu-west-1:570110252051:stack/ifm-ssa-reimport-dev-eu/a5bd21a0-ff31-11ee-854d-0a9cdce91997 is in UPDATE_ROLLBACK_FAILED state and can not be updated.)
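To retry only after RDS has finished its storage task, the instance status can be checked before deploying. The following is a minimal sketch, assuming the boto3 RDS client and a DB instance identifier you supply (`wait_for_storage_idle`, `instance_status` and `my-db-instance-id` are hypothetical names, not CDK or AWS APIs). It polls `describe_db_instances` until `DBInstanceStatus` reports `available` rather than, say, `storage-optimization`; the client and the sleep function are injected so the loop can be exercised without credentials.

```python
import time
from typing import Any, Callable


def instance_status(rds_client: Any, db_instance_id: str) -> str:
    """Return the instance's DBInstanceStatus, e.g. 'available' or 'storage-optimization'."""
    resp = rds_client.describe_db_instances(DBInstanceIdentifier=db_instance_id)
    return resp["DBInstances"][0]["DBInstanceStatus"]


def wait_for_storage_idle(
    rds_client: Any,
    db_instance_id: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600,  # arbitrary upper bound (assumption)
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the instance is 'available'; raise TimeoutError if it never is."""
    waited = 0.0
    while True:
        status = instance_status(rds_client, db_instance_id)
        if status == "available":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{db_instance_id} still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   rds = boto3.client("rds", region_name="eu-west-1")
#   wait_for_storage_idle(rds, "my-db-instance-id")  # hypothetical identifier
```

`available` is necessary but may not be sufficient: the RDS documentation also describes a cool-down period of several hours between storage modifications, so a deploy that changes storage may still be rejected shortly after optimization completes.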

Current Behavior

The stack is in the UPDATE_ROLLBACK_FAILED state.

AFAICT there is no way to recover from this state, so my only option is to destroy the stack and start over.

That is not possible on a production deployment.

Reproduction Steps

Deploy a stack with an RDS DB instance that has max_allocated_storage configured to enable storage autoscaling.

Add data to the DB until it runs low on space and storage autoscaling starts.

Change the stack definition to increase allocated_storage, then attempt the deployment while the DB is still busy autoscaling.

Possible Solution

There are so many ways to end up in the UPDATE_ROLLBACK_FAILED state with CDK that I feel there must be something fundamentally wrong with the underlying design.

I can't recommend this to anybody.

Additional Information/Context

No response

CDK CLI Version

2.138.0 (build 6b41c8b)

Framework Version

No response

Node.js Version

v18.18.0

OS

macOS 14.4.1

Language

Python

Language Version

3.11.5

Other information

No response

@anentropic anentropic added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 21, 2024
@github-actions github-actions bot added the @aws-cdk/aws-rds Related to Amazon Relational Database label Apr 21, 2024
@pahud
Contributor

pahud commented Apr 22, 2024

I suspect this is a CFN bug. I am cutting an internal ticket to clarify.

@pahud
Contributor

pahud commented Apr 22, 2024

Internal tracking: V1358326979

@pahud pahud self-assigned this Apr 22, 2024
@pahud pahud added needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Apr 22, 2024
@pahud
Contributor

pahud commented May 7, 2024

Hi

It's not a CFN bug. This is how CFN behaves when a deployment fails: the update handler tries to roll back to the previous state, and if that rollback also fails, the stack ends up in UPDATE_ROLLBACK_FAILED.

Please use continue-update-rollback to get the stack unstuck.
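A minimal boto3 sketch of that suggestion, assuming the stack is currently in UPDATE_ROLLBACK_FAILED. The `continue_rollback` helper is hypothetical; `continue_update_rollback` and `describe_stacks` are boto3 CloudFormation client calls, and the CLI equivalent is `aws cloudformation continue-update-rollback`. The sketch resumes the rollback, optionally skipping resources, and polls until the stack leaves its transient `*_IN_PROGRESS` states. The client is injected so the logic can be exercised without credentials.

```python
import time
from typing import Any, Callable, Dict, Iterable


def continue_rollback(
    cfn_client: Any,
    stack_name: str,
    resources_to_skip: Iterable[str] = (),
    poll_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Resume a rollback stuck in UPDATE_ROLLBACK_FAILED and return the final status."""
    kwargs: Dict[str, Any] = {"StackName": stack_name}
    skip = list(resources_to_skip)
    if skip:
        # CloudFormation marks skipped resources UPDATE_COMPLETE without rolling
        # them back, so their real state may no longer match the template.
        kwargs["ResourcesToSkip"] = skip
    cfn_client.continue_update_rollback(**kwargs)
    while True:
        # Sleep before each check to give the stack time to change state.
        sleep(poll_seconds)
        status = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
        # UPDATE_ROLLBACK_IN_PROGRESS and ..._CLEANUP_IN_PROGRESS are transient;
        # anything else (e.g. UPDATE_ROLLBACK_COMPLETE or _FAILED) is final.
        if not status.endswith("_IN_PROGRESS"):
            return status


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="eu-west-1")
#   continue_rollback(cfn, "ifm-ssa-reimport-dev-eu")
```

A cautious order is to wait until the DB instance is `available` again and first retry without skipping anything; skipping resources is a fallback when the rollback cannot succeed, since the skipped resources can then drift from the template.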

@pahud pahud removed their assignment May 7, 2024
@pahud
Contributor

pahud commented May 7, 2024

Closing, as it's not a CDK issue.

@pahud pahud closed this as completed May 7, 2024

github-actions bot commented May 7, 2024

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@aws-cdk-automation
Collaborator

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

@aws aws locked as resolved and limited conversation to collaborators Jul 25, 2024