
Failing svc deploy caused by nested stack changeset limit #5614

Closed
raethlo opened this issue Jan 17, 2024 · 6 comments
Labels: area/addon, area/deployment, guidance, stale, type/enhancement, type/request


raethlo commented Jan 17, 2024

Hi πŸ‘‹πŸ» We're facing an issue where our copilot cli managed fargate deployment (load balanced svc) fails with:

> copilot svc deploy --diff-yes --name $COPILOT_API_SVC --env $COPILOT_ENV --force
✘ deploy service api to environment staging: deploy service: wait for creation of change set copilot-995e5c10-aea9-43a1-b3fb-fd9182f6a95e for stack staging-api: ResourceNotReady: failed waiting for successful resource state: ChangeSet limit exceeded for stack arn:aws:cloudformation:us-east-1:000000000000:stack/staging-api-AddonsStack-115OY9YZK7MZZ/cbc47f70-f082-11ec-bcd3-0e3807434699: Resource creation cancelled

The problem is that there are a lot of "failed changesets" for the nested addon stack, all with the status reason: "The submitted information didn't contain changes. Submit different information to create a change set." The nested addon stack rarely has a reason to change, since it only contains storage and some roles, but the build-up of failed changesets causes us to hit a CloudFormation quota (I think). The only pointer I have found so far as to why this happens is this aws-cli issue: aws/aws-cli#4534.
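To see how many of these no-op change sets have piled up on the nested addon stack, a small helper along these lines may be useful. It is a sketch that assumes AWS CLI v2 configured for the right account and region; the function name `count_noop_change_sets` is hypothetical, and the filter simply matches part of the status-reason text quoted above.

```shell
# Hypothetical helper: count FAILED change sets on a stack whose status reason
# says the submission "didn't contain changes".
# Assumes AWS CLI v2 is installed and configured for the target account/region.
count_noop_change_sets() {
  local stack="$1"  # nested stack name or ARN, e.g. staging-api-AddonsStack-115OY9YZK7MZZ
  aws cloudformation list-change-sets \
    --stack-name "$stack" \
    --query "length(Summaries[?Status=='FAILED' && contains(StatusReason, 'contain changes')])" \
    --output text
}
```

For example, `count_noop_change_sets staging-api-AddonsStack-115OY9YZK7MZZ` should print a single number; re-running it after a deploy shows whether the backlog is growing.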

So far I have not found a way to delete the offending changesets to unblock the release, because the delete commands fail with:

An error occurred (ValidationError) when calling the DeleteChangeSet operation: Nested change set must be deleted from root change set

Apparently, as pointed out in aws/aws-cli#4534 (comment), the only way to remove the changesets is to create a change from the root stack that updates the nested resource. However, I am unsure how to make this change safely without creating further problems with how Copilot tracks the resources.

Can you please advise on the best way to recover the stacks to a healthy state? Since the nested addon stack will rarely contain any changes, it seems that the failed changesets will build up indefinitely. Is there a way to automatically clean up the failed changesets that don't contain changes?

Thanks πŸ™πŸ»

Lou1415926 (Contributor)

Hi @raethlo! Apologies for the trouble 😞. I went ahead and tested this, and also found myself trapped in a loop:

  1. Tried to delete the nested stack's change set ➡️ Nested change set must be deleted from root change set (same as yours).
  2. Tried to delete the root change set ➡️ Cannot delete ChangeSet in execution status EXECUTE_COMPLETE.

This means there is no way to delete the nested stack's change set if the root change set is in a good state (for example, EXECUTE_COMPLETE).
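The loop can be traced from the CLI: a nested change set's description should include the ID of the root change set it belongs to, and the root's execution status explains why it cannot be deleted either. A minimal sketch, assuming AWS CLI v2 and that `describe-change-set` returns `RootChangeSetId` for nested change sets; `explain_nested_change_set` is a hypothetical name, and the input is a nested change set ARN copied from the console.

```shell
# Hypothetical helper: given a nested change set ARN, print the root change set
# it belongs to and that root's execution status.
# Assumes AWS CLI v2; an ARN is passed, so --stack-name is not required.
explain_nested_change_set() {
  local nested_arn="$1"
  local root_arn status
  root_arn=$(aws cloudformation describe-change-set \
    --change-set-name "$nested_arn" \
    --query RootChangeSetId --output text)
  status=$(aws cloudformation describe-change-set \
    --change-set-name "$root_arn" \
    --query ExecutionStatus --output text)
  echo "root=$root_arn status=$status"
}
```

If the second field reads `EXECUTE_COMPLETE`, deleting the root is expected to fail with the error quoted in step 2, which is the dead end described above.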

The only workaround I can think of right now is, unfortunately 😞, what you've linked: create a change set that actually does update the nested stack. For example, you can update the Tags property of a resource by adding a dummy tag and then removing it later. Be sure to double-check a doc page like this one to make sure that updating the Tags property does not trigger a "Replacement". From what I see and understand, it should be "No interruption" for the majority of AWS resources, but please double-check to make sure.

is there a way to automatically clean up the failed changesets that don't contain changes?

From what I tested above, because of the error Cannot delete ChangeSet in execution status EXECUTE_COMPLETE, I don't see a good way for Copilot to handle this either: Copilot could attempt to delete old change sets, but those attempts would fail with that same error. I am reaching out to the CloudFormation team to understand this issue better. They may also know a better workaround than mine; I'll update this thread if we find one. Apologies for the issue 😞!!


raethlo commented Jan 18, 2024

Hey @Lou1415926, thanks for looking into this 🙏🏻 I'll try to create a change that updates the nested stack and will circle back to post whether it worked. Curious to see if the cfn team has a better workaround.

Lou1415926 added the guidance label on Jan 19, 2024
Lou1415926 (Contributor)

@raethlo yeah let me know how it goes! I tested the workaround myself yesterday, and it was successful in my case.

I've discussed the issue with engineers from the CloudFormation team, and the workaround we reached is similar to what I suggested above. Instead of altering the Tags property of some resource, you could also try adding a temporary stack-level tag on the parent stack as the only change. This change will be propagated to the nested stacks.
[Screenshot 2024-01-19 at 10.27.03 AM]
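After deploying the parent stack with the temporary stack-level tag, one way to confirm the tag actually reached the nested addon stack is to read that stack's tags. A sketch, assuming AWS CLI v2; the helper name `stack_tag_value` and any tag key you pass (for example a hypothetical `temp-refresh`) are illustrative only.

```shell
# Hypothetical helper: print the value of a stack-level tag on a stack.
# With --output text the AWS CLI typically prints "None" when the tag is absent.
# Assumes AWS CLI v2 configured for the target account/region.
stack_tag_value() {
  local stack="$1" key="$2"
  aws cloudformation describe-stacks \
    --stack-name "$stack" \
    --query "Stacks[0].Tags[?Key=='$key'].Value | [0]" \
    --output text
}
```

Running it against both the parent stack (e.g. `staging-api`) and the nested addons stack should show the same value if the tag propagated.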

Lou1415926 added the type/enhancement, area/deployment, and area/addon labels on Jan 19, 2024

raethlo commented Jan 19, 2024

@Lou1415926 what ended up working for us after some trial & error was adding a dynamic resource tag to the deploy.

 copilot svc deploy --name api --env staging --resource-tags 'release-version=main-8ed091d-7576227310' --force

This unblocked CloudFormation and also cleaned up the accumulated failed changesets. I don't know how common the issue is (it seems weird to me that it wasn't reported before, so it might be caused by something on our end), but if there is no downside to doing so, Copilot could automatically tag managed resources on deploy to avoid hitting the limit.
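The per-deploy tag can be generated in CI so that every `svc deploy` carries a different value and therefore always produces a real change on the nested stack. A sketch, assuming GitHub Actions (where `GITHUB_REF_NAME`, `GITHUB_SHA` and `GITHUB_RUN_ID` are set by the runner); the `branch-shortsha-runid` format is only inferred from the example value above, and `release_tag` is a hypothetical helper.

```shell
# Hypothetical helper: build a release-version tag that is unique per CI run,
# e.g. release-version=main-8ed091d-7576227310.
release_tag() {
  local branch="$1" sha="$2" run_id="$3"
  # Keep only the first 7 characters of the commit SHA (POSIX-friendly, no bash slicing).
  printf 'release-version=%s-%s-%s\n' "$branch" "$(printf '%s' "$sha" | cut -c1-7)" "$run_id"
}

# Usage in a GitHub Actions step (variables provided by the runner):
# copilot svc deploy --name api --env staging \
#   --resource-tags "$(release_tag "$GITHUB_REF_NAME" "$GITHUB_SHA" "$GITHUB_RUN_ID")" --force
```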

Anyway, thanks for the help 🙏🏻

Lou1415926 added the type/request label on Jan 20, 2024
Lou1415926 changed the title from "Failing svc deploy caused by cfn changeset limit" to "Failing svc deploy caused by nested stack changeset limit" on Jan 20, 2024

This issue is stale because it has been open 60 days with no response activity. Remove the stale label, add a comment, or this will be closed in 14 days.

github-actions bot added the stale label on Mar 21, 2024

github-actions bot commented Apr 4, 2024

This issue is closed due to inactivity. Feel free to reopen the issue if you have any further questions!

github-actions bot closed this as completed on Apr 4, 2024