Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ec2): instance resourceSignalTimeout overwrites initOptions.timeout #31446

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

michelle-wangg
Copy link
Contributor

@michelle-wangg michelle-wangg commented Sep 13, 2024

Issue # (if applicable)

Closes #30052

Reason for this change

When specifiying both resourceSignalTimeout and initOptions.timeout in the options for creating an EC2 Instance, only the value from resourceSignalTimeout is used.

Description of changes

  • If both initOptions.timeout and resourceSignalTimeout are set, timeout consisting of the sum of the values and a warning suggesting that only one field should be specified
  • Else, timeout is set to field specified, else defaults to 5 min

Description of how you validated changes

  • Update integration test for instance-init.js with both fields and verify that warning is logged
  • Add unit tests:
    • initOptions.timeout and resourceSignalTimeout are both not set, timeout is set to default of 5 min
    • initOptions.timeout set and resourceSignalTimeout not set, timeout is set to initOptions.timeout
    • initOptions.timeout not set and resourceSignalTimeout is set, timeout is set to resourceSignalTimeout
    • initOptions.timeout and resourceSignalTimeout are both set, timeout is set to sum of timeouts and warning is logged

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@aws-cdk-automation aws-cdk-automation requested a review from a team September 13, 2024 22:11
@github-actions github-actions bot added the p2 label Sep 13, 2024
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Sep 13, 2024
@github-actions github-actions bot added bug This issue is a bug. effort/small Small work item – less than a day of effort labels Sep 13, 2024
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@michelle-wangg michelle-wangg changed the title fix(ec2): Instance resourceSignalTimeout overwrites initOptions.timeout fix(ec2): instance resourceSignalTimeout overwrites initOptions.timeout Sep 16, 2024
@aws-cdk-automation aws-cdk-automation dismissed their stale review September 16, 2024 17:14

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@michelle-wangg michelle-wangg marked this pull request as ready for review September 16, 2024 21:03
@aws-cdk-automation aws-cdk-automation added the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Sep 16, 2024
Copy link
Contributor

@GavinZZ GavinZZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Michelle! Left some comments.

I see your fix would now be summing the two values up. This would mean that users (who haven't changed their template) would have the values updated in their next deployment if they upgrade their CDK version. This is generally a sign of breaking change, however, in this case it's extending the timeout value. I think this can be argued to be a non-breaking change. Can you include @moelasmar in the review and get his though?

packages/aws-cdk-lib/aws-ec2/lib/instance.ts Outdated Show resolved Hide resolved
packages/aws-cdk-lib/aws-ec2/lib/instance.ts Outdated Show resolved Hide resolved
packages/aws-cdk-lib/aws-ec2/lib/instance.ts Outdated Show resolved Hide resolved
@aws-cdk-automation aws-cdk-automation removed the pr/needs-maintainer-review This PR needs a review from a Core Team Member label Sep 16, 2024
timeout: props.resourceSignalTimeout && props.resourceSignalTimeout.toIsoString(),
...this.instance.cfnOptions.creationPolicy?.resourceSignal,
timeout: props.initOptions?.timeout ?
props.resourceSignalTimeout.plus(props.initOptions?.timeout).toIsoString() :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you double check this for me: i don't think you need the ? on this line because typescript should be smart enough to know props.initOptions must exist to be on this side of the ternary operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right! I have updated this.

Comment on lines 714 to 715
props.resourceSignalTimeout.plus(props.initOptions.timeout).toIsoString() :
props.resourceSignalTimeout.toIsoString(),
Copy link
Contributor

@moelasmar moelasmar Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although it looks safe to add the 2 numbers, but this may cause a breaking change for customers if the sum of the 2 numbers is more than 12 hours .. this should be a very rare case, but may be some customers set one of these numbers to the maximum value, and since we ignore the other one, it was working for them, but now it will stop working.

Also, I can see that the original intent was to sum these 2 numbers together based on this code

private waitForResourceSignal(timeout: Duration) {
const oldResourceSignal = this.instance.cfnOptions.creationPolicy?.resourceSignal;
this.instance.cfnOptions.creationPolicy = {
...this.instance.cfnOptions.creationPolicy,
resourceSignal: {
count: (oldResourceSignal?.count ?? 0) + 1,
timeout: (oldResourceSignal?.timeout ? Duration.parse(oldResourceSignal?.timeout).plus(timeout) : timeout).toIsoString(),
},
};
That is why I think summing the 2 numbers is still Ok, but we should do it using a feature flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more thing it is not needed to do this change here, it is enough to change this snippet

if (props.init) {
this.applyCloudFormationInit(props.init, props.initOptions);
}
this.applyUpdatePolicies(props);
to move this.applyUpdatePolicies(props); before the if condition

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that it may cause breaking changes when the sum is over the max threshold. Thanks for pointing that out.

For the second point, I disagree that we put it inside if statement because users could choose not to specify props.init and still setup the timeout value. Now if we move it inside, it won't happen anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did not mean to put it inside the if condition, what I recommend is something like this

this.applyUpdatePolicies(props); 

if (props.init) { 
   this.applyCloudFormationInit(props.init, props.initOptions); 
 } 

and to remove the addition changes that @michelle-wangg added to the applyUpdatePolicies method and depend that the addition logic is already exist in applyCloudFormationInit method some how:

public applyCloudFormationInit(init: CloudFormationInit, options: ApplyCloudFormationInitOptions = {}) {
init.attach(this.instance, {
platform: this.osType,
instanceRole: this.role,
userData: this.userData,
configSets: options.configSets,
embedFingerprint: options.embedFingerprint,
printLog: options.printLog,
ignoreFailures: options.ignoreFailures,
includeRole: options.includeRole,
includeUrl: options.includeUrl,
});
this.waitForResourceSignal(options.timeout ?? Duration.minutes(5));
}
/**
* Wait for a single additional resource signal
*
* Add 1 to the current ResourceSignal Count and add the given timeout to the current timeout.
*
* Use this to pause the CloudFormation deployment to wait for the instances
* in the AutoScalingGroup to report successful startup during
* creation and updates. The UserData script needs to invoke `cfn-signal`
* with a success or failure code after it is done setting up the instance.
*/
private waitForResourceSignal(timeout: Duration) {
const oldResourceSignal = this.instance.cfnOptions.creationPolicy?.resourceSignal;
this.instance.cfnOptions.creationPolicy = {
...this.instance.cfnOptions.creationPolicy,
resourceSignal: {
count: (oldResourceSignal?.count ?? 0) + 1,
timeout: (oldResourceSignal?.timeout ? Duration.parse(oldResourceSignal?.timeout).plus(timeout) : timeout).toIsoString(),
},
};
}

See the login in waitForResourceSignal, by this change we do not need to duplicate the addition logic in 2 places.

For the warning message, I also prefer to move it to the constructor after the validations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I have updated the implementation to use a feature flag. Since I have not changed the logic waitForResourceSignal, even if initOptions.Timeout is not specified and resourceSignalTimeout is specified, the timeout will add the default 5 min to the resourceSignalTimeout.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 6024e98
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@@ -589,11 +590,23 @@ export class Instance extends Resource implements IInstance {
this.instancePublicDnsName = this.instance.attrPublicDnsName;
this.instancePublicIp = this.instance.attrPublicIp;

if (props.init) {
this.applyCloudFormationInit(props.init, props.initOptions);
if (props.initOptions?.timeout && props.resourceSignalTimeout) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe if the feature flag is enabled, so no need for this warning, as this is our recommended behaviour.

@@ -581,8 +581,7 @@
],
"CreationPolicy": {
"ResourceSignal": {
"Count": 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Count should stay in the template

Copy link
Contributor

@moelasmar moelasmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left minor comments, and you need to rerun the integration test cases with --update-on-failed flag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. contribution/core This is a PR that came from AWS. effort/small Small work item – less than a day of effort p2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(aws-ec2): Instance resourceSignalTimeout overwrites initOptions.timeout
5 participants