(custom-resources): Package does not exist #30067

Closed
athewsey opened this issue May 6, 2024 · 21 comments · Fixed by #31571
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources bug This issue is a bug. effort/small Small work item – less than a day of effort p1

Comments

@athewsey

athewsey commented May 6, 2024

Describe the bug

I'm trying to use AwsCustomResource from Python for a couple of actions on @aws-sdk/client-cognito-identity-provider, and deployment keeps failing with errors like:

Received response status [FAILED] from custom resource. Message returned:
Package @aws-sdk/client-cognito-identity-provider does not exist. (RequestId: 99b79a89-1a17-4acf-864c-84b3ac3e5664)

Expected Behavior

The affected resource (see repro steps below) should deploy successfully and create a user in the provided Cognito user pool.

Current Behavior

I'm getting the above-mentioned error message and the resource fails to create (or roll back/delete). I also tried providing the service name as CognitoIdentityServiceProvider, but this gave the same error message (with the @aws-sdk/client-cognito-identity-provider package name).

Possibly this may be intermittent, as I managed to get the stack to deploy (update existing to add this resource) at least once? But now facing the error consistently.

Reproduction Steps

Given a Python CDK construct with a resource something like:

AwsCustomResource(
    self,
    "AwsCustomResource-CreateUser",
    on_create=AwsSdkCall(
        action="adminCreateUser",
        parameters={
            "UserPoolId": ...,
            "Username": ...,
            "MessageAction": "SUPPRESS",
            "TemporaryPassword": ...,
        },
        physical_resource_id=PhysicalResourceId.of(
            f"AwsCustomResource-CreateUser-{...}"
        ),
        service="@aws-sdk/client-cognito-identity-provider",
    ),
    on_delete=AwsSdkCall(
        action="adminDeleteUser",
        parameters={
            "UserPoolId": ...,
            "Username": ...,
        },
        service="@aws-sdk/client-cognito-identity-provider",
    ),
    policy=AwsCustomResourcePolicy.from_sdk_calls(
        resources=AwsCustomResourcePolicy.ANY_RESOURCE
    ),
    install_latest_aws_sdk=True,
)

...Try to deploy the stack

Possible Solution

🤷‍♂️

Additional Information/Context

Originally observed on CDK v1.126.0, so tried upgrading to 2.140.0 but it didn't help.

CDK CLI Version

2.140.0

Framework Version

2.140.0

Node.js Version

20.9.0

OS

macOS 14.4.1

Language

Python

Language Version

Python 3.12.1

Other information

Seems possibly related to #28005, which was closed due to inactivity but raised against an older CDK version.

@athewsey athewsey added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 6, 2024
@github-actions github-actions bot added the @aws-cdk/custom-resources Related to AWS CDK Custom Resources label May 6, 2024
@glitchassassin
Contributor

glitchassassin commented May 6, 2024

As of 11:00 EST on 5/3, we have been seeing a similar error with Python 3.10, CDK 2.134.0, using an AwsSdkCall for SSM's getParameter action. In our case the error is Package @aws-sdk/client-ssm does not exist.

cr.AwsCustomResource(
    self,
    "get_parameter",
    on_update=cr.AwsSdkCall(
        service="SSM",
        action="getParameter",
        parameters={
            "Name": parameter_name,
            "WithDecryption": True,
        },
        physical_resource_id=cr.PhysicalResourceId.of(
            str(datetime.utcnow()),
        ),
        region=region,
    ),
    policy=cr.AwsCustomResourcePolicy.from_sdk_calls(
        resources=[
            Stack.of(self).format_arn(
                service="ssm",
                region=region,
                resource="parameter",
                resource_name=parameter_name.lstrip("/"),
            )
        ]
    ),
)

The issue also appears to be intermittent for us.

@athewsey
Author

athewsey commented May 6, 2024

For now, un-setting install_latest_aws_sdk seems to have stabilized our configuration (based on ~3 repeated deployments)... But I feel like it might be an intermittency thing / luck-of-the-draw, rather than a real remedy. Our full source code & patch commit available here

@glitchassassin it looks like you're not using the install_latest_aws_sdk option though right? And still seeing the issue?

@glitchassassin
Contributor

Correct, we are not.

On Friday, it failed on 2/6 deploys. Today we've had four successful releases so far and no failures. I'm configuring logging on the AwsSdkCall in hopes of capturing more details if it happens again

@glitchassassin
Contributor

glitchassassin commented May 6, 2024

Aha, tracked down some logs from Friday! They showed up by default in a CloudWatch log group named /aws/lambda/[stack_name]-AWS[random hexadecimal]

Installing latest AWS SDK v3: @aws-sdk/client-ssm
Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: SyntaxError: Error parsing /tmp/node_modules/@smithy/shared-ini-file-loader/package.json: Unexpected end of JSON input

In another instance:

Installing latest AWS SDK v3: @aws-sdk/client-ssm
Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: Error: Cannot find module '@smithy/shared-ini-file-loader'
Require stack:

  • /tmp/node_modules/@smithy/node-config-provider/dist-cjs/index.js
  • /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/adaptors/getEndpointFromConfig.js
  • /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/index.js
  • /tmp/node_modules/@smithy/core/dist-cjs/index.js
  • /tmp/node_modules/@aws-sdk/client-ssm/dist-cjs/index.js
  • /var/task/index.js
  • /var/runtime/index.mjs

It seems like each time this runs, there's an initial attempt to install the SDK which always times out after 120 seconds (based on ResourceProperties in the logs, InstallLatestAwsSdk is true even though it isn't explicitly set in our code). The lambda is immediately invoked again, and this time the install either succeeds or fails in under a minute. If it fails, it says it is falling back to pre-installed version.

After the install, an Update request is logged, and it returns the parameter it's supposed to be fetching correctly (whether the install failed or succeeded).

Then, in some cases, there is a second Update request in the logs a couple minutes later, and that is where the "Package does not exist" error gets thrown. The request is identical to the first Update request except that the physicalResourceId is different (it's using the current date/time as described here.)

After reviewing our deployment logs, this seems to only have happened when we had back-to-back deployments within a couple minutes of each other, so the second deployment's Update request hits the same running lambda instance that was created by the first deployment.

It looks like when the Lambda doesn't get cleaned up after an install failure, the next Update request fails.
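
To illustrate why back-to-back deployments matter: in a Node.js Lambda, module-scope variables survive between invocations that land on the same warm execution environment. Below is a minimal sketch of that behaviour, not the real CDK handler. The name `installedSdk` mirrors the handler's cache, while `handler` and `markInstalled` are hypothetical stand-ins introduced only for illustration.

```typescript
// Module scope: initialised once per execution environment (cold start),
// then shared by every invocation that reuses that environment.
let installedSdk: Record<string, boolean> = {};

// Hypothetical stand-in for the install step; it only records the flag.
function markInstalled(packageName: string): void {
  installedSdk = { ...installedSdk, [packageName]: true };
}

// Hypothetical handler: reports whether it would run or skip the install step.
function handler(packageName: string): 'install' | 'skip-install' {
  if (installedSdk[packageName]) {
    return 'skip-install';
  }
  markInstalled(packageName);
  return 'install';
}

// Two invocations on the same warm environment.
const firstInvocation = handler('@aws-sdk/client-ssm');
const secondInvocation = handler('@aws-sdk/client-ssm');
console.log(firstInvocation, secondInvocation);
```

The second call takes the skip path purely because of state left behind by the first, which is consistent with the failure only appearing when a second deployment reuses a still-running instance.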

@glitchassassin
Contributor

glitchassassin commented May 6, 2024

Based on this:

function installLatestSdk(packageName: string): void {
  console.log(`Installing latest AWS SDK v3: ${packageName}`);
  // Both HOME and --prefix are needed here because /tmp is the only writable location
  execSync(
    `NPM_CONFIG_UPDATE_NOTIFIER=false HOME=/tmp npm install ${JSON.stringify(packageName)} --omit=dev --no-package-lock --no-save --prefix /tmp`,
  );
  installedSdk = {
    ...installedSdk,
    [packageName]: true,
  };
}

interface AwsSdk {
  [key: string]: any;
}

async function loadAwsSdk(
  packageName: string,
  installLatestAwsSdk?: 'true' | 'false',
) {
  let awsSdk: AwsSdk;
  try {
    if (!installedSdk[packageName] && installLatestAwsSdk === 'true') {
      try {
        installLatestSdk(packageName);
        // MUST use require here. Dynamic import() do not support importing from directories
        // esbuild-disable unsupported-require-call -- not esbuildable but that's fine
        awsSdk = require(`/tmp/node_modules/${packageName}`);
      } catch (e) {
        console.log(`Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: ${e}`);
        // MUST use require as dynamic import() does not support importing from directories
        // esbuild-disable unsupported-require-call -- not esbuildable but that's fine
        return require(packageName); // Fallback to pre-installed version
      }

I wonder if the initial npm install failure is leaving /tmp/node_modules in an invalid state, but a subsequent npm install fails to detect the issue and thinks everything is installed?

Nope! It's actually failing on the require, not on the npm install command. So at this point installedSdk[packageName] is already true. Next time it runs, the handler skips trying to install and falls through to the next branch of the if statement:

    } else if (installedSdk[packageName]) {
      // MUST use require here. Dynamic import() do not support importing from directories
      // esbuild-disable unsupported-require-call -- not esbuildable but that's fine
      awsSdk = require(`/tmp/node_modules/${packageName}`);
    } else {
      // esbuild-disable unsupported-require-call -- not esbuildable but that's fine
      awsSdk = require(packageName);
    }

But there's no try/catch here, so this time when the require fails, it doesn't fall back to the pre-installed version.
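
One way to close that gap, as a minimal sketch and not necessarily the change that was eventually merged: treat a failed require from /tmp/node_modules the same way the first branch does, and fall back to the pre-installed package. The names `loadWithFallback`, `RequireFn`, and `brokenTmpRequire` are hypothetical and exist only so the sketch can run without npm or a Lambda runtime; the real handler calls `require` directly.

```typescript
// Hypothetical loader type so the sketch runs without npm or a Lambda runtime.
type RequireFn = (id: string) => unknown;

// Cache flag mirroring the handler's `installedSdk` map. It is pre-set to true
// to simulate a warm instance where npm install ran but left a broken copy.
const installedSdk: Record<string, boolean> = { '@aws-sdk/client-ssm': true };

// Try the freshly installed copy when the flag says it exists, but fall back
// to the pre-installed package if that copy cannot be loaded.
function loadWithFallback(packageName: string, requireFn: RequireFn): unknown {
  if (installedSdk[packageName]) {
    try {
      return requireFn(`/tmp/node_modules/${packageName}`);
    } catch (e) {
      console.log(`Failed to load installed SDK. Falling back to pre-installed version. Error: ${e}`);
    }
  }
  return requireFn(packageName);
}

// Fake loader simulating a corrupted /tmp/node_modules install.
const brokenTmpRequire: RequireFn = (id) => {
  if (id.startsWith('/tmp/node_modules/')) {
    throw new Error(`Cannot find module '${id}'`);
  }
  return { loadedFrom: id };
};

const sdk = loadWithFallback('@aws-sdk/client-ssm', brokenTmpRequire) as { loadedFrom: string };
console.log(sdk.loadedFrom);
```

With the fake loader above, the call resolves to the pre-installed package name instead of throwing. Passing the loader in as a parameter is only a convenience for exercising the fallback path in isolation.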

@glitchassassin
Contributor

Drafting a PR with a fix

@khushail khushail added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels May 7, 2024
@khushail khushail added p2 effort/small Small work item – less than a day of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels May 8, 2024
@khushail
Contributor

khushail commented May 8, 2024

Thanks @athewsey for reporting this issue. There have been multiple instances of it reported by customers recently.

Thanks @glitchassassin for submitting a PR.

@ofiriluz

Hi, any update on this?
We have started getting this as well when deleting an Events Rule, for some reason:
"Package @aws-sdk/client-cloudwatch-logs does not exist"

This is currently preventing our pipelines from fully passing.

@glitchassassin
Contributor

Waiting on some guidance on the failing integration tests on the PR - I'm not sure how to resolve the build issues

@i-am-gg

i-am-gg commented May 27, 2024

@glitchassassin I see that the PR is still open, and this is also affecting our deployments. When is it expected to get merged? And is there any workaround for now?

@glitchassassin
Contributor

@gg-safe I am still working on getting this merged!

I think the workaround for now is to set install_latest_aws_sdk to false

@i-am-gg

i-am-gg commented May 29, 2024

Thanks for the workaround, @glitchassassin. This seems to be working; will test more.
Thanks a lot again!

@emmanuelnk

emmanuelnk commented Jun 5, 2024

Hi, is there any movement on this? Our deployments (with custom resources) are failing for the exact same issue. In my case they fail regardless of the value of install_latest_aws_sdk with the following message:

 Received response status [FAILED] from custom resource. Message returned: Package @aws-sdk/client-r53 does not exist.

@glitchassassin
Contributor

I've been working through the PR issues with pahud in the CDK Slack; I've cross-posted the latest question on the PR for visibility

@emmanuelnk

Thank you @glitchassassin -- if there is anything I can do to help move this along just link me to the slack discussion (I'm also in that slack group)

@ethanr-bjss

Hi @glitchassassin, has there been any progress on this? This is currently blocking some of our Production workflows.

If there's anything I can help with to speed this along, please let me know.

@glitchassassin
Contributor

@ethanr-bjss It looks like the PR that I was waiting on has been merged, so we should be good to update the integration test snapshots. I'll get started on those now!

@anubhav-pandey1

+1 to the issue, I am facing this issue or probably a similar issue for @aws-sdk/client-elasticloadbalancingv2 on creating a custom resource.

Error message: Package @aws-sdk/client-elasticloadbalancingv2 does not exist. (RequestId: 8e210cb-fbc4-41fe-89cd-bebf8b93d075)

@GavinZZ
Contributor

GavinZZ commented Oct 1, 2024

Updated the issue to p1 as there's no clear workaround for this problem.

@mergify mergify bot closed this as completed in #31571 Oct 1, 2024
@mergify mergify bot closed this as completed in 00cdbcb Oct 1, 2024

github-actions bot commented Oct 1, 2024

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 1, 2024