sagemaker: Support direct invocation of multi-container endpoints #23155

Open
1 of 2 tasks
petermeansrock opened this issue Nov 29, 2022 · 1 comment
Labels
  • @aws-cdk/aws-sagemaker: Related to AWS SageMaker
  • effort/small: Small work item – less than a day of effort
  • feature-request: A feature should be added or improved.
  • p3

Comments

@petermeansrock
Contributor

petermeansrock commented Nov 29, 2022

Describe the feature

As described in the SageMaker Endpoint L2 construct RFC:

Direct Invocation of Multi-Container Endpoints: By default (and as described in the proposed README), when a customer specifies multiple containers for a model, the containers are treated as an inference pipeline (also referred to as a serial pipeline). This means that the containers are treated as an ordered list, wherein the output of one container at runtime is passed as input to the next. Only the output from the last container is surfaced to the client invoking the model. To support a different invocation paradigm, the InferenceExecutionConfig structure was added to the model CloudFormation resource which allows customers to either explicitly configure Serial invocation mode (the default, as an inference pipeline) or the new Direct invocation mode. When using direct mode, a client invoking an endpoint must specify a container to target with their request; SageMaker then invokes only that single container.
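
For illustration, here is a minimal sketch of what the two modes could look like at the CloudFormation and request level. It uses plain objects only, with no CDK or SDK imports. `InferenceExecutionConfig` with `Mode` set to `Serial` or `Direct` is taken from the description above; `TargetContainerHostname` is the InvokeEndpoint request parameter that names the container to target. The helper names (`buildModelProperties`, `buildInvokeRequest`), the role ARN, the endpoint name, and the image URIs are hypothetical and exist only for this example.

```typescript
// Illustrative sketch only: plain objects, no AWS SDK or CDK dependency.
type InferenceExecutionMode = 'Serial' | 'Direct';

interface ContainerSpec {
  containerHostname: string;
  image: string;
}

// Subset of AWS::SageMaker::Model properties relevant to this feature.
interface ModelProperties {
  ExecutionRoleArn: string;
  Containers: { ContainerHostname: string; Image: string }[];
  InferenceExecutionConfig?: { Mode: InferenceExecutionMode };
}

// Hypothetical helper (not a CDK API): builds the model properties.
// Omitting `mode` leaves InferenceExecutionConfig unset, which means the
// default serial (inference pipeline) behavior.
function buildModelProperties(
  executionRoleArn: string,
  containers: ContainerSpec[],
  mode?: InferenceExecutionMode,
): ModelProperties {
  // Assumption for this sketch: in direct mode each container needs a
  // distinct hostname so that a request can address exactly one of them.
  const hostnames = new Set(containers.map((c) => c.containerHostname));
  if (mode === 'Direct' && hostnames.size !== containers.length) {
    throw new Error('Direct mode needs a unique hostname per container');
  }
  const props: ModelProperties = {
    ExecutionRoleArn: executionRoleArn,
    Containers: containers.map((c) => ({
      ContainerHostname: c.containerHostname,
      Image: c.image,
    })),
  };
  if (mode !== undefined) {
    props.InferenceExecutionConfig = { Mode: mode };
  }
  return props;
}

// Subset of InvokeEndpoint request parameters.
interface InvokeRequest {
  EndpointName: string;
  ContentType: string;
  Body: string;
  TargetContainerHostname?: string;
}

// Hypothetical helper: a direct-mode client passes the target container;
// a serial-mode client leaves it out and receives the last container's output.
function buildInvokeRequest(
  endpointName: string,
  body: string,
  targetContainerHostname?: string,
): InvokeRequest {
  const request: InvokeRequest = {
    EndpointName: endpointName,
    ContentType: 'application/json',
    Body: body,
  };
  if (targetContainerHostname !== undefined) {
    request.TargetContainerHostname = targetContainerHostname;
  }
  return request;
}

// Example values below are placeholders.
const directModel = buildModelProperties(
  'arn:aws:iam::123456789012:role/ExampleRole',
  [
    { containerHostname: 'preprocessor', image: 'example.com/preprocessor:latest' },
    { containerHostname: 'classifier', image: 'example.com/classifier:latest' },
  ],
  'Direct',
);
const directRequest = buildInvokeRequest('example-endpoint', '{"text":"hello"}', 'classifier');
```

An L2 implementation would presumably surface the mode as a property on the Model construct rather than through a free-standing helper like this one.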

Please 👍 this issue to help with the prioritization of this feature.

Use Case

"This enables you to run up to 15 different ML containers on a single endpoint and invoke them independently, thereby saving up to 90% in costs. These ML containers can be running completely different ML frameworks and algorithms for model serving." (link)

Proposed Solution

As described in the SageMaker Endpoint L2 construct RFC:

As SageMaker exposes a new dimension for CloudWatch metrics specific to each directly-invokable container, other than exposing a new inference execution mode attribute on the Model construct, this feature would likely also warrant the addition of a findContainer(containerHostName: string) method to IEndpointProductionVariant which will return a new interface on which additional metric* APIs are present for generating CloudWatch metrics against the dimension consisting of endpoint, variant, and container combined.
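
To make the proposal above concrete, here is a rough sketch of how `findContainer` might hang together. It is self-contained and does not use the real `IEndpointProductionVariant` or CloudWatch `Metric` types. Everything except the `findContainer(containerHostName: string)` signature is an assumption for illustration: the `ProductionVariantSketch` class, the `IContainerMetricsSketch` interface, the `MetricSpec` shape, the `AWS/SageMaker` namespace, and the dimension names `EndpointName`, `VariantName`, and `ContainerName`.

```typescript
// Illustrative stand-in for a CloudWatch metric definition.
interface MetricSpec {
  namespace: string;
  metricName: string;
  dimensions: Record<string, string>;
}

// Hypothetical interface returned by findContainer(); a real implementation
// would likely expose several metric* methods rather than one generic method.
interface IContainerMetricsSketch {
  readonly containerHostName: string;
  metric(metricName: string): MetricSpec;
}

// Hypothetical stand-in for a production variant on an endpoint.
class ProductionVariantSketch {
  constructor(
    private readonly endpointName: string,
    private readonly variantName: string,
    private readonly containerHostNames: string[],
  ) {}

  // Returns an object whose metrics are scoped to a single container.
  findContainer(containerHostName: string): IContainerMetricsSketch {
    if (this.containerHostNames.indexOf(containerHostName) === -1) {
      throw new Error(`No container with host name '${containerHostName}'`);
    }
    const { endpointName, variantName } = this;
    return {
      containerHostName,
      metric: (metricName: string): MetricSpec => ({
        namespace: 'AWS/SageMaker', // assumed namespace
        metricName,
        // Assumed dimension names for the endpoint/variant/container combination.
        dimensions: {
          EndpointName: endpointName,
          VariantName: variantName,
          ContainerName: containerHostName,
        },
      }),
    };
  }
}

// Example usage with placeholder names.
const variant = new ProductionVariantSketch('example-endpoint', 'variant-a', [
  'preprocessor',
  'classifier',
]);
const classifierInvocations = variant.findContainer('classifier').metric('Invocations');
```

Returning a separate per-container object keeps the existing variant-level metric methods unchanged, which would help avoid a breaking change.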

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.54.0-alpha.0

Environment details (OS name and version, etc.)

macOS Ventura

@petermeansrock petermeansrock added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Nov 29, 2022
@github-actions github-actions bot added the @aws-cdk/aws-sagemaker Related to AWS SageMaker label Nov 29, 2022
@peterwoodworth peterwoodworth added p2 effort/small Small work item – less than a day of effort and removed needs-triage This issue or PR still needs to be triaged. labels Nov 29, 2022
@peterwoodworth
Contributor

Thanks for documenting all the steps to flesh out this module in individual issues, @petermeansrock! We'll need to depend on contributor support to clear all of these out, and this will be a big help in gaining that support.

@madeline-k madeline-k removed their assignment Oct 30, 2023
@pahud pahud added p3 and removed p2 labels Jun 11, 2024