Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IAM auth: session token eventually expires #233

Closed
yankovs opened this issue Oct 29, 2023 · 3 comments · Fixed by #234
Closed

IAM auth: session token eventually expires #233

yankovs opened this issue Oct 29, 2023 · 3 comments · Fixed by #234

Comments

@yankovs
Copy link
Contributor

yankovs commented Oct 29, 2023

Hey!

I noticed an issue in the IAM auth feature I made a PR for a while back. Real world kartons are supposed to be long-running services and shouldn't crash, so essentially KartonBackend should be initialized once in their startup process. This means that the s3 client will use the same session_token and eventually it will expire and cause the karton to fail processing tasks 100% of the time.

There are a couple of ways to deal with it I think are worth discussion:

  • Make s3 client on each request: In mwdb, when a file object's get_or_create is called, a new temporary client is made every time. This solves the problem because the session token doesn't have time to expire. Similarly, we can move the s3 client creation to a separate method:
class KartonBackend:
    def __init__(
        self,
        config: Config,
        identity: Optional[str] = None,
        service_info: Optional[KartonServiceInfo] = None,
    ) -> None:
        self.config = config

        if identity is not None:
            self._validate_identity(identity)
        self.identity = identity

        self.service_info = service_info
        self.redis = self.make_redis(
            config, identity=identity, service_info=service_info
        )

        # don't use this
        # self.s3 = ...

    def make_s3(self):
        session_token = None
        endpoint = self.config.get("s3", "address")
        access_key = self.config.get("s3", "access_key")
        secret_key = self.config.get("s3", "secret_key")
        iam_auth = self.config.getboolean("s3", "iam_auth")

        if not endpoint:
            raise RuntimeError("Attempting to get S3 client without an endpoint set")
        ...

	 def get_object(self, bucket: str, object_uid: str) -> HTTPResponse:
        """
        Get resource object stream with the content.

        Returned response should be closed after use to release network resources.
        To reuse the connection, it's required to call `response.release_conn()`
        explicitly.

        :param bucket: Bucket name
        :param object_uid: Object identifier
        :return: Response object with content
        """
        return self.make_s3().get_object(Bucket=bucket, Key=object_uid)["Body"]

However, this can potentially create overhead and reduce the performance because of the constant creation of the client.

  • Implement credential refresh mechanism: Tools to do this should already exist in botocore, but this is more complicated overall. One thing that comes to mind is to use botocore's existing codebase to check if credentials need a refresh before every call to process method of any karton. So essentially:
# where the process() method of a karton is
# dispatched. Probably in karton system?
# karton_obj is some karton class instance
if check_token_refresh(karton_obj):
    refresh_token(karton_obj)
karton_obj.process(task)
@yankovs
Copy link
Contributor Author

yankovs commented Oct 29, 2023

It seems like the second bullet might be the easiest, actually. Both InstanceMetadataProvider and ContainerProvider return a RefreshableCredentials object when calling load. So what we want should be already achievable as is. It is just that we throw away this object and statically use the creds it provides once

Instead, I believe this creds object should just be used to construct a session and then use self.s3 = session.client("s3") and then use the current codebase as is :). I'll experiment with this when I can.

@nazywam
Copy link
Member

nazywam commented Oct 30, 2023

Second options sounds good! Do you maybe know if this happens only with AWS or are other possible backends are affected as well?

@yankovs
Copy link
Contributor Author

yankovs commented Oct 30, 2023

Second options sounds good! Do you maybe know if this happens only with AWS or are other possible backends are affected as well?

Hey :)

I didn't test this with any other backend than ours, which is basically AWS ECS Fargate. I don't think I have the tools to test any other backend right now but if someone is willing to I'll be happy to help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants