
Is there a way to reuse token/transport to prevent repeated auth requests? #740

Closed
vito opened this issue Jul 14, 2020 · 7 comments

vito (Contributor) commented Jul 14, 2020

Hi there, thanks for developing this package - it's pretty great!

I'm using this package in the Concourse registry-image resource for listing tags and detecting changes to digests that tags point to - specifically, this PR: concourse/registry-image-resource#214

I noticed that when I point it to a repository that has many tags it can take a very long time to find everything. Judging from the debug logs, it looks like one thing contributing to this is that every remote.Image call is going through its own auth flow, probably from this call:

func makeFetcher(ref name.Reference, o *options) (*fetcher, error) {
	tr, err := transport.New(ref.Context().Registry, o.auth, o.transport, []string{ref.Scope(transport.PullScope)})
	if err != nil {
		return nil, err
	}
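
For context, a simplified sketch of the calling pattern that triggers this (not the actual resource code; the helper and names are illustrative):

package example

import (
	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

// listDigests fetches the digest behind every tag. Each remote.Image call
// builds its own transport via transport.New, so every iteration repeats
// the ping + token exchange.
func listDigests(repoName string, auth authn.Authenticator) (map[string]string, error) {
	repo, err := name.NewRepository(repoName)
	if err != nil {
		return nil, err
	}

	tags, err := remote.List(repo, remote.WithAuth(auth))
	if err != nil {
		return nil, err
	}

	digests := map[string]string{}
	for _, tag := range tags {
		img, err := remote.Image(repo.Tag(tag), remote.WithAuth(auth))
		if err != nil {
			return nil, err
		}
		d, err := img.Digest()
		if err != nil {
			return nil, err
		}
		digests[tag] = d.String()
	}
	return digests, nil
}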

Is there any way to re-use the token and/or transport for multiple requests in order to speed this up, by skipping the ping + token request? I'm worried that sending so many requests could lead to being rate limited more quickly, though I'm not sure if rate limits apply to the auth flow or the ping call (maybe not).

Thanks!

@vito vito changed the title Constantly re-authenticates on every method call Is there a way to reuse token/transport to prevent repeated auth requests? Jul 14, 2020
jonjohnsonjr (Collaborator) commented Jul 14, 2020

Hi there, thanks for developing this package - it's pretty great!

Thank you for the kind words.

listing tags and detecting changes to digests that tags point to

I know there's no way to avoid doing this because there's no standard API for notifications, but this can be pretty painful for registries :P Joseph Schorr wrote a proposal for a new registry API to support this, which obviously you wouldn't be able to use anytime soon, but I wish there was a better way to do this. It would be wonderful if some software project existed to multiplex over the various pubsub implementations of major registries and expose a common API, but that might be an eternal pipe dream 😄

Is there any way to re-use the token and/or transport for multiple requests in order to speed this up, by skipping the ping + token request?

This is definitely an annoying problem, and I don't know if there's an easy way to fix it. There's not currently a great way to work around this, but I did have one idea:

Everything in remote takes this WithTransport option. Normally, this gets passed through to transport.New as the "innermost" transport that gets wrapped by the token exchange logic.

It seems reasonable to me that we could do a type check in transport.New to see if the supplied transport is already a *bearerTransport and if so, just return it immediately instead of doing the token exchange.

This wouldn't change existing behavior for current clients, but it would allow you to create an http.RoundTripper via transport.New and reuse it everywhere with the remote.WithTransport option.

I like this approach because the implementation is really simple, the behavior is opt-in, and it doesn't change the API at all.

If you'd be willing to implement and test this out (to verify that it works as expected), it should only be a three line change in transport.New (+ some changes to how you call remote stuff).
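
Roughly, that check might look like this (a sketch of the idea only, referring to the package's unexported bearerTransport type; not a tested patch):

// Inside transport.New, before the ping + token exchange, where t is the
// caller-supplied http.RoundTripper (sketch of the proposed change, not
// the current implementation):
if bt, ok := t.(*bearerTransport); ok {
	// The caller already went through transport.New once; reuse that
	// bearer transport instead of re-authenticating.
	return bt, nil
}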

Also if you have any better ideas for how you'd tackle this, I'm all ears.

vito (Contributor, Author) commented Jul 14, 2020

Thanks for the quick reply!

I know there's no way to avoid doing this because there's not a standard API for notifications, but this can be pretty painful for registries :P Joseph Schorr wrote a proposal for a [...]

I'll check out the proposal! Concourse's resource model is polling-based as it's generally more resilient to things like missed events - granted, often at the expense of the external systems. I'll soon be working on a proposal to allow it to also leverage webhooks to reduce the amount of polling to only when a hook or event is received, so proposals like this one may become relevant there. Thanks!

The current approach has enough mitigating factors that I think it might not be too bad in practice:

  1. The check call that gets all the digests will only be getting digests for a subset of the tags: only ones containing a semver version number, optionally filtered down to the configured variant (e.g. -stretch suffix).
  2. Subsequent check calls will be given the latest version (from the previous check) as a "cursor", so we'll only fetch the digest values for the latest + any newer version tags. (It fetches the digest of the latest version just in case it was modified.)
  3. We could potentially skip the digest checking entirely if the user configures something like immutable: true. 🤔 I've heard some registries outright disallow pushing 'over' tags, so this would be a whole lot of load for nothing.
  4. Multiple Concourse pipelines configuring the exact same image resource definition will only correspond to one check loop, rather than one per pipeline, which should help prevent a big Concourse cluster from hammering a registry for a common image.

re: the remote.WithTransport idea - makes a lot of sense to me! I'll take a look and let you know if I have the capacity to submit a PR. 👍

jonjohnsonjr (Collaborator) commented

I'll soon be working on a proposal to allow it to also leverage webhooks to reduce the amount of polling to only when a hook or event is received, so proposals like this one may become relevant there. Thanks!

This is interesting to me, please cc me!

The current approach has enough mitigating factors...

This is definitely better than most clients I've seen 😄

vito (Contributor, Author) commented Jul 16, 2020

This is interesting to me, please cc me!

Sure! I'll aim to write something up next week; it's been getting near the top of my TODO list.

@jonjohnsonjr So, funny thing: this actually kind of already works without making any changes!

If I add this to my code:

rt, err := transport.New(repo.Registry, auth, http.DefaultTransport, []string{repo.Scope(transport.PullScope)})
if err != nil {
	return resource.CheckResponse{}, fmt.Errorf("resolve repository: %w", err)
}

opts = append(opts, remote.WithTransport(rt))

It still does the ping on every request, but the ping returns 200 because the request is already authenticated via the transport. The code then interprets this 200 as anonymous, which is maybe a bit hokey [1], but this ultimately results in just returning t and avoiding the token request.
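
For completeness, a sketch of how the reused rt then feeds multiple remote calls (the tag loop is illustrative, not the code from the PR; repo is the same repository value used above):

tags, err := remote.List(repo, remote.WithTransport(rt))
if err != nil {
	return resource.CheckResponse{}, fmt.Errorf("list tags: %w", err)
}

for _, tag := range tags {
	// Each call still pings, but rt already carries the bearer token,
	// so no further token exchange happens.
	img, err := remote.Image(repo.Tag(tag), remote.WithTransport(rt))
	if err != nil {
		return resource.CheckResponse{}, fmt.Errorf("get image: %w", err)
	}
	digest, err := img.Digest()
	if err != nil {
		return resource.CheckResponse{}, fmt.Errorf("get digest: %w", err)
	}
	_ = digest // compared against the previously-seen digest in the real check
}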

I looked into making the change you suggested, but didn't see a super clean way to detect whether to skip the ping. One problem is that the transport given to remote.WithTransport ends up being wrapped in a logging/retrying transport. I could just check for a *retryTransport type but that feels a little brittle. I could add a no-op interface that these transports implement to indicate that they're already "pinged" or authenticated, but then I'd have to make sure every wrapping transport implements it too. I remember this being a bit of a rat's nest with transport wrapping.

All in all, I think I'm OK with how this currently works; the ping actually verifies that the token is still valid, and if we get an unauthorized response a new token will be acquired, so this pretty elegantly handles the case of an expired token in a long-lived transport object. I'm guessing ping requests don't count towards rate limits, but this is a total blind guess as there doesn't seem to be a documented standard there.

In any case I think this issue can be closed, unless you'd like to leave it open to make this flow more obvious (e.g. documentation or refactoring the code to not assume a 200 response is 'anonymous').

[1] it's not really "anonymous", it's more like "you pinged with sufficient auth" - which may mean anonymous if no auth was given

jonjohnsonjr (Collaborator) commented

So, funny thing: this actually kind of already works without making any changes!

Fantastic.

I could just check for a *retryTransport type but that feels a little brittle.
...
I remember this being a bit of a rat's nest with transport wrapping.

Yeah this is unfortunate :/ we're dealing with 4 or 5 different wrappings, which was nice at the time for testing, but makes things a bit complicated.

remote.makeOptions(t) yields retryTransport{logTransport{t}}

transport.New(t) yields bearerTransport{useragentTransport{schemeTransport{t}}}

so we end up with a bearerTransport{useragentTransport{schemeTransport{retryTransport{logTransport{t}}}}}

By the time transport.New sees what you passed in, it's dealing with a:
retryTransport{logTransport{bearerTransport{useragentTransport{schemeTransport{retryTransport{logTransport{t}}}}}}}

I almost feel like I need the go 1.13 error API stuff to deal with this 😄
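
For example, an errors.Unwrap-style mechanism for transports might look something like this (entirely hypothetical - neither the Unwrap method nor the helper exists in the library):

// Hypothetical: every wrapping transport would expose its inner RoundTripper.
type unwrapper interface {
	Unwrap() http.RoundTripper
}

// findBearer would walk the chain looking for an existing bearerTransport,
// much like errors.As walks a chain of wrapped errors.
func findBearer(rt http.RoundTripper) (*bearerTransport, bool) {
	for rt != nil {
		if bt, ok := rt.(*bearerTransport); ok {
			return bt, true
		}
		u, ok := rt.(unwrapper)
		if !ok {
			return nil, false
		}
		rt = u.Unwrap()
	}
	return nil, false
}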

In any case I think this issue can be closed, unless you'd like to leave it open to make this flow more obvious (e.g. documentation or refactoring the code to not assume a 200 response is 'anonymous').

The only issue I really have here is with double-wrapped transports. I think we would end up logging things twice and retrying things twice, which isn't ideal. Let's leave this open for now while I try to think of a better way to deal with ^^^^ that.

[1] it's not really "anonymous", it's more like "you pinged with sufficient auth" - which may mean anonymous if no auth was given

Yeah it's kinda funky with all the wrapping. It's anonymous in the outer layer because the inner layer handles all the auth.

jonjohnsonjr (Collaborator) commented

Hmm... I'm definitely wrong. The retryTransport{logTransport{}} doesn't stack. We're just dealing with an extra ping request, which isn't as bad but still annoying.

We can avoid that if we do what containerd does, where they just start making requests and only authenticate when they hit a 401. Related: #666 (comment)
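
Roughly, that lazy pattern could look like this (a sketch of the general idea only - not containerd's code and not a proposed patch for this library):

import (
	"net/http"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote/transport"
)

// lazyTransport defers authentication until the registry rejects a request
// with 401 Unauthorized (not goroutine-safe as written).
type lazyTransport struct {
	base   http.RoundTripper
	auth   authn.Authenticator
	reg    name.Registry
	scopes []string

	authed http.RoundTripper // built lazily after the first 401
}

func (t *lazyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if t.authed == nil {
		resp, err := t.base.RoundTrip(req)
		if err != nil || resp.StatusCode != http.StatusUnauthorized {
			return resp, err // anonymous access worked, or it failed for another reason
		}
		resp.Body.Close()

		// Only now pay for the ping + token exchange. Retrying req below is
		// only safe because registry reads are GET/HEAD requests with no body.
		authed, err := transport.New(t.reg, t.auth, t.base, t.scopes)
		if err != nil {
			return nil, err
		}
		t.authed = authed
	}
	return t.authed.RoundTrip(req)
}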

github-actions commented

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
