User can get only blobs he's able to see #9819
Changes from all commits: 7af1a2c, 9d79e57, 918739b, 8992d73, ba185fe, 4ac83cb, ddf1151, 3f1c7a9, f4f98fa, db956d8
@@ -1,13 +1,28 @@
package server

import (
	"fmt"
	"sort"
	"time"

	"github.com/docker/distribution"
	"github.com/docker/distribution/context"
	"github.com/docker/distribution/digest"
	"github.com/docker/distribution/registry/middleware/registry"
	"github.com/docker/distribution/registry/storage"

	kerrors "k8s.io/kubernetes/pkg/api/errors"

	imageapi "github.com/openshift/origin/pkg/image/api"
)

// ByGeneration allows for sorting tag events from latest to oldest.
type ByGeneration []*imageapi.TagEvent

func (b ByGeneration) Less(i, j int) bool { return b[i].Generation > b[j].Generation }
func (b ByGeneration) Len() int           { return len(b) }
func (b ByGeneration) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }

func init() {
	middleware.RegisterOptions(storage.BlobDescriptorServiceFactory(&blobDescriptorServiceFactory{}))
}
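For readers following along: the factory registered above is not part of this hunk. Assuming upstream's storage.BlobDescriptorServiceFactory interface (whose single method is BlobAccessController), it presumably looks roughly like the sketch below; the body here is an assumption, not the PR's actual code.

```go
type blobDescriptorServiceFactory struct{}

// BlobAccessController is called by upstream's linkedBlobStore so that
// middleware can wrap the store's own BlobDescriptorService. Embedding the
// wrapped service lets any method we don't override fall through to upstream.
func (f *blobDescriptorServiceFactory) BlobAccessController(svc distribution.BlobDescriptorService) distribution.BlobDescriptorService {
	return &blobDescriptorService{BlobDescriptorService: svc}
}
```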
@@ -25,6 +40,168 @@ type blobDescriptorService struct {
	distribution.BlobDescriptorService
}

// Stat returns a blob descriptor if the given blob is either linked in the repository or is referenced in
// the corresponding image stream. This method is invoked from inside of upstream's linkedBlobStore. It
// expects a proper repository object to be set on the given context by upper openshift middleware wrappers.
func (bs *blobDescriptorService) Stat(ctx context.Context, dgst digest.Digest) (distribution.Descriptor, error) {
	repo, found := RepositoryFrom(ctx)
	if !found || repo == nil {
		err := fmt.Errorf("failed to retrieve repository from context")
		context.GetLogger(ctx).Error(err)
		return distribution.Descriptor{}, err
	}

	// if there is a repo layer link, return its descriptor
	desc, err := bs.BlobDescriptorService.Stat(ctx, dgst)

Reviewer: Is there a cache for this already in the registry?

Author: Ahh, I completely forgot. There is. It's a wrapper just above this wrapper. In our default config it's …

Reviewer: We may need to reorder it rather than turning it off - my worry is that lower code depends on it.

Author: Not really. The order of wrappers and stores is: … The inMemoryCache will be updated with any successful stat we return with our backend and the result will be kept forever.

Reviewer: So the only place the in-memory cache was being used was here? No one else?

Author: I made a bad mistake. The inmemory cache is also used for stat-ing the global blob store. If turned off, all the stats will take much longer when done against slow storage. And I was also wrong about the order. Our … That can be easily done as a follow-up. For now, I'll just re-enable the inmemory cache.

Reviewer: Are we just assuming our …?

Author: Yes. That's how the upstream code currently looks. We have no guarantee it will stay this way. What we can do is add unit tests invoking various stat/get/create methods and ensuring that this or some mock blobDescriptorService gets called. Would that be a sufficient guarantee?

Author: I added a unit test verifying that the …
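To make the ordering question above concrete, here is a toy decorator sketch (not the registry's actual wiring): with the cache outermost, a cache hit never reaches the access check below it; with the access check outermost, every Stat is re-authorized and the cache only spares the backend.

```go
package main

import "fmt"

// statter is a toy stand-in for distribution.BlobDescriptorService.
type statter interface {
	Stat(dgst string) (string, error)
}

type backend struct{}

func (backend) Stat(dgst string) (string, error) {
	fmt.Println("backend consulted")
	return "descriptor-for-" + dgst, nil
}

// cache serves repeated Stats without consulting the next layer.
type cache struct {
	next statter
	mem  map[string]string
}

func (c *cache) Stat(dgst string) (string, error) {
	if d, ok := c.mem[dgst]; ok {
		return d, nil // layers below, including any access check, are skipped
	}
	d, err := c.next.Stat(dgst)
	if err == nil {
		c.mem[dgst] = d
	}
	return d, err
}

// accessCheck re-authorizes every Stat before delegating.
type accessCheck struct {
	next    statter
	allowed func(dgst string) bool
}

func (a *accessCheck) Stat(dgst string) (string, error) {
	if !a.allowed(dgst) {
		return "", fmt.Errorf("blob unknown")
	}
	return a.next.Stat(dgst)
}

func main() {
	allowAll := func(string) bool { return true }

	// cache outermost: the second Stat is a hit and bypasses the access check
	cacheOuter := &cache{next: &accessCheck{next: backend{}, allowed: allowAll}, mem: map[string]string{}}
	cacheOuter.Stat("sha256:abc")
	cacheOuter.Stat("sha256:abc")

	// check outermost: both Stats are authorized; only the backend is spared
	checkOuter := &accessCheck{next: &cache{next: backend{}, mem: map[string]string{}}, allowed: allowAll}
	checkOuter.Stat("sha256:abc")
	checkOuter.Stat("sha256:abc")
}
```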
	if err == nil {
		// and remember the association
		repo.cachedLayers.RememberDigest(dgst, repo.blobrepositorycachettl, imageapi.DockerImageReference{
			Namespace: repo.namespace,
			Name:      repo.name,
		}.Exact())
		return desc, nil
	}

	context.GetLogger(ctx).Debugf("could not stat layer link %q in repository %q: %v", dgst.String(), repo.Named().Name(), err)

	// verify the blob is stored locally
	desc, err = dockerRegistry.BlobStatter().Stat(ctx, dgst)
	if err != nil {
		return desc, err
	}

	// ensure it's referenced inside of corresponding image stream
	if imageStreamHasBlob(repo, dgst) {
		return desc, nil
	}

	return distribution.Descriptor{}, distribution.ErrBlobUnknown
}
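RepositoryFrom is defined elsewhere in this package; Stat relies on an upper middleware wrapper having stored the repository on the context beforehand. A minimal sketch of the usual context-key pattern such a pair follows (WithRepository and repositoryKey are assumed names, not necessarily the package's actual ones):

```go
// repositoryKey is an unexported key type so no other package can collide.
type repositoryKey struct{}

// WithRepository stores the repository on the context for lower layers.
func WithRepository(ctx context.Context, repo *repository) context.Context {
	return context.WithValue(ctx, repositoryKey{}, repo)
}

// RepositoryFrom retrieves it; the second result reports whether it was set.
func RepositoryFrom(ctx context.Context) (*repository, bool) {
	repo, ok := ctx.Value(repositoryKey{}).(*repository)
	return repo, ok
}
```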

func (bs *blobDescriptorService) Clear(ctx context.Context, dgst digest.Digest) error {
	repo, found := RepositoryFrom(ctx)
	if !found || repo == nil {
		err := fmt.Errorf("failed to retrieve repository from context")
		context.GetLogger(ctx).Error(err)
		return err
	}

Reviewer: Is this operation accessible via the API? Do we have to guard access to this? What happens if you clear a given digest that is not actually referenced from a repo or in an image stream? Does that let me delete a blob I shouldn't have access to?

Author: It is: … We already guard it in the auth controller (https://github.com/miminar/origin/blob/pullthrough-no-copy-on-head-8613/pkg/dockerregistry/server/auth.go#L370). The user needs to be able to … This is a wrapper only for linkedBlobStore, which allows deleting only links to blobs. So each invocation may remove only the reference to the blob from a single repository (aka image stream).

Reviewer: ok

	repo.cachedLayers.ForgetDigest(dgst, imageapi.DockerImageReference{
		Namespace: repo.namespace,
		Name:      repo.name,
	}.Exact())
	return bs.BlobDescriptorService.Clear(ctx, dgst)
}

// imageStreamHasBlob returns true if the given blob digest is referenced in the image stream corresponding
// to the given repository. If not found locally, the image stream's images will be iterated and fetched
// from newest to oldest until found. Each processed image will update the local cache of blobs.
func imageStreamHasBlob(r *repository, dgst digest.Digest) bool {
	repoCacheName := imageapi.DockerImageReference{Namespace: r.namespace, Name: r.name}.Exact()
	if r.cachedLayers.RepositoryHasBlob(repoCacheName, dgst) {
		context.GetLogger(r.ctx).Debugf("found cached blob %q in repository %s", dgst.String(), r.Named().Name())
		return true
	}

	context.GetLogger(r.ctx).Debugf("verifying presence of blob %q in image stream %s/%s", dgst.String(), r.namespace, r.name)
	started := time.Now()
	logFound := func(found bool) bool {
		elapsed := time.Since(started)
		if found {
			context.GetLogger(r.ctx).Debugf("verified presence of blob %q in image stream %s/%s after %s", dgst.String(), r.namespace, r.name, elapsed.String())
		} else {
			context.GetLogger(r.ctx).Debugf("detected absence of blob %q in image stream %s/%s after %s", dgst.String(), r.namespace, r.name, elapsed.String())
		}
		return found
	}

	// verify directly with etcd

Reviewer: Is this the point at which we expect things to get expensive? Do we want to add tracing here so we know how frequently we're hitting this path, and how long it's taking us?

Author: That will be interesting data. I'll add that.

Author: Debug logging added.

	is, err := r.getImageStream()
	if err != nil {
		context.GetLogger(r.ctx).Errorf("failed to get image stream: %v", err)
		return logFound(false)
	}

	tagEvents := []*imageapi.TagEvent{}
	event2Name := make(map[*imageapi.TagEvent]string)
	for name, eventList := range is.Status.Tags {
		for i := range eventList.Items {
			event := &eventList.Items[i]
			tagEvents = append(tagEvents, event)
			event2Name[event] = name
		}
	}
	// search from youngest to oldest
	sort.Sort(ByGeneration(tagEvents))

	processedImages := map[string]struct{}{}

	for _, tagEvent := range tagEvents {
		if _, processed := processedImages[tagEvent.Image]; processed {
			continue
		}
		if imageHasBlob(r, repoCacheName, tagEvent.Image, dgst.String(), !r.pullthrough) {
			tagName := event2Name[tagEvent]

Reviewer: So the big question is the access pattern - you cache all candidate images (which fills the digest cache pretty rapidly), but we might only want to cache the layers of the image we actually found. Is there a case where a user is going to pull two images from the same image stream in close succession?

Reviewer: Also, you added …

Author: Good catch. It seemed to me like a waste to throw away the fetched data. But with this approach we may push out frequently accessed items if the cache is close to full. I planned to use it, but it proved not useful in the end. I'll remove it.

Author: Caching just the layers of the matching image now. The …
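For reference, the cachedLayers cache used throughout behaves like the interface below; the method signatures are inferred from the call sites in this diff, so treat this as a sketch rather than the package's actual definition.

```go
// digestToRepositoryCache sketches the cache the repository carries: it maps
// blob digests to repository names for a bounded time.
type digestToRepositoryCache interface {
	// RememberDigest records that repoName references dgst for at most ttl.
	RememberDigest(dgst digest.Digest, ttl time.Duration, repoName string)
	// ForgetDigest drops the dgst->repoName association (used by Clear above).
	ForgetDigest(dgst digest.Digest, repoName string)
	// RepositoryHasBlob reports whether the association is still remembered.
	RepositoryHasBlob(repoName string, dgst digest.Digest) bool
}
```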

			context.GetLogger(r.ctx).Debugf("blob found under istag %s/%s:%s in image %s", r.namespace, r.name, tagName, tagEvent.Image)
			return logFound(true)
		}
		processedImages[tagEvent.Image] = struct{}{}
	}

	context.GetLogger(r.ctx).Warnf("blob %q exists locally but is not referenced in repository %s/%s", dgst.String(), r.namespace, r.name)

	return logFound(false)
}
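A quick sketch of the ordering the sort.Sort(ByGeneration(tagEvents)) call above produces; exampleByGenerationOrder is illustrative only, not part of the PR.

```go
// exampleByGenerationOrder shows the newest-first order ByGeneration yields
// (written as if inside this package, so ByGeneration and imageapi are in scope).
func exampleByGenerationOrder() {
	events := []*imageapi.TagEvent{
		{Image: "sha256:old", Generation: 1},
		{Image: "sha256:new", Generation: 3},
		{Image: "sha256:mid", Generation: 2},
	}
	sort.Sort(ByGeneration(events))
	// events now holds generations 3, 2, 1, so the loop above checks the most
	// recently tagged images before older ones.
}
```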

// imageHasBlob returns true if the image identified by imageName refers to the given blob. The image is
// fetched. If requireManaged is true and the image is not managed (it refers to a remote registry), the
// image will not be processed. The fetched image will update the local cache of blobs -> repositories
// with (blobDigest, cacheName) pairs.
func imageHasBlob(
	r *repository,
	cacheName,
	imageName,
	blobDigest string,
	requireManaged bool,
) bool {
	context.GetLogger(r.ctx).Debugf("getting image %s", imageName)
	image, err := r.getImage(digest.Digest(imageName))
	if err != nil {
		if kerrors.IsNotFound(err) {
			context.GetLogger(r.ctx).Debugf("image %q not found", imageName)
		} else {
			context.GetLogger(r.ctx).Errorf("failed to get image: %v", err)
		}
		return false
	}

	// If pullthrough is disabled, the client won't be able to download a blob belonging to a not managed
	// image (an image stored in an external registry), so don't consider such images as candidates.
	if managed := image.Annotations[imageapi.ManagedByOpenShiftAnnotation]; requireManaged && managed != "true" {

Reviewer: If you've tagged in an image successfully, you've proved you have access to it. All this case guards against is pullthrough (for now). Can you add a comment describing why this is needed?

Author: Comment added.

Reviewer: Is this set to "true" for all non-pullthrough images, or is it missing from the oldest images?

Reviewer: Answering my own question: looks like we started setting that in c0fc084#diff-1a18cb8c0d7684cefcef809b9d01aac3, well prior to 1.0.

		context.GetLogger(r.ctx).Debugf("skipping not managed image")
		return false
	}

	if len(image.DockerImageLayers) == 0 {
		if len(image.DockerImageManifestMediaType) > 0 {

Reviewer: Comment explaining why this field makes us short-circuit?

Author: Sure thing.

Author: Added a line below.

			// If the media type is set, we can safely assume that the best effort to fill the image layers
			// has already been done. There are none.
			return false
		}
		err = imageapi.ImageWithMetadata(image)

Reviewer: Should fix the godoc on ImageWithMetadata to indicate it mutates the passed image.

Author: godoc'd.

		if err != nil {
			context.GetLogger(r.ctx).Errorf("failed to get metadata for image %s: %v", imageName, err)
			return false
		}
	}

	for _, layer := range image.DockerImageLayers {
		if layer.Name == blobDigest {
			// remember all the layers of matching image
			r.rememberLayersOfImage(image, cacheName)
			return true
		}
	}

	return false
}

Reviewer: Is "This method is invoked from inside of upstream's linkedBlobStore" still accurate with the modified upstream?

Author: Yes, it is. It just doesn't apply to a cross-repo mount stat in the source repository anymore. All the other stat calls go through this service.
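As for the unit test mentioned earlier in this thread, a sketch of the general shape such a test could take: a recording mock stands in for the BlobDescriptorService, so the test fails if upstream stops routing Stat calls through the registered middleware. The constFactory helper, the test name, and the elided request-driving harness are all assumptions, not the PR's actual test code.

```go
// constFactory always hands back the same prepared mock (hypothetical helper).
type constFactory struct{ svc distribution.BlobDescriptorService }

func (f constFactory) BlobAccessController(distribution.BlobDescriptorService) distribution.BlobDescriptorService {
	return f.svc
}

// mockBlobDescriptorService counts Stat calls; the nil embedded interface is
// only there to satisfy the rest of distribution.BlobDescriptorService.
type mockBlobDescriptorService struct {
	distribution.BlobDescriptorService
	statCalls int
}

func (m *mockBlobDescriptorService) Stat(ctx context.Context, dgst digest.Digest) (distribution.Descriptor, error) {
	m.statCalls++
	return distribution.Descriptor{Digest: dgst}, nil
}

func TestStatIsRoutedThroughMiddleware(t *testing.T) {
	m := &mockBlobDescriptorService{}
	middleware.RegisterOptions(storage.BlobDescriptorServiceFactory(constFactory{m}))
	// ... build a registry with this middleware and drive blob stat/get/create
	// requests through it here (harness elided) ...
	if m.statCalls == 0 {
		t.Fatal("expected Stat to be routed through the registered BlobDescriptorService")
	}
}
```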