Docker registry allows serving any blob a user wants #9755
/cc @liggitt I started addressing this. |
Security regressions are always P0. |
Proposed solution: build a cache of blob digests in the image stream:

```go
type ImageStream struct {
	// ...
	Status struct {
		// ...
		// 1st cache: a cache for existing images
		// blob digest -> blob size
		ImageBlobs map[string]int64
		// 2nd cache: a cache for images just being uploaded
		UploadedBlobs map[string]struct {
			Size     int64
			Uploaded time.Time
		}
	}
}
```

The 1st cache is needed for proper authorization checks. We cannot use the layer links stored in the registry's storage for this purpose because they don't address all use-cases. Without caching layer blobs, even a simple pull would require additional lookups. This cache needs to be filled during migration and updated on each addition/removal of an image and during image pruning. The size information will give us a significant boost when fetching missing layer sizes (see #7706 and #9115). It may also be useful for quota counting and usage display.

The 2nd cache is needed for a proper authorization check during a push of an image. When a manifest is being uploaded (as the last operation of a push), the registry verifies that all of its referenced blobs are present. This cache is a replacement for the layer links that used to serve this purpose. It will also prevent the image pruner from removing blobs not yet referenced by any image. The cache will be updated during each blob upload, manifest PUT and image pruning.

Why a second cache? Compared to the 1st, it will be a lot smaller in terms of items, but all its entries need an additional timestamp attribute. Also, the pruner needs to easily distinguish a blob referenced by an existing image from one not yet referenced.

Cache the above inside the registry: to reduce round-trips to etcd, let's cache the information in the registry as well.

Modify stat-ing: on the lowest level, when the local blob is checked for presence inside our `blobDescriptorService`.
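A minimal sketch of what that repository-scoped stat could look like. All names are hypothetical, and an in-memory map stands in for the proposed etcd-backed imagestream caches; the real implementation would plug into the registry's blobDescriptorService:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrBlobUnknown stands in for the registry's "blob unknown" error.
var ErrBlobUnknown = errors.New("blob unknown to repository")

// repoBlobCache is a hypothetical in-memory stand-in for the two
// per-imagestream caches proposed above (etcd-backed in the real design).
type repoBlobCache struct {
	// 1st cache: blob digest -> blob size for existing images
	imageBlobs map[string]int64
	// 2nd cache: blobs just uploaded, awaiting a manifest PUT
	uploadedBlobs map[string]struct {
		Size     int64
		Uploaded time.Time
	}
}

// stat authorizes a blob against the repository's caches: only blobs
// referenced by the repository's images, or just uploaded to it, are
// served. Anything else is rejected instead of falling through to the
// global blob store.
func (c *repoBlobCache) stat(digest string) (int64, error) {
	if size, ok := c.imageBlobs[digest]; ok {
		return size, nil
	}
	if u, ok := c.uploadedBlobs[digest]; ok {
		return u.Size, nil
	}
	return 0, ErrBlobUnknown
}

func main() {
	cache := &repoBlobCache{
		imageBlobs: map[string]int64{"sha256:aaa": 1024},
		uploadedBlobs: map[string]struct {
			Size     int64
			Uploaded time.Time
		}{"sha256:bbb": {Size: 2048, Uploaded: time.Now()}},
	}
	for _, d := range []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"} {
		if size, err := cache.stat(d); err == nil {
			fmt.Printf("%s: served (%d bytes)\n", d, size)
		} else {
			fmt.Printf("%s: denied (%v)\n", d, err)
		}
	}
}
```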
|
Could we use layer links only for the second one?
|
So the alternative here for disclosure is to create an LRU cache of layer digests in the registry.
|
@miminar We can do the last step and start storing in etcd the information about which blobs correspond to which repository. This is the last piece of information not stored in etcd; currently it can only be obtained using the storage driver. Putting this information in etcd will also help reduce the number of calls to the storage driver. |
We could for most regular cases. The problem case is a push with a cross-repo mount from a remote repository, where we intercept the mount request.
Unfortunately we cannot make the link ourselves because the link operation is private. We could try to negotiate with upstream though. @smarterclayton I agree that using local layer links is simpler and more efficient.
This would have to be a top-level resource, so it's a third variant. It addresses all the authorization use cases. Moreover, it would allow us to easily fetch layer blob sizes missing in v2->v1 converted manifests if they already exist in any image stream. Compared to the caches in image streams, the downside of this is a loss of the image stream size, which would be nice for display and for potential limiting in the future. |
What I'm worried about now is keeping the cache up to date. Imagine there's an image stream A with an image I under tag T. There's another image stream B referencing that tag. There are many other cases like this, e.g. replacing a tag. To address it, can we also extend the status with reverse references:

```go
type ImageStream struct {
	// ...
	Status struct {
		// ...
		Tags map[string]TagEventList
	}
}

type TagEventList struct {
	// ...
	Items []TagEvent
	// an array of <image stream>:<tag> entries referencing this tag
	References []string
}

type TagEvent struct {
	// ...
	// an array of <image stream>:<tag> entries referencing this image
	References []string
}
```

Each image push, tag or untag would then have to update these references. WDYT? |
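A sketch of how such reverse references could be maintained on tag and untag. The `refIndex` type and its methods are hypothetical illustrations, not part of the proposal's API; the point is that an image whose reference list drains becomes a pruning candidate:

```go
package main

import "fmt"

// refIndex is a hypothetical reverse index mapping an image digest to the
// "<image stream>:<tag>" entries that reference it, as proposed above.
type refIndex map[string][]string

// tag records that stream:tag now references the image digest.
func (r refIndex) tag(stream, tag, digest string) {
	entry := stream + ":" + tag
	for _, e := range r[digest] {
		if e == entry {
			return // already referenced; keep entries unique
		}
	}
	r[digest] = append(r[digest], entry)
}

// untag removes the reference; an image with no remaining references
// becomes a candidate for pruning.
func (r refIndex) untag(stream, tag, digest string) {
	entry := stream + ":" + tag
	refs := r[digest][:0]
	for _, e := range r[digest] {
		if e != entry {
			refs = append(refs, e)
		}
	}
	if len(refs) == 0 {
		delete(r, digest) // unreferenced: prunable
	} else {
		r[digest] = refs
	}
}

func main() {
	idx := refIndex{}
	idx.tag("A", "T", "sha256:img1")
	idx.tag("B", "latest", "sha256:img1")
	idx.untag("A", "T", "sha256:img1")
	// The image is still referenced by B:latest, so it must not be pruned.
	fmt.Println(idx["sha256:img1"])
}
```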
I changed my mind. From the perspective of the image pruner, an orphaned blob left behind by an incomplete push is impossible to distinguish from a blob just uploaded and waiting for a manifest PUT to reference it for the first time. Upstream requires the garbage collector to be run on a registry instance in read-only mode because of this problem (deleting blobs while uploads are happening at the same time). We plan to run periodic image prunes inside the registry without the possibility of switching to read-only mode. I'd highly recommend marking just-uploaded blobs in the target imagestream in etcd instead of relying on repository layer links. |
I don't think that's practical. For one thing, it means the
denormalization has to be maintained transactionally which we can't
guarantee.
It's reasonable that adding and removing a tagged image from an image
stream requires the image to be loaded first, although that could lead
to wedging (admin deletes image, can't be removed).
It may be easier for right now to pull together the prototype lookup
cache in the registry and see how complex that is while we discuss
this issue further.
|
Don't links have timestamps? Pruning and GC already require a window of time anyway. |
They have! And actually, it's possible to stat them now that we have the storage middleware. Thanks for the suggestion. The problem of pruning just-uploaded blobs is solved with layer links. |
OK, I'll start with this prototype, which will fetch the images until the blob is found. |
Previously there was an authorization check comprising an existence check on a layer link inside the repository. We're skipping layer links completely now: the `Stat()` and `ServeBlob()` calls reach directly into the global registry blob store (storing blobs pushed to the registry), thanks to our `blobDescriptorService()`. Because of that, any user who is authorized to pull from at least one repository can pull any blob he wants. The only thing restraining him is that he doesn't know which blobs are stored in the registry unless he can somehow obtain an image manifest or image object.
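A toy model of the regression described above (not the actual registry code): once the user passes the pull check on any one repository, the digest is looked up in the shared blob store, so repository membership never enters the decision:

```go
package main

import "fmt"

// globalBlobStore is a toy model of the registry-wide blob store shared
// by all repositories.
var globalBlobStore = map[string]int64{
	// A layer pushed to some other user's private repository.
	"sha256:private-layer": 4096,
}

// statWithoutLinkCheck models the regressed path: the layer-link existence
// check is skipped, so any digest present in the global store is served to
// any user who can pull from at least one repository.
func statWithoutLinkCheck(userCanPullSomeRepo bool, digest string) (int64, bool) {
	if !userCanPullSomeRepo {
		return 0, false
	}
	size, ok := globalBlobStore[digest]
	return size, ok
}

func main() {
	// A user authorized only for an unrelated repository still gets the blob.
	if size, ok := statWithoutLinkCheck(true, "sha256:private-layer"); ok {
		fmt.Printf("served foreign blob (%d bytes)\n", size)
	}
}
```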
Version
Docker registry 2.4.0.
Steps To Reproduce

1. Get a layer digest of an existing image (`oc get -o yaml isimage <imagename>`), let's call it layer L
2. Create a new image stream (`oc create imagestream IS`)

Current Result
Expected Result
Unauthorized
Additional info
Suggested priority: P1