
Docker registry allows to serve any blob user wants #9755

Closed
miminar opened this issue Jul 8, 2016 · 13 comments · Fixed by #9819


@miminar

miminar commented Jul 8, 2016

Previously, the authorization check consisted of an existence check on a layer link inside the repository.

We now skip layer links completely. The Stat() and ServeBlob() calls go directly to the global registry blob store (which stores all blobs pushed to the registry). Because of that, any user who is authorized to pull from at least one repository can pull any blob they want, thanks to our blobDescriptorService().

The only thing preventing a user from pulling arbitrary blobs is that they don't know which blobs are stored in the registry unless they can somehow obtain an image manifest or image object.

Version

Docker registry 2.4.0.

Steps To Reproduce
  1. Push any image as userA to image stream A/IS.
  2. Choose one of its layers (oc get -o yaml isimage <imagename>); let's call it layer L.
  3. Log in as userB.
  4. Create a new project P and a new image stream in it (oc create imagestream IS).
  5. Stat the blob using a simple curl:
$ curl -v -X HEAD -u "userB:$btoken" "$registry_url/v2/P/IS/blobs/L"
Current Result
* About to connect() to 172.30.241.183 port 5000 (#0)
*   Trying 172.30.241.183...
* Connected to 172.30.241.183 (172.30.241.183) port 5000 (#0)
* Server auth using Basic with user 'alice'
> HEAD /v2/target/busybox/blobs/sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f HTTP/1.1
> Authorization: Basic YWxpY2U6d1RyTFN6VnQxTjJvREl5ZENrVDEzR1pTekZsMnFYWVlYYmdDQmw4QzRPWQ==
> User-Agent: curl/7.29.0
> Host: 172.30.241.183:5000
> Accept: */*
> 
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: max-age=31536000
< Content-Length: 667590
< Content-Type: application/octet-stream
< Docker-Content-Digest: sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f
< Docker-Distribution-Api-Version: registry/2.0
< Etag: "sha256:8ddc19f16526912237dd8af81971d5e4dd0587907234be2b83e249518d5b673f"
< Date: Fri, 08 Jul 2016 15:28:43 GMT
Expected Result

Unauthorized

Additional info

Suggested priority: P1

@miminar
Author

miminar commented Jul 8, 2016

/cc @liggitt

I started addressing this.

@miminar miminar self-assigned this Jul 8, 2016
@miminar miminar added this to the 1.3.0 milestone Jul 8, 2016
@smarterclayton
Contributor

Security regressions are always P0.

@miminar
Author

miminar commented Jul 11, 2016

Proposed solution.

Build a cache of blob digests in the image stream:

type ImageStream struct {
    // ...
    Status struct {
        // ...

        // 1st cache
        // A cache for existing images
        // blob digest -> blob size
        ImageBlobs map[string]int64

        // 2nd cache
        // A cache for images just being uploaded
        UploadedBlobs map[string]struct {
            Size     int64
            Uploaded time.Time
        }
    }
}

1st cache

is needed for proper authorization checks. We cannot use layer links stored in the registry's storage for this purpose because they don't address the use case of oc tag source targetrepository.

Without caching layer blobs, a simple blobStore.Stat() would require fetching all images of a particular image stream and searching them for the presence of the blob. Such an operation happens at least twice during an image pull and at least three times during an image push for a single blob.

This cache needs to be filled during migration. It needs to be updated during each addition/removal of an image or during image pruning.

The size information will give us a significant boost when fetching missing layer sizes (see #7706 and #9115).

It may also be useful for quota counting and usage display.

2nd cache

is needed for a proper authorization check during an image push. When a manifest is uploaded (as the last operation of a push), the registry verifies that all of its referenced blobs are present. This cache is a replacement for the layer links that used to serve this purpose.

This cache will also prevent the image pruner from removing blobs that are not yet referenced by any image.

The Uploaded field will make it possible to detect aborted/unfinished image uploads by their old-enough timestamps. Such entries can be pruned together with the corresponding blobs in the registry's storage.

The cache will be updated on each blob upload, manifest PUT, and image pruning.

Why a second cache? Compared to the 1st, it will contain far fewer items, but each of its entries needs an additional timestamp attribute. Also, the pruner needs an easy way to distinguish a blob referenced by an existing image from one that is not yet referenced.
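
For illustration, a minimal sketch of how a pruner could use the Uploaded timestamp to detect stale uploads (the names and the threshold handling are illustrative, not actual origin code):

package server

import "time"

type uploadedBlob struct {
    Size     int64
    Uploaded time.Time
}

// pruneCandidates returns the digests of uploads whose timestamp is old enough
// for the upload to be considered aborted; their cache entries and the
// corresponding blobs in the registry's storage can then be removed.
func pruneCandidates(uploaded map[string]uploadedBlob, maxAge time.Duration) []string {
    var stale []string
    for digest, blob := range uploaded {
        if time.Since(blob.Uploaded) > maxAge {
            stale = append(stale, digest)
        }
    }
    return stale
}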

Cache the above inside the registry

To reduce round-trips to etcd, let's cache the information in the registry as well. Use digestToRepositoryCache for this purpose.

Modify stat-ing

At the lowest level, when a local blob is checked for presence inside our blobDescriptorService:

  1. do the existence check for the blob in the global registry store (filesystem check)
  2. if not present, err out
  3. see if it's in digestToRepositoryCache prefixed with no registry URL (local blob)
  4. if not present, query the 1st cache in the image stream in etcd
  5. if not present, query the 2nd cache
  6. if not found in any of them, err out with "blob unknown"
  7. otherwise, return the blob descriptor
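
A rough sketch of that flow (globalStore, localCache, and fetchStatus are illustrative collaborators, not the actual origin code, and the API is modeled on a recent docker/distribution):

package server

import (
    "context"

    "github.com/docker/distribution"
    digest "github.com/opencontainers/go-digest"
)

type imageStreamStatus struct {
    ImageBlobs    map[string]int64    // 1st cache
    UploadedBlobs map[string]struct{} // 2nd cache (sizes and timestamps elided here)
}

type blobDescriptorService struct {
    globalStore distribution.BlobStatter                              // global registry blob store
    localCache  func(dgst digest.Digest) bool                         // in-registry digest cache (local blobs)
    fetchStatus func(ctx context.Context) (*imageStreamStatus, error) // image stream status from etcd
}

func (bs *blobDescriptorService) Stat(ctx context.Context, dgst digest.Digest) (distribution.Descriptor, error) {
    // 1. + 2. existence check in the global blob store (filesystem check)
    desc, err := bs.globalStore.Stat(ctx, dgst)
    if err != nil {
        return distribution.Descriptor{}, err
    }

    // 3. in-registry cache of local blobs
    if bs.localCache(dgst) {
        return desc, nil
    }

    // 4. + 5. image stream caches stored in etcd
    status, err := bs.fetchStatus(ctx)
    if err != nil {
        return distribution.Descriptor{}, err
    }
    if _, ok := status.ImageBlobs[dgst.String()]; ok {
        return desc, nil
    }
    if _, ok := status.UploadedBlobs[dgst.String()]; ok {
        return desc, nil
    }

    // 6. the blob exists in the store but is not referenced by this repository
    return distribution.Descriptor{}, distribution.ErrBlobUnknown
}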

@smarterclayton
Contributor

Could we use layer links only for the second one?

@smarterclayton
Contributor

So the alternative here for disclosure is to create an LRU cache of layer ID to an array of possible registries in the registry, and then use that to do positive caching. If it misses, we'd have to do the lookup against etcd anyway (we don't want the cache to contain out-of-date negative entries). That's a significant source of load on the API server on every cache miss (1 image stream + N image lookups).
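
A minimal sketch of such a positive-only LRU cache, using hashicorp/golang-lru (the type and method names are illustrative, not origin's actual digest cache implementation):

package server

import (
    lru "github.com/hashicorp/golang-lru"
)

// positiveDigestCache remembers only positive results: which repositories are
// known to contain a given blob digest. Misses always fall back to etcd, so
// stale negative results are never cached.
type positiveDigestCache struct {
    cache *lru.Cache // blob digest -> []string of repositories
}

func newPositiveDigestCache(size int) (*positiveDigestCache, error) {
    c, err := lru.New(size)
    if err != nil {
        return nil, err
    }
    return &positiveDigestCache{cache: c}, nil
}

// Remember records that the repository is known to reference the digest.
// Duplicate entries are tolerated for brevity.
func (c *positiveDigestCache) Remember(dgst, repo string) {
    var repos []string
    if v, ok := c.cache.Get(dgst); ok {
        repos = v.([]string)
    }
    c.cache.Add(dgst, append(repos, repo))
}

// Contains answers only positively; on a miss the caller must do the lookup
// against etcd (1 image stream + N image lookups) as described above.
func (c *positiveDigestCache) Contains(dgst, repo string) bool {
    v, ok := c.cache.Get(dgst)
    if !ok {
        return false
    }
    for _, r := range v.([]string) {
        if r == repo {
            return true
        }
    }
    return false
}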

@legionus
Contributor

@miminar We can do the last step and start storing in etcd the information about which blobs correspond to which repository. This is the last piece of information that is not stored in etcd; right now it can only be obtained using the storage driver. Putting this information into etcd will also help reduce the number of calls to the storage driver.

@miminar
Author

miminar commented Jul 12, 2016

Could we use layer links only for the second one?

We could for most regular cases. The problem case is a push with a cross-repo mount from a remote repository, where we intercept blobStore.Create(ctx, options) and handle the mount option specially:

  1. if the remote blob exists, we return immediately with distribution.ErrBlobMounted
  2. otherwise, we pass the request down to upstream's blobStore

Unfortunately, we cannot create the link ourselves because the link operation is private. We could try to negotiate with upstream, though.
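
A rough sketch of that interception (the wrapper type and the remoteStat helper are assumptions, and the option handling is modeled on docker/distribution's CreateOptions; this is not the actual origin code):

package server

import (
    "context"

    "github.com/docker/distribution"
    "github.com/docker/distribution/reference"
)

// blobStore wraps upstream's blob store; remoteStat is an assumed helper that
// stats the blob in the remote (source) repository.
type blobStore struct {
    upstream   distribution.BlobStore
    remoteStat func(ctx context.Context, from reference.Canonical) (distribution.Descriptor, error)
}

func (bs *blobStore) Create(ctx context.Context, options ...distribution.BlobCreateOption) (distribution.BlobWriter, error) {
    var opts distribution.CreateOptions
    for _, option := range options {
        if err := option.Apply(&opts); err != nil {
            return nil, err
        }
    }

    if opts.Mount.ShouldMount {
        // 1. cross-repo mount requested: if the remote blob exists, short-circuit
        //    with ErrBlobMounted so the client treats the blob as already present
        if desc, err := bs.remoteStat(ctx, opts.Mount.From); err == nil {
            return nil, distribution.ErrBlobMounted{From: opts.Mount.From, Descriptor: desc}
        }
    }

    // 2. otherwise pass the request down to upstream's blobStore, which creates
    //    the layer link itself (that operation is private to upstream)
    return bs.upstream.Create(ctx, options...)
}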

@smarterclayton I agree that using local layer links is simpler and more efficient. Shall we go that route, with the possible risk of carrying a patch exposing linkBlob? Update: Considering a rework of image pruning, I'd highly recommend marking just-uploaded blobs in etcd in the target image stream.

We can do the last step and start storing in etcd the information about which blobs correspond to which repository. This is the last piece of information that is not stored in etcd; right now it can only be obtained using the storage driver. Putting this information into etcd will also help reduce the number of calls to the storage driver.

This would have to be a top-level resource, so it's a third variant. It addresses all the authorization use cases. Moreover, it would allow us to easily fetch layer blob sizes that are missing from a v2->v1 converted manifest if the blobs already exist in any image stream. Compared to the caches in image streams, the downside is the loss of per-image-stream size, which would be nice for display and for potential limiting in the future.
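
For illustration, such a top-level resource might look roughly like this (a hypothetical shape with illustrative names, not an actual OpenShift API type):

// BlobReferences would be a cluster-scoped record of which image repositories
// are known to reference a given blob digest.
type BlobReferences struct {
    // blob digest, e.g. "sha256:..."
    Digest string
    // blob size in bytes, useful for filling in missing layer sizes
    Size int64
    // "<namespace>/<imagestream>" repositories referencing the blob
    Repositories []string
}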

@miminar
Author

miminar commented Jul 12, 2016

What I'm worried about now is keeping the cache up to date. Imagine there's an image stream A with an image I under tag T. There's another image stream B with an ImageStreamTag pointing to A:T. Now some user builds a new image and pushes it to A:T. Ideally, this operation would result in an update of both A's and B's ImageBlobs caches. The question is how the registry finds out that it should update B's ImageBlobs.

There are many other cases like this. E.g. replacing the ImageStreamTag with an ImageStreamImage would result not only in additions to B but in removals as well. Images may also be tagged and untagged. Each such change needs to be reflected in all the places.

To address this, could we also extend the TagEventList and TagEvent types with a reverse mapping to all of the image stream tags referencing the particular tag or image?

type ImageStream struct {
    // ...
    Status struct {
        // ...
        Tags map[string]TagEventList
    }
}

type TagEventList struct {
    // ...
    Items []TagEvent
    // an array of <image stream>:<tag> entries referencing this tag
    References []string
}

type TagEvent struct {
    // ...
    // an array of <image stream>:<tag> entries referencing this image
    References []string
}

Each image push, oc tag, and oc delete istag would check these entries and update the referencing ISTags and their ImageBlobs accordingly.
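
A minimal sketch of that fan-out (the helper and the map layout are illustrative, not actual origin code):

package server

import "strings"

// propagateBlobUpdate merges a newly pushed image's blob digests into the
// ImageBlobs cache of every image stream that references the updated tag.
// streams maps image stream name to its ImageBlobs cache (digest -> size);
// references carries the "<image stream>:<tag>" entries proposed above.
func propagateBlobUpdate(streams map[string]map[string]int64, references []string, newBlobs map[string]int64) {
    for _, ref := range references {
        streamName := strings.SplitN(ref, ":", 2)[0]
        blobs, ok := streams[streamName]
        if !ok {
            continue // not a stream tracked in this sketch
        }
        if blobs == nil {
            blobs = map[string]int64{}
            streams[streamName] = blobs
        }
        for digest, size := range newBlobs {
            blobs[digest] = size
        }
    }
}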

WDYT?

@miminar
Author

miminar commented Jul 12, 2016

Could we use layer links only for the second one?

I agree that using local layer links is simpler and more efficient. Shall we go that route, with the possible risk of carrying a patch exposing linkBlob?

I changed my mind. From the perspective of the image pruner, an orphaned blob resulting from an incomplete push is impossible to distinguish from a blob that was just uploaded and is waiting for a manifest PUT to be referenced for the first time.

Upstream requires the garbage collector to be run on a registry instance in read-only mode because of this problem (deleting blobs while uploads are happening at the same time). We plan to run periodic image prunes inside the registry without the possibility of switching to read-only mode.

I'd highly recommend marking just-uploaded blobs in the target image stream in etcd instead of relying on repository layer links.

@smarterclayton
Contributor

smarterclayton commented Jul 12, 2016 via email

@smarterclayton
Contributor


Don't links have timestamps? Pruning and GC already requires a window of safety based on time, so I don't see how that is being tracked for uploads.

@miminar
Author

miminar commented Jul 12, 2016

Don't links have timestamps? Pruning and GC already requires a window of safety based on time, so I don't see how that is being tracked for uploads.

They do! And actually, it's possible to stat them now that we have the storage middleware. Thanks for the suggestion. The problem of pruning uploaded blobs is solved with layer links.
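
A rough sketch of such a check through the storage driver (the path layout follows the upstream registry storage format; the function is illustrative and modeled on a recent docker/distribution driver API, not actual origin code):

package server

import (
    "context"
    "fmt"
    "time"

    storagedriver "github.com/docker/distribution/registry/storage/driver"
)

// uploadedBlobIsStale stats the repository's layer link for the given digest
// and reports whether it is older than maxAge, i.e. whether the upload can be
// considered aborted and safe to prune.
func uploadedBlobIsStale(ctx context.Context, driver storagedriver.StorageDriver, repo, algorithm, hex string, maxAge time.Duration) (bool, error) {
    linkPath := fmt.Sprintf("/docker/registry/v2/repositories/%s/_layers/%s/%s/link", repo, algorithm, hex)
    fi, err := driver.Stat(ctx, linkPath)
    if err != nil {
        return false, err
    }
    return time.Since(fi.ModTime()) > maxAge, nil
}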

@miminar
Author

miminar commented Jul 12, 2016

It may be easier for right now to pull together the prototype lookup cache in the registry and see how complex that is while we discuss this issue further.

OK, I'll start with this prototype, which will fetch the images until the blob is found.
