[RFP] replace catalog API functionality #22

Closed
vbatts opened this issue Aug 29, 2018 · 47 comments


@vbatts
Member

vbatts commented Aug 29, 2018

This is for richer indexing and searching of container images in a registry.
There is /v2/_catalog, though it still doesn't seem clear enough for implementers.

@jonjohnsonjr
Contributor

I'd like to remove /v2/_catalog from the spec entirely. It feels like an implementation detail that leaked into the docker spec, and it subverts the registry namespacing by being a global endpoint. My understanding is that it was an "admin API", which doesn't seem like it should be in scope of the distribution API, but a user-facing version of catalog would be great. 👍

There are a couple of relevant PRs in docker/distribution in the same vein that would be nice to include as well:

From the proposal, it's explicitly out of scope:

Managing the grouping of image repository names is considered part of distribution policy or content management, which are out of scope. For example, “which image repositories are under library/?” is out of scope for this project.

... but it would be very nice to have, eventually :)

In general, I'd like to get the spec into a minimal, workable state before we start adding any features.

@jonjohnsonjr
Contributor

With the caveat that I think all of this is out of scope for this project, I'd like to brain dump my thoughts on it so that we can maybe reach some consensus or have a plan for a proposal. Possibly, all of this could be a completely separate spec/service that many registry operators just happen to host side-by-side with their distribution-spec compliant registry. That said...

Most[citation needed] registries don't implement /v2/_catalog, so it definitely shouldn't be a requirement. While I'd personally like to see this removed from the spec, there are some existing clients that consume this endpoint (I know of only spinnaker, but there are likely others).

At the very least, we should mark this as OPTIONAL. (The rest of the spec would benefit from more formal Requirements Level language, too.)

Regardless of what we do here, it would be nice to have some method of indexing a registry that fits the spec's namespacing model. Being able to index the registry enables some nice projects, e.g. flagstate, grafeas.

I haven't put together a formal proposal for anything yet, but some prior art to get the ball rolling:

Listing Repositories

This + /tags/list/ would enable clients to index at least the tagged images in a registry.

  • /v2/_catalog kind of satisfies this (as much as I hate it).
  • Harbor has a /repositories API to do this.
  • GCR has an undocumented, hierarchical view of "child" repositories hacked onto its /tags/list/ endpoint.
  • TODO: Your favorite registry's homegrown solution here.

Listing Images

There's currently a /tags/list endpoint for listing tags, but no way to list just manifests; thus no way to discover untagged images.

This + Listing Repositories + /tags/list would enable a client to index the entirety of a registry.

  • docker/distribution proposal to add an endpoint that returns manifest descriptors.
  • docker/distribution proposal to add search functionality to the registry.
  • GCR has an undocumented list of "manifest" entries hacked onto its /tags/list/ endpoint.
  • TODO: Your favorite registry's homegrown solution here.

Pubsub

Given a point-in-time view of a registry, it's much more efficient to subscribe to a firehose of events than to constantly poll for changes. Many registries provide this feature. Unfortunately, none of these message formats seems compatible with each other. In an ideal world, we could standardize on some common format for registry events.

  • docker/distribution has webhooks.
  • GCR has pubsub.
  • ACR has webhooks.
  • TODO: Your favorite registry's homegrown solution here.

/cc @vbatts @dmcgowan

@dmcgowan
Member

I am +1 to not including the _catalog API at all. If we are going to have an OPTIONAL API, maybe we can have something that is more manageable, such as _list at any level, including /_list (lists first-level namespaces if supported), /<repo>/tags/_list, /<repo>/manifests/_list, and /<firstnamespacepart>/_list (lists repositories if supported). This is similar to the proposal you linked, and we have more flexibility if we are not tied to _catalog.

@jzelinskie
Member

My collection of thoughts:

  • Quay implements the catalog endpoint, but does have some undocumented behavior. For example, logged in users do not see public repositories, only their own repositories.
  • Quay has an API for listing tags and listing repositories.
  • In terms of pubsub, Quay has a notification system that can send webhooks (and many other formats) on upload (and many other events).
  • The only popular registry that I can think of that doesn't implement the catalog endpoint is the OpenShift registry, which is being superseded by Quay.
  • I think calling it OPTIONAL is okay.
  • I'd rather see the existing v2-2 API further reviewed and distilled before adding any new APIs.

@samuelkarp
Member

Here's what we do for Amazon ECR:

bsatlas added a commit to bsatlas/distribution-spec that referenced this issue Jan 4, 2019
This commit redefines the `_catalog` endpoint as an optional operation.

Background on the issue:
opencontainers#22
https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/rJ72OtZuhbc
opencontainers/tob#35
opencontainers/tob#46
opencontainers/tob#50

Signed-off-by: Atlas Kerr <atlaskerr@gmail.com>

@bsatlas
Contributor

bsatlas commented Jan 16, 2019

If we are okay with keeping catalog as an optional endpoint, I think this issue can be closed.

@stevvooe
Contributor

I vote for dropping it.

@bsatlas
Contributor

bsatlas commented Jan 17, 2019

I just submitted a PR to remove the catalog completely. Hopefully that helps move the decision along a bit.

#45

@bsatlas
Contributor

bsatlas commented Jan 22, 2019

I closed my PR since no one voted for dropping it. This issue can be closed.

@jzelinskie
Member

I'm not sure why you say that no one voted for dropping it? It looks like a lot of people in this thread agree it should be dropped.

@mikebrow
Member

I'm fine with dropping the catalog api, so long as there is agreement on replacing it with a more useful list api, such as that which was discussed above.

@bsatlas
Contributor

bsatlas commented Jan 25, 2019

@jzelinskie Sorry, I meant no one commented on, LGTM'd, or rejected my PR, so I closed it. I figured I'd leave it up to you guys to make the change when y'all were finished discussing.

@bsatlas
Contributor

bsatlas commented Jan 25, 2019

@jzelinskie Sorry, that sounds kind of rude. I just mean that I'm slowing down my contributions to this project and OCI in general because I feel like I'm being kinda annoying, making PRs that no one told me to make and asking questions y'all already discussed years ago. I don't want to be "that guy" lol. I'll reopen the PR if there is still interest in fully dropping catalog though.

@mikebrow
Member

Folks get busy :) Thanks for the commits!

@vbatts
Member Author

vbatts commented Jan 26, 2019

@atlaskerr oh no Atlas! I think there is a disconnect. While there is history, your participation is good and valid. Sometimes old assumptions need to be challenged. I'm very glad for your PRs and commentary.

@bsatlas
Contributor

bsatlas commented Jan 27, 2019

Thanks guys. I guess I'm overreacting. The milestone for rc-1 is Feb 1 and there is so much housekeeping I wanted to get done before then and my anxiety is through the roof haha. I'll keep motivated!

@jonjohnsonjr
Contributor

jonjohnsonjr commented Mar 7, 2019

I'll throw out two suggestions and people can pick apart why they hate it or love it. These are small, additive changes that shouldn't be too hard to get registries to adopt, and they fit the existing API model pretty well. If this is interesting to anyone, we could iterate on it in a more collaborative medium.

1. /v2/.../repositories/list

This would mirror /v2/.../tags/list

For listing repositories under library, a client might send this request:

GET https://registry-1.docker.io/v2/library/repositories/list?n=100

Receiving this response:

200 OK
Content-Type: application/json
Link: <https://registry-1.docker.io/v2/library/repositories/list?n=100&last=postfixadmin>; rel="next"
{
  "name": "library",
  "repositories": [
    "adminer",
    "aerospike",
    "alpine",
    "alt",
    "amazoncorretto",
    "amazonlinux",
    "arangodb",
    "backdrop",
    "bash",
    "bonita",
    "buildpack-deps",
    "busybox",
    "cassandra",
    "centos",
    "chronograf",
    "cirros",
    "clearlinux",
    "clefos",
    "clojure",
    "composer",
    "consul",
    "convertigo",
    "couchbase",
    "couchdb",
    "crate",
    "crux",
    "debian",
    "docker",
    "drupal",
    "eclipse-mosquitto",
    "eggdrop",
    "elasticsearch",
    "elixir",
    "erlang",
    "euleros",
    "express-gateway",
    "fedora",
    "flink",
    "fsharp",
    "gazebo",
    "gcc",
    "geonetwork",
    "ghost",
    "golang",
    "gradle",
    "groovy",
    "haproxy",
    "haskell",
    "haxe",
    "hello-seattle",
    "hello-world",
    "hola-mundo",
    "httpd",
    "hylang",
    "ibmjava",
    "influxdb",
    "irssi",
    "jetty",
    "joomla",
    "jruby",
    "julia",
    "kaazing-gateway",
    "kapacitor",
    "kibana",
    "known",
    "kong",
    "lightstreamer",
    "logstash",
    "mageia",
    "mariadb",
    "matomo",
    "maven",
    "mediawiki",
    "memcached",
    "mongo",
    "mongo-express",
    "mono",
    "mysql",
    "nats",
    "nats-streaming",
    "neo4j",
    "neurodebian",
    "nextcloud",
    "nginx",
    "node",
    "notary",
    "nuxeo",
    "odoo",
    "openjdk",
    "open-liberty",
    "opensuse",
    "oraclelinux",
    "orientdb",
    "percona",
    "perl",
    "photon",
    "php",
    "php-zendserver",
    "plone",
    "postfixadmin"
  ]
}

Based on that response, the client would follow up with:

GET https://registry-1.docker.io/v2/library/repositories/list?n=100&last=postfixadmin
200 OK
Content-Type: application/json
{
  "name": "library",
  "repositories": [
    "postgres",
    "pypy",
    "python",
    "rabbitmq",
    "rakudo-star",
    "rapidoid",
    "r-base",
    "redis",
    "redmine",
    "registry",
    "rethinkdb",
    "rocket.chat",
    "ros",
    "ruby",
    "rust",
    "sentry",
    "silverpeas",
    "sl",
    "solr",
    "sonarqube",
    "sourcemage",
    "spiped",
    "storm",
    "swarm",
    "swift",
    "swipl",
    "teamspeak",
    "telegraf",
    "thrift",
    "tomcat",
    "tomee",
    "traefik",
    "ubuntu",
    "vault",
    "websphere-liberty",
    "wordpress",
    "xwiki",
    "yourls",
    "znc",
    "zookeeper"
  ]
}
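A client walking this proposed endpoint only needs to keep following the rel="next" link until a response arrives without one. The following is a minimal Python sketch, assuming the hypothetical /v2/<name>/repositories/list endpoint from this proposal and a Link header that wraps its target in angle brackets, as /tags/list pagination does. `fetch` stands in for a real HTTP client, and the canned PAGES mimic the two responses above with n=2 for brevity (the second link is relative, to show it being resolved).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin

# A page is (JSON body, response headers); `fetch` stands in for an HTTP GET.
Page = Tuple[dict, Dict[str, str]]

# Simplified: finds the first <...>; rel="next" entry, not a full RFC 8288 parser.
_NEXT_RE = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def parse_next_link(headers: Dict[str, str]) -> Optional[str]:
    """Return the rel="next" target from a Link header, or None on the last page."""
    match = _NEXT_RE.search(headers.get("Link", ""))
    return match.group(1) if match else None


def list_all_repositories(start_url: str, fetch: Callable[[str], Page]) -> List[str]:
    """Follow rel="next" links, collecting "repositories" from every page."""
    repos: List[str] = []
    url: Optional[str] = start_url
    while url is not None:
        body, headers = fetch(url)
        repos.extend(body.get("repositories", []))
        next_link = parse_next_link(headers)
        # Resolve relative links (e.g. "</v2/...>") against the current URL.
        url = urljoin(url, next_link) if next_link else None
    return repos


# Canned pages standing in for the hypothetical endpoint (n=2 to keep it short).
BASE = "https://registry-1.docker.io/v2/library/repositories/list?n=2"
PAGES: Dict[str, Page] = {
    BASE: ({"name": "library", "repositories": ["adminer", "alpine"]},
           {"Link": '</v2/library/repositories/list?n=2&last=alpine>; rel="next"'}),
    BASE + "&last=alpine": ({"name": "library", "repositories": ["ubuntu"]}, {}),
}

print(list_all_repositories(BASE, PAGES.__getitem__))  # ['adminer', 'alpine', 'ubuntu']
```

Because the cursor lives entirely in the link, the client never has to know how the registry orders or paginates internally.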

I believe this is a good replacement for /v2/_catalog, since it works within the repo model. I think a top-level /v2/repositories/list request would make sense for certain registries but not for others. That would return e.g. only public repositories for dockerhub. GCR would probably reject it.

GET https://registry-1.docker.io/v2/repositories/list?n=100
200 OK
Content-Type: application/json
{
  "name": "",
  "repositories": [
    "library"
  ]
}

You can imagine all public repos being returned instead of just the official images, but that seems like something we'd want the registry operators to control since it would depend on the auth and namespace model.

2. returning a list of descriptors somewhere

I commented here that it would be cool if we extended /v2/.../tags/list to include a list of manifest descriptors contained by that repository.

That might be a bad idea for various reasons I hadn't considered, but we could also add a new endpoint, e.g. /v2/.../descriptors/list or something. Tags could be represented by annotations so that the return type would be a valid image index. One cool thing about this is that you'd be able to just "pull" the entire repository without much work. Another benefit is that we can reuse data structures for consuming this endpoint, but we don't have to force registries to recompute the hash on this faux image index, so pushing remains a cheap operation. (Recomputing the digest of the whole repo would be expensive for large repos, especially if we wanted to represent repos recursively with this method...)

For example:

GET https://registry-1.docker.io/v2/library/ubuntu/descriptors/list
{
  "schemaVersion": 2,
  "manifests": [{
    "digest": "sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210",
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "size": 2035,
    "annotations": {
      "org.opencontainers.image.ref.name": "latest"
    }
  }]
}

Some drawbacks: pagination with this faux image index representation is kind of clunky, though we could do it in a similar way to tags/repositories...
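To make the faux image index concrete, here is a minimal sketch of how a registry might assemble one, assuming tags are carried in the org.opencontainers.image.ref.name annotation as in the example above. The Descriptor class and build_faux_index function are hypothetical names; nothing here is an existing registry API.

```python
from dataclasses import dataclass
from typing import Dict, List

REF_NAME = "org.opencontainers.image.ref.name"


@dataclass(frozen=True)
class Descriptor:
    media_type: str
    digest: str
    size: int

    def to_json(self) -> dict:
        return {"mediaType": self.media_type, "digest": self.digest, "size": self.size}


def build_faux_index(manifests: List[Descriptor], tags: Dict[str, str]) -> dict:
    """Build an image-index-shaped listing of one repository.

    manifests: every manifest stored in the repository, tagged or not.
    tags: tag name -> digest.
    A digest with several tags appears once per tag; untagged manifests
    appear once, without annotations. No digest is computed over the result.
    """
    by_digest = {m.digest: m for m in manifests}
    entries = []
    tagged = set()
    for tag in sorted(tags):
        desc = by_digest[tags[tag]]
        entries.append({**desc.to_json(), "annotations": {REF_NAME: tag}})
        tagged.add(desc.digest)
    for m in manifests:
        if m.digest not in tagged:
            entries.append(m.to_json())
    return {"schemaVersion": 2, "manifests": entries}
```

Emitting one entry per tag keeps the result a plain image index, at the cost of repeating a descriptor when several tags share a digest. Since nothing hashes the result, the listing can change on every push without recomputing a repository-wide digest.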

@vsoch
Contributor

vsoch commented Mar 7, 2019

For the repositories endpoint, I agree that it would need to be registry specific. Some might prefer to return all public, others that the specific user making the query is allowed to see.

Just to clarify - the repositories endpoint handles being given a particular organization (e.g., library) and then return the repos under it, or no organization (https://registry-1.docker.io/v2/repositories/list?n=100) and then returns the organizations / collections instead? Sounds like a scraper's dream :) What about more weird namespaces like <organization>/<one>/<two> - I remember one registry running into issues with Singularity because they had this more non-traditional pattern.

A concern with this endpoint is that it gives good reason to stress the API - people like myself that like to study software are going to scrape the heck out of it. I would say for larger servers that want to serve the endpoint and not be scraped, they could do something along the lines of what GitHub / StackOverflow does, and provide some BigQuery Table of data.

Would there be a next/previous in the responses (i.e. are they paginated)? Also, would it make sense to return a random order, so that when people do massive scraping, we don't all hit poor postgres at the top at the same time?

I missed the descriptors discussion - what is a descriptor, an image manifest with annotations?

Stepping out of details for a second - what are the goals of this endpoint? From a high level, it lets people interested in studying containers (via their manifests) find them more programmatically. What else?

@jonjohnsonjr
Contributor

jonjohnsonjr commented Mar 7, 2019

Just to clarify - the repositories endpoint handles being given a particular organization (e.g., library) and then return the repos under it, or no organization (https://registry-1.docker.io/v2/repositories/list?n=100) and then returns the organizations / collections instead? Sounds like a scraper's dream :) What about more weird namespaces like <organization>/<one>/<two> - I remember one registry running into issues with Singularity because they had this more non-traditional pattern.

Yes, exactly. Since repositories can be nested, you would be able to walk the repositories down to the leaves. This is how GCR works today, except that GCR grafts both of these proposals onto the /tags/list/ response.
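Walking nested repositories down to the leaves is a plain depth-first traversal. A minimal sketch, assuming each listing call returns child names relative to the requested path (as in the examples above, where the top-level listing returns "library" and the library listing returns "ubuntu"). list_children and the TREE layout are hypothetical, and pagination is omitted to keep it short.

```python
from typing import Callable, Dict, Iterator, List


def walk_repositories(list_children: Callable[[str], List[str]], prefix: str = "") -> Iterator[str]:
    """Depth-first walk yielding every repository path below `prefix`.

    list_children(path) stands in for GET /v2/<path>/repositories/list
    (or /v2/repositories/list when path == "") and returns child names
    relative to `path`; a leaf returns an empty list.
    """
    for child in list_children(prefix):
        path = f"{prefix}/{child}" if prefix else child
        yield path
        yield from walk_repositories(list_children, path)


# Hypothetical registry layout, including a three-level namespace.
TREE: Dict[str, List[str]] = {
    "": ["library", "myorg"],
    "library": ["ubuntu"],
    "myorg": ["team"],
    "myorg/team": ["app"],
}

print(list(walk_repositories(lambda p: TREE.get(p, []))))
# ['library', 'library/ubuntu', 'myorg', 'myorg/team', 'myorg/team/app']
```

Starting from a non-empty prefix gives the targeted scraping described below: a client interested only in library never touches the rest of the tree.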

A concern with this endpoint is that it gives good reason to stress the API

In my experience, people are already stressing the API with /v2/_catalog, especially spinnaker :) This would allow more targeted scraping, e.g. if I only care about library/ubuntu, I can scrape just that instead of using /v2/_catalog and going from there.

Would there be a next / previous in the responses (i.e. are they paginated?)

Yes, see the Link header; this is similar to how /tags/list works now. One example in the spec includes a link in the response body, which is inconsistent with the rest of the spec, but I think the header is the canonical way to handle pagination here.

I missed the descriptors discussion

A descriptor is defined here. tl;dr, it's: digest, mediaType, size, urls, and annotations. These are used to describe content-addressable content so that clients can handle it as something other than an opaque blob. A manifest's layers field is a list of descriptors, and an image index's manifests field is a list of descriptors. Basically, all the JSON structures are compositions of various descriptors with extra fields for manifests and image indexes.

what are the goals of this endpoint

Exposing something like this solves two problems:

  1. Discovery of images that aren't tagged (e.g. old images) and their digests.
  2. Getting an index of a repository without making N + 1 requests (list tags + pull every tag).

There's not currently any way to ask the registry "tell me about everything in this repo", which would be solved by using both of these endpoints together.
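On the consuming side, one descriptor listing answers both goals in a single pass: it yields the tag map and the set of untagged digests. A minimal sketch, assuming the faux image index shape from the descriptors/list example with tags in the org.opencontainers.image.ref.name annotation. split_index is a hypothetical helper, and the second, untagged digest is made up.

```python
from typing import Dict, List, Tuple

REF_NAME = "org.opencontainers.image.ref.name"


def split_index(index: dict) -> Tuple[Dict[str, str], List[str]]:
    """Split a faux image index into (tag -> digest, untagged digests)."""
    tags: Dict[str, str] = {}
    seen: Dict[str, None] = {}  # insertion-ordered set of every digest listed
    for desc in index.get("manifests", []):
        seen[desc["digest"]] = None
        name = desc.get("annotations", {}).get(REF_NAME)
        if name is not None:
            tags[name] = desc["digest"]
    tagged = set(tags.values())
    untagged = [d for d in seen if d not in tagged]
    return tags, untagged


UBUNTU = {
    "schemaVersion": 2,
    "manifests": [
        {"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
         "digest": "sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210",
         "size": 2035, "annotations": {REF_NAME: "latest"}},
        # Hypothetical older manifest that no tag points at any more.
        {"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
         "digest": "sha256:" + "0" * 64, "size": 1000},
    ],
}

tags, untagged = split_index(UBUNTU)
print(tags, untagged)
```

Compared with listing tags and then fetching every tagged manifest, this is one request (plus pagination) instead of N + 1, and the untagged digests fall out for free.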

@vbatts
Member Author

vbatts commented Mar 8, 2019 via email

@jonjohnsonjr
Contributor

Do we define the token handshake anywhere in the spec or is that out of scope for the distribution spec?

Further, for a provided security token context, a way to list orgs you have access to?

Not sure what "orgs" would be, but if it's just the top-level repos, we could reuse this via: https://registry-1.docker.io/v2/repositories/list

For the actual scopes, I would imagine something like:

https://auth.docker.io/token?service=registry.docker.io&scope=repository:*:list

  • lists orgs you have read access to
  • * inspired by catalog

https://auth.docker.io/token?service=registry.docker.io&scope=repository:library:list

  • used to authenticate GET https://registry-1.docker.io/v2/library/repositories/list

https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:list

  • it's unclear if this is necessary; where are authentication boundaries? can clients reuse the token from the previous exchange?
  • this would be an empty set (there are no subrepos in dockerhub under library/ubuntu); would the token handshake fail?

This might be complicating things more but I'm trying to reuse existing patterns wherever we can.
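The scope strings above can be assembled with standard URL encoding. A minimal sketch, assuming the proposed list action, which does not exist among today's Docker token scope actions (those include pull and push); ':', '/', and '*' are left unescaped so the output matches the examples above. list_token_url is a hypothetical helper, and it deliberately leaves open whether a token for library also covers library/ubuntu.

```python
from urllib.parse import urlencode


def list_token_url(realm: str, service: str, repository: str) -> str:
    """Build a token request URL asking for the proposed (hypothetical) `list` action."""
    scope = f"repository:{repository}:list"
    # Keep ':', '/', and '*' readable instead of percent-encoding them.
    query = urlencode({"service": service, "scope": scope}, safe=":/*")
    return f"{realm}?{query}"


REALM = "https://auth.docker.io/token"
print(list_token_url(REALM, "registry.docker.io", "library"))
# https://auth.docker.io/token?service=registry.docker.io&scope=repository:library:list
```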

@sajayantony
Member

sajayantony commented Mar 9, 2019

Adding to this list -

Listing & Auth

  • ACR implements /v2/_catalog. We support removing it as long as there is a fair replacement.
  • /v2/<ns1>/<ns2>/repositories/list is great for scoping.
  • Token flows honor the Docker Auth Token spec.
  • Does distribution include the spec for the transport auth?

PubSub

With the Artifact extension we are considering adding the config descriptor type as well to the payload so that consumers can filter events by artifact type. A helm chart update would be a webhook filtered by the helm type.

@yuwaMSFT2
Copy link

yuwaMSFT2 commented Mar 10, 2019

Some comments:

  1. _catalog API is actually being used more than many assume. For ACR we provided this API to be compatible with dockerhub; from our data, the usage is significant (but maybe because there is no good alternatives). Meanwhile when we first introduced MCR for Microsoft public images, we blocked the _catalog API; but later we got multiple customer requests to implement it since many depend on it. It's not so easy to just retire this API.

  2. I saw one previous comment suggesting a /_list API at different levels. I feel this is good. It provides a unified way to offer list functionality: {repositoryname}/_list, {repositoryname}/manifests/_list, {repositoryname}/tags/_list. Essentially, it just adds a list API for each supported type of entity. Currently the registry has 3 levels of entities: repository/manifests/tags; in the future we may find more, so having a unified way to do it is important. Also, the underscore "_" more or less ensures it doesn't break any existing scenarios (/repositories/list may not work for some registries, since a repository name can be "repository/list" in certain scenarios).

  3. The spec only defines the contract. Whether it is performant or not is an implementation concern.

  4. Regarding auth token scope, it depends on if we want to separate the list capability from the pull capability. Current registry implementation mostly assume them to require the same capability, which may make sense (user needs to list repository and then do pull).
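The underscore argument in point 2 can be checked against the repository name grammar: every path component must start with a lowercase letter or digit, so "_list" can never be part of a repository name, while "repositories/list" can. A minimal routing sketch, assuming the OCI distribution spec's repository name pattern and the hypothetical _list routes from this thread; parse_list_route and SUFFIXES are illustrative names.

```python
import re
from typing import Optional, Tuple

# Repository name pattern from the OCI distribution spec: every path
# component starts with [a-z0-9], so "_list" can never be a component.
NAME_RE = re.compile(
    r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*(?:/[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*)*"
)

# Hypothetical _list routes from this thread; most specific suffix first.
SUFFIXES = (("/tags/_list", "tags"), ("/manifests/_list", "manifests"), ("/_list", "repositories"))


def parse_list_route(path: str) -> Optional[Tuple[str, str]]:
    """Map a /v2/... path to (entity kind, name), or None if it is not a list route."""
    if not path.startswith("/v2/"):
        return None
    rest = path[len("/v2"):]  # keeps the leading "/"
    if rest == "/_list":
        return ("repositories", "")  # first-level namespaces
    for suffix, kind in SUFFIXES:
        if rest.endswith(suffix):
            name = rest[1:-len(suffix)]
            if NAME_RE.fullmatch(name):
                return (kind, name)
    return None


print(parse_list_route("/v2/library/ubuntu/tags/_list"))  # ('tags', 'library/ubuntu')
print(NAME_RE.fullmatch("library/_list"))  # None
```

One caveat the sketch makes visible: "tags" and "manifests" are themselves legal name components, so /v2/foo/tags/_list could mean either the tags of foo or the repositories under foo/tags. Checking the more specific suffixes first picks the former; a spec would need to state that rule explicitly.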

Actually these questions were raised at the beginning but the general consensus was that these were not in the scope of distribution spec. But entity list/search is an essential part of a practical registry provider. So it sounds good to have it either in distribution spec, or some additional spec (management spec just a wild guess?)

As a data point, ACR implements both the _catalog API (to be compatible with OSS docker registry), and the private set of _list API for each type of entity (repositories/manifests/tags).

@jonjohnsonjr
Contributor

For ACR we provided this API to be compatible with dockerhub

Does dockerhub support catalog? Last I checked, it didn't.

the usage is significant (but maybe because there is no good alternatives)

This has been my observation as well.

@bsatlas
Contributor

bsatlas commented Mar 11, 2019

@yuwaMSFT2 I think that's the problem. I like what you said earlier about an additional management spec:

Actually these questions were raised at the beginning but the general consensus was that these were not in the scope of distribution spec. But entity list/search is an essential part of a practical registry provider. So it sounds good to have it either in distribution spec, or some additional spec (management spec just a wild guess?)

@SteveLasker
Contributor

SteveLasker commented Mar 12, 2019

To attempt to summarize the discussion above:
There are two APIs I believe we're discussing.

  • repo listing
  • tag listing

With various requirements, such as only returning the list of repos and tags the user has access to.

Use Cases

  • Tools like vulnerability scanners need a listing of the repos they need to scan. While they would like to keep current with event-based notifications, they also need a repo and tag listing API to do the initial scan and periodic re-scans. The query is also time based: they need to know the repos added since a given date:time.
  • Using a repo listing, they can list the existing repos within a registry, enabling the user to choose which ones to configure for scanning.
  • Some deployment tools look for the "newest" tag, and don't want to use a :latest tag metaphor. Being able to query the list of tags, in a given order, with a top capability allows them to get the newest tag for a given repo.
  • As new artifact types become common, the tag listing will need to identify the type.

Auth requirements

Users should only be able to see the repos and tags they have permission to. These include anonymous and RBAC scenarios. Since each cloud vendor has their own auth flows, I don't think it's reasonable to assume we can achieve a common auth flow. (As much as I wish we could, I'm just being a bit more pragmatic)
I'd suggest the spec should simply specify the listing MUST adhere to listing only repos and tags the user has access to. Simply listing a repo the user doesn't have access to can disclose information.
As @yuwaMSFT2 mentioned above, each cloud vendor also implements unique roles. Within ACR, we support push, pull and separate listing roles to avoid leaking information.

Repo listing

  • To support flexibility in how registries have implemented multi-tenancy, the repo listing API must support starting at a sub "namespace".
  • Having meta-data on the repo listing is interesting, although I don't know if it should be on the repo listing, or repo info API.
  • Ordering is important, as tools may want the newest repos so they can get the list of new tags to scan, or the oldest repos to support some sort of purging capability.
  • Ordering can be alphabetical, or date added. It likely needs to support "modified" to support listing the most recent repos that have new tags. This would support the scanning scenario where a scanner periodically does a verification to find the newest tags.
  • Ordering would support asc and desc
  • Paging to support large repo lists. It would be fun to see if we could agree on a default page size :)
  • Possibly filtering repos, like those that are empty

Tag listing

  • Similar to repos, we'd need ordering of ascending and descending, based on alpha or dates
  • Paging and top as customers that automate builds can have thousands of tags
  • Top n, to get the "newest" tag

Performance

One should assume that any supported API will be abused. Whether the implementer decides to cache and support massive requests, or throttle seems like an implementer/vendor specific decision.

Details API

I like where @atlaskerr is going with the meta-data details. I struggle on whether this should be in the initial repo listing, or subsequent individual requests of each repo/tag. Although, queries that support repo/tag listing with basic filters is important to be useful.

Query semantics vs. basic filters

Having worked through OData and other query tools, I worry about how much burden we put on the user for the most basic scenarios. I'd hope we could construct a REST based API that had progressive disclosure of the complexity. A basic repo listing has default behavior of the top 100 repos, alphabetically listed.
/v2/<ns1>/<ns2>/repositories/list
for more complex
/v2/<ns1>/<ns2>/repositories/list?orderBy=createdDate&order=desc
The spec could say certain parameters are required, while registries could support additional parameters providing their unique values
/v2/<ns1>/<ns2>/repositories/list?orderBy=createdDate&order=desc&registrySpecific=foo
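This kind of progressive disclosure can be sketched as a handler that honors a few spec-defined parameters and ignores anything registry-specific it does not understand. A minimal sketch over an in-memory list, assuming orderBy (name or createdDate), order (asc or desc), and the existing n/last pagination parameters, with the default of 100 repos alphabetically as suggested above. list_repositories, the repo records, and registry.example are hypothetical; using last as a name cursor only works because ties are broken by name, which keeps the ordering stable.

```python
from datetime import datetime
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

DEFAULT_PAGE_SIZE = 100  # "top 100 repos, alphabetically listed" by default
SORT_KEYS = {"name", "createdDate"}  # parameters the spec would require


def list_repositories(url: str, repos: List[Dict]) -> List[str]:
    """Apply orderBy/order/n/last to an in-memory repo list; other params are ignored."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    order_by = params.get("orderBy", "name")
    if order_by not in SORT_KEYS:
        raise ValueError(f"unsupported orderBy: {order_by}")
    descending = params.get("order", "asc") == "desc"
    # Tie-break on name so the ordering is stable and `last` is a usable cursor.
    rows = sorted(repos, key=lambda r: (r[order_by], r["name"]), reverse=descending)
    names = [r["name"] for r in rows]
    last = params.get("last")
    if last in names:
        names = names[names.index(last) + 1:]
    return names[: int(params.get("n", DEFAULT_PAGE_SIZE))]


REPOS = [
    {"name": "alpine", "createdDate": datetime(2019, 1, 1)},
    {"name": "ubuntu", "createdDate": datetime(2019, 3, 1)},
    {"name": "busybox", "createdDate": datetime(2019, 2, 1)},
]
BASE = "https://registry.example/v2/ns1/ns2/repositories/list"
print(list_repositories(BASE + "?orderBy=createdDate&order=desc&registrySpecific=foo", REPOS))
# ['ubuntu', 'busybox', 'alpine']
```

The same shape would cover the tag listing bullets: "top n newest tags" is just orderBy=createdDate&order=desc&n=<n>.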

Eventing and Listing

Having an eventing API is important to complete the scenario. The vulnerability scanners currently implement time-based scheduling to attempt to keep up to date. However, each has asked for an eventing API to keep current. While this is likely another portion of the spec, keeping the two in sync to provide a common experience would be helpful. It would also alleviate undue stress on one API vs. another, since it allows tooling to use each for what it does best.

CLI

Another logical extension is a common CLI for registries. While we're discussing common REST APIs across registries, one of the big benefits is using a common CLI across registries. We all benefit from docker pull working against each one. Having a foo repo list API would be an interesting project. Starting with a common REST API could incubate some interesting innovations.

Next Steps

Do we have enough captured here to start a draft multi-page spec that we could put in a sub folder of distribution?

@vsoch
Contributor

vsoch commented Mar 12, 2019

Is it just me, or does this smell a little bit like GitHub or GitLab API endpoints? For example, listing repositories:

# Github
GET /users/:username/repos

# Proposal
GET /v2/<ns1>/<ns2>/repositories/list

The main difference is just the use of "repos" vs "repositories", and the "list" is implied in the first. The <ns1>/<ns2> maps to users/:username.

It's similar to how (some / all of?) Docker's APIs were integrated into the image spec, no? Or more simply, wouldn't it be really powerful if we developed a spec for these additional endpoints so that already existing version control APIs would already be compliant? In the context of Github, this would mean that GitHub pages could serve a static registry and deliver the same interactions as with a container registry. If we add content types, then with a "doc" or "license" sort of type, this would link cleanly to the files in the repo.

@yuwaMSFT2

The first (/repos) is a more common RESTful style API.
While the second one (/repositories/list) may not work in certain cases depending on how other APIs are designed (what if there is an existing repository called list?)

I would vote for the first:)

@jonjohnsonjr
Contributor

@SteveLasker that seems like an astonishing amount of scope creep for a summary 😉

My main goal here is to drop /v2/_catalog and propose a palatable replacement for it that enables other systems to index the registry and provide search capabilities. As it stands, the registry is somewhat opaque in that you cannot list images that are not tagged. I'm not sure if that was intentional in its design.

If we can get:

  1. a registry that is fully indexable via the registry API, and
  2. a standard event payload (something something cloud events?),

then we could build most of what you're proposing around that, generically. I'm hesitant to add a ton of requirements to the registry spec because that will basically guarantee that most registries won't ever fully implement it.

Some deployment tools look for the "newest" tag, and don't want to use a :latest tag metaphor. Being able to query the list of tags, in a given order, with a top capability allows them to get the newest tag for a given repo.

That's horrifying.

Paging and top as customers that automate builds can have thousands of tags

Pagination for tag listing is already in the spec. Something like top seems useful, but I don't know if we want to include it in the spec.

Having an eventing API is important to complete the scenario

Agreed. Ideally, well-behaved clients would do a full-resync once and listen for registry events to keep their index up to date. This is similar to how kubernetes informers behave.
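The informer-style pattern, one full resync followed by applying events, can be sketched with a plain dictionary index. A minimal illustration, assuming a hypothetical event payload with action, repository, tag, and digest fields; no registry emits exactly this format today, which is the standardization gap discussed above. The digests are shortened placeholders.

```python
from typing import Dict, Iterable, Tuple

# (repository, tag) -> digest
Index = Dict[Tuple[str, str], str]


def resync(listing: Iterable[Tuple[str, str, str]]) -> Index:
    """Rebuild the whole index from a full listing of (repository, tag, digest)."""
    return {(repo, tag): digest for repo, tag, digest in listing}


def apply_event(index: Index, event: Dict[str, str]) -> None:
    """Apply one (hypothetical) registry event to the index in place."""
    key = (event["repository"], event["tag"])
    if event["action"] == "push":
        index[key] = event["digest"]
    elif event["action"] == "delete":
        index.pop(key, None)  # tolerate deletes for entries we never saw
    else:
        raise ValueError(f"unknown action: {event['action']}")


# Placeholder digests, shortened for readability.
index = resync([("library/ubuntu", "latest", "sha256:aaa"), ("library/alpine", "3.9", "sha256:bbb")])
apply_event(index, {"action": "push", "repository": "library/ubuntu", "tag": "latest", "digest": "sha256:ccc"})
apply_event(index, {"action": "delete", "repository": "library/alpine", "tag": "3.9"})
print(index)  # {('library/ubuntu', 'latest'): 'sha256:ccc'}
```

A client that loses its event stream can recover the same way it started, by calling resync again from a fresh full listing.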

CLI

There are a few registry CLIs already. I don't think we need to create an OCI-blessed CLI, but it should be easy to write a CLI from reading the spec.

Is it just me, or does this smell a little bit like GitHub or GitLab API endpoints?

I based it on the /tags/list endpoint already in the registry. I'm not tied down to any particular format, but I'd like to be at least self-consistent in the API.

this would mean that GitHub pages could serve a static registry and deliver the same interactions as with a container registry

I think this is going to be hard to achieve and maintain, since GitHub is free to change their API arbitrarily... so it might not be a great idea to tie the spec to whatever GitHub's API happens to be right now. (I love that you got the static registry stuff working, BTW.) What does the equivalent GitLab API look like? The same?

@ad-m

ad-m commented Mar 13, 2019

GitHub pages could serve a static registry and deliver the same interactions as with a container registry

I think limiting yourself to GitHub Pages is not the right solution. First of all, it is a proprietary solution. Second, its usage limits (maximum size 1 GB, monthly transfer of 100 GB, 10 updates per hour) can limit its practical potential.

We can keep statically generated registries in mind, and I like this idea. In the case of operating system repositories, it is not uncommon for them to be statically generated (APT, for example; see https://github.com/krobertson/deb-s3 for an apt repository on S3 and https://tylerpower.io/post/hosting-yum-repo-on-s3/ for a yum repository on S3), and updates require refreshing the repository indexes. After all, a repository is read far more often than it is written to, so the read path should be optimized.

Thanks to an appropriate architecture in this area, operating system repositories have many mirrors (https://www.debian.org/mirror/list), whereas in the case of Docker, running an unofficial mirror of an unofficial registry is limited (https://docs.docker.com/registry/recipes/mirror/). I would also like to draw attention to the arguments that were given when one of the Linux kernel distribution protocols was abandoned.

@vsoch
Contributor

vsoch commented Mar 13, 2019

I don’t literally mean that it would be limited to GitHub Pages; the idea I’m trying to get across is that there are already APIs that list repositories and projects. Instead of coming up with an entirely new one, we can reuse features from those APIs, which have already been somewhat tested and are well known. An existing API (GitHub, for example, with probably billions of repos) would then conform to our new specification. Sure, they could change in the future, but the incentive to do so might change if they know that their resource is friendly to OCI. If people start building things using them too? Then I suppose we’d start to see representatives from those companies at the meetings :)

@SteveLasker
Contributor

SteveLasker commented Mar 13, 2019

an astonishing amount of scope creep for a summary 😉

Yeah, umm, I tend to work from a master plan approach, knowing where all the pieces could go, then scope back in incremental pieces. Starting smaller is goodness.
I mostly wanted to call out the auth issues and recognize that different registries implement namespaces differently, which is likely a vendor differentiator we won't get agreement on. If we can enhance a catalog-like API to support listing from a sub-namespace, we can likely find a good, consistent place.
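To make "listing from a sub-namespace" concrete, here is a minimal Python sketch of the filtering a registry might do server-side. The function name and the `can_read` authorization hook are hypothetical; the sketch only assumes that namespaces are `/`-separated path prefixes, which, as noted above, not every registry implements the same way.

```python
from typing import Callable, Iterable, List

def list_namespace(
    repositories: Iterable[str],
    namespace: str,
    can_read: Callable[[str], bool] = lambda repo: True,  # hypothetical auth hook
) -> List[str]:
    """Return readable repositories under `namespace`, matching whole path components."""
    ns = namespace.strip("/")
    # Require a trailing "/" so "team-a" does not match "team-ab/...".
    # An empty namespace means "list everything the caller can read".
    prefix = ns + "/" if ns else ""
    return sorted(
        repo for repo in repositories
        if repo.startswith(prefix) and can_read(repo)
    )
```

Filtering by `can_read` inside the listing keeps a caller from learning about repositories they cannot pull, which is one of the auth concerns with a global catalog endpoint.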

newest tag, "horrifying"

I had thought the same at first. But we've had developers who want to deploy the latest/newest build to a dev environment. While they could pull the tag based on a webhook, we got feedback that they want an ordered tag listing. They also wanted ordered tag listings in other tools, like DevOps and App Services, where the user can choose a tag from a combo box. They wanted the same experience across Docker Hub and ACR. It would be great if a customer could choose from other registries they might happen to host with Azure as well.
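An ordered tag listing needs some per-tag metadata that the current `/tags/list` response does not carry. As a minimal Python sketch, assuming a hypothetical `pushed` timestamp stored for each tag, "newest first" and "newest tag" could look like this:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass(frozen=True)
class TagInfo:
    name: str
    digest: str
    pushed: datetime  # assumed per-tag metadata; not part of /tags/list today

def newest_first(tags: List[TagInfo]) -> List[TagInfo]:
    """Sort by push time, newest first; ties break by tag name so the order is stable."""
    return sorted(tags, key=lambda t: (-t.pushed.timestamp(), t.name))

def newest(tags: List[TagInfo]) -> Optional[TagInfo]:
    """The most recently pushed tag, or None for a repository with no tags."""
    ordered = newest_first(tags)
    return ordered[0] if ordered else None
```

Ordering by push time rather than by parsing tag names avoids assuming any versioning scheme, which matters because tags like `dev` or `latest` have no natural version order.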

top

Could be later, as long as paging has a reasonable, small page size
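For reference, the existing tag listing already paginates with an `n` (page size) query parameter, a `last` cursor, and a `Link` header with `rel="next"`. A catalog-like listing could reuse the same scheme. This is a minimal Python sketch of the server side, assuming lexically sorted names and a hypothetical `MAX_PAGE_SIZE` cap:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple
from urllib.parse import urlencode

MAX_PAGE_SIZE = 100  # hypothetical server-side cap on n

def paginate(names: List[str], n: Optional[int] = None,
             last: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of sorted names after `last`, plus the cursor for the next page."""
    ordered = sorted(names)
    size = MAX_PAGE_SIZE if n is None else max(0, min(n, MAX_PAGE_SIZE))
    # Resume strictly after the cursor; it need not still exist in the list.
    start = bisect_right(ordered, last) if last is not None else 0
    page = ordered[start:start + size]
    more = start + size < len(ordered)
    return page, (page[-1] if more and page else None)

def link_header(path: str, n: int, cursor: Optional[str]) -> Optional[str]:
    """Build a Link header pointing at the next page, or None on the last page."""
    if cursor is None:
        return None
    return f'<{path}?{urlencode({"n": n, "last": cursor})}>; rel="next"'
```

Using the last returned name as the cursor keeps pages stable even if entries are added or removed between requests.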

CLI
Agreed on it being an interesting incubation, not part of the spec.

tag listing API to support artifact type

I forgot to mention that the tag listing should support listing the artifactType, enabling tools to understand what the tag represents.

untagged/tag listing

Jon brought up an interesting point about being able to list untagged manifests. There's a good discussion to have here, including possibly tracking the history of manifests a tag has represented. When a stable tag of a base image is updated to reflect OS & FX patching, it's also useful to know the previous manifest, in case a user must roll back.
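Both ideas, untagged manifests and a tag's history for rollback, fall out of keeping a per-tag record of digests. This is a minimal, hypothetical Python sketch of such a record for one repository; nothing like it exists in the current spec, and a real registry would also need retention and garbage-collection rules.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

class TagHistory:
    """Hypothetical per-repository record of which manifest each tag pointed to over time."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = defaultdict(list)  # tag -> digests, oldest first
        self._manifests: Set[str] = set()

    def put_manifest(self, digest: str) -> None:
        """Record a manifest pushed by digest, without a tag."""
        self._manifests.add(digest)

    def push(self, tag: str, digest: str) -> None:
        """Point `tag` at `digest`, appending to its history only when it actually moves."""
        self._manifests.add(digest)
        hist = self._history[tag]
        if not hist or hist[-1] != digest:
            hist.append(digest)

    def current(self, tag: str) -> Optional[str]:
        hist = self._history.get(tag)
        return hist[-1] if hist else None

    def previous(self, tag: str) -> Optional[str]:
        """Manifest the tag pointed to before its latest update (a rollback target)."""
        hist = self._history.get(tag)
        return hist[-2] if hist and len(hist) > 1 else None

    def untagged(self) -> List[str]:
        """Manifests that no tag currently points to."""
        tagged = {h[-1] for h in self._history.values() if h}
        return sorted(self._manifests - tagged)
```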

@bsatlas
Contributor

bsatlas commented Mar 13, 2019

Would the new catalog/listing operation be a required or optional endpoint?

@mikebrow
Member

Collecting use cases, forming workgroup here: https://hackmd.io/s/BJPAUxDvV#OCI-Catalog-Listing-API---Workgroup

@vbatts
Member Author

vbatts commented Dec 16, 2019

I am now looking forward to @josephschorr's proposal on a pub/sub event model.

@josephschorr

Hoping to publish it for community review within a few weeks, as the holidays add some delays :)

@vbatts vbatts changed the title from "tease out the catalog API?" to "[RFP] replace catalog API functionality" on Dec 20, 2019
@SteveLasker
Contributor

Do we want to close this, and let Joey continue to make progress on the Pub/Sub model? I love what Joey is doing for the specific content update scenario. But, that's not the same for quick-hit scenarios where someone just needs to see a one-time listing of repos or tags.
I'd just suggest we close this one and have separate PRs for new proposals. They can reference this issue for the history of the conversation, if valuable.

@vbatts
Member Author

vbatts commented Apr 1, 2020

@josephschorr are you waiting on something for the pub/sub events PR?

@mikebrow
Member

mikebrow commented Apr 2, 2020

I would like to see pub/sub done in such a way as to cover the one-time listing of repos/tags, with published updates to follow based on the subscription. Let's wait to close this until we have a resolution to the issue, I think?

@josephschorr

@vbatts I was hoping for some more feedback on my document before I opened it

@vbatts
Member Author

vbatts commented Apr 2, 2020

@josephschorr ok. Let me get #111 shaped up then, and you can ready your PR

@vbatts
Member Author

vbatts commented Jun 23, 2021

_catalog was removed from the final v1.0.0 release.
Thanks for all the discussion. 🎉
