RFC: Remove Stacks #167 (Closed; showing changes from 2 of 5 commits)

158 changes: 158 additions & 0 deletions text/0000-remove-stacks.md
@@ -0,0 +1,158 @@
# Meta
[meta]: #meta
- Name: Remove Stacks
- Start Date: 2021-06-16
- Author(s): sclevine
- RFC Pull Request: (leave blank)
- CNB Pull Request: (leave blank)
- CNB Issue: (leave blank)
- Supersedes: [RFC0069](https://github.com/buildpacks/rfcs/blob/main/text/0069-stack-buildpacks.md), many others

# Summary
[summary]: #summary

This RFC proposes that we remove the "stack" and "mixin" concepts from the project and replace them with existing constructs in the container image ecosystem such as base images, Dockerfiles, and OS packages. This RFC also introduces additional functionality for customizing base images, as an alternative to stackpacks.

# Motivation
[motivation]: #motivation

The "stack" and "mixin" concepts add unnecessary complexity to the project and make it difficult for new users and contributors to understand how buildpacks work. Compatibility guarantees that are strongly enforced by the stack contract could be replaced with metadata validations and warnings.

Removing these concepts and relying on Dockerfiles for base image generation and manipulation applies buildpacks to the problem that they solve best: managing application runtimes and dependencies.

# What it is
[what-it-is]: #what-it-is

Summary of changes:
- Replace mixins with a CycloneDX-formatted list of packages.
- Replace stackpacks with multi-purpose build-time and run-time Dockerfiles.
- Replace stack metadata (including stack IDs) with canonical OS metadata.
- Allow buildpacks to select a minimal runtime base image during detection.

**Comment:** Is it possible to add a link to the project you refer to here? I suppose it corresponds to https://cyclonedx.org/?

**Member Author:** That's correct, will add a link in the smaller RFCs that will replace this one.

# How it Works
[how-it-works]: #how-it-works

## Base Image Metadata

Instead of a stack ID, runtime and build-time base images are labeled with the following canonicalized metadata:
- OS (e.g., "linux", `$GOOS`)
- Architecture (e.g., "x86_64", `$GOARCH`)
- Distribution (optional) (e.g., "ubuntu", `$ID`)
- Version (optional) (e.g., "18.04", `$VERSION_ID`)

**Member:** Are there any cases where `$ID_LIKE` is useful?

**Member Author:** I only find the combination of `$ID` and `$VERSION_ID` to be especially useful (for establishing ABI). I suppose `$ID` could be useful for knowledge of common tooling (e.g., apt), and sometimes that could work for `$ID_LIKE` as well? Maybe something we could add later?

For Linux-based images, each field should be canonicalized against values specified in `/etc/os-release` (`$ID` and `$VERSION_ID`).
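
For illustration, here is how an Ubuntu 18.04 base image might surface this metadata as labels. This is only a sketch: the `distro` and `version` label names appear in the examples later in this RFC, while the `os` and `arch` label names are assumed for symmetry and are not specified anywhere in this proposal.

```
# /etc/os-release (excerpt) supplies the canonical values:
#   ID=ubuntu
#   VERSION_ID="18.04"
LABEL io.buildpacks.image.os=linux
LABEL io.buildpacks.image.arch=x86_64     # assumed label name
LABEL io.buildpacks.image.distro=ubuntu
LABEL io.buildpacks.image.version=18.04
```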

The `stacks` list in `buildpack.toml` is replaced by a `platforms` list, where each entry corresponds to a different buildpack image that is exported into a [manifest index](https://github.com/opencontainers/image-spec/blob/master/image-index.md). Each entry may contain multiple valid values for Distribution and/or Version, but only a single OS and Architecture.
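
The RFC does not pin down a schema for `platforms`; a minimal sketch of what an entry might look like, assuming field names that mirror the metadata above (all table and key names here are hypothetical):

```
# buildpack.toml (hypothetical schema)
[[platforms]]
os = "linux"
arch = "x86_64"

  # multiple distributions/versions allowed per entry
  [[platforms.distros]]
  name = "ubuntu"
  versions = ["18.04", "20.04"]
```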

`buildpack.toml` no longer contains OS package information. Buildpacks may express runtime package dependencies during detection (see "Runtime Base Image Selection" below).

App image builds fail if the build image and selected run image have mismatched metadata. We may consider introducing a flag to skip this validation.
**Member:** I am assuming "mismatched metadata" only applies to things like Architecture and Distribution, given we typically expect mismatched packages?

**Member:** I am also curious about what happens to stacks like io.paketo.stacks.tiny here. The build image is an ubuntu distribution but the run image isn't (although it is derived from an ubuntu distribution, I believe).

**Member Author:** Happy to strike this requirement for `$ID` and `$VERSION_ID`.

**Member Author:** Updated the wording in #172 so that tiny can leave off the distro/version to be compatible with all distros/versions. Also mentioned that flags/labels could be used to skip validation in the future.


When an app image is rebased, `pack rebase` will fail if the new run image and previous run image have mismatched metadata. This check may be skipped for Distribution and Version by passing a new `--force` flag to `pack rebase`.
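
For example, a forced rebase might look like the following sketch. The `--force` flag is the new flag proposed here and does not exist yet; `--run-image` is an existing `pack rebase` option, and the image names are placeholders.

```
pack rebase my-app --run-image registry.example.com/new-run --force
```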

## Mixins

The mixins label on each base image is replaced by a layer in each base image containing a single file consisting of a CycloneDX-formatted list of packages. Each package entry has a [PURL](https://github.com/package-url/purl-spec)-formatted ID that uniquely identifies the package.

**Member:** Is this for run images too? How would the lifecycle get this information for selecting the run image? Think it's here:

> the output replaces the label io.buildpacks.sbom

**Member Author:** Yep, the label is supposed to be a reference to the layer that contains the SBoM. I should make that clearer.

**Member:** I am not 100% sold on combining the stack SBoM and packages list. I know the SBoM by definition does include the packages, but it may also include a lot of other information (provenance, licenses), and it could be useful to pull out the piece that is required for validation into a more easily consumable format (and one that is less likely to change if, for example, we switch from CycloneDX to SPDX for the BOM).

**Member Author:**

> it could be useful to pull out the piece that is required for validation into a more easily consumable format

Why should we implement logic to transform the data into a different format and put both formats on the image? We could just use one (ideally standardized, non-CNB-specific) format, and transform it when we need to validate.

> one that is less likely to change if for example we switch from cycloneDX to SPDX for the BOM

If we commit to a format and change it, we're going to have to update the lifecycle to parse it regardless.
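
For reference, a minimal sketch of what such a CycloneDX package list could look like. The package name, version, and the exact set of CycloneDX fields shown are illustrative assumptions; the RFC only requires CycloneDX formatting with PURL IDs.

```
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.2",
  "version": 1,
  "components": [
    {
      "type": "library",
      "name": "openssl",
      "version": "1.1.1-1ubuntu2",
      "purl": "pkg:deb/ubuntu/openssl@1.1.1-1ubuntu2"
    }
  ]
}
```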


### Validations

Buildpack base image metadata specified in `buildpack.toml`'s `platforms` list is validated against the runtime and build-time base images.

Runtime and build-time base image packages are no longer validated against each other.

When an app image is rebased, `pack rebase` will fail if packages are removed from the new runtime base image. This check may be skipped by passing a new `--force` flag to `pack rebase`.

## Runtime Base Image Selection

**Member Author:** @ekcasey Re: your comment on runtime base image selection and app-specified Dockerfiles not playing well together (i.e., app-specified Dockerfiles can't fulfill package requirements from buildpacks): what if we allow users to label the Dockerfiles with package names (in version-less PURL format) that could be matched against (and thus remove) buildpack-required packages?

**Member Author:** (Note that the "label" would be something like a comment at the top of the Dockerfile, not an image label.)

**Member:**

> what if we allow users to label the Dockerfiles with package names (in version-less PURL format) that could be matched against (and thus remove) buildpack-required packages?

Hmm, this fills the required purpose, but it seems like it's moving away from the simplicity of "just a Dockerfile" towards something that more closely resembles a list of "provided mixins"? I need to chew on this a little more.

**Member Author:** Agree, feels bad to me also.


Builders may specify an ordered list of runtime base images, where each entry may contain a list of runtime base image mirrors.
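
A sketch of how a builder might declare this ordered list. The `builder.toml` table and key names below are hypothetical; only the ordering and mirror semantics come from this RFC.

```
# builder.toml (hypothetical schema)
[[run-images]]                              # tried first
image = "registry.example.com/tiny-run"
mirrors = ["mirror.example.com/tiny-run"]

[[run-images]]                              # fallback with more packages
image = "registry.example.com/full-run"
mirrors = ["mirror.example.com/full-run"]
```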

Buildpacks may specify a list of package names (as PURLs without a version or qualifiers) in a `packages` table in the build plan.
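
A minimal sketch of such a build plan fragment. The exact table shape is an assumption; the RFC only specifies that requirements are version-less PURLs in a `packages` table.

```
# Build plan (hypothetical shape)
[[packages]]
name = "pkg:deb/ubuntu/curl"

[[packages]]
name = "pkg:deb/ubuntu/libpq5"
```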

The first runtime base image that contains all required packages is selected. When mirrors are present, the runtime base image mirror matching the app image is always used, including for package queries.

## Dockerfiles

Note: kaniko, BuildKit, and/or the original Docker daemon may be used to apply Dockerfiles at the platform's discretion.

**Member:** Does "at the platform's discretion" mean that a platform can provide whatever mechanism it wants for buildpack users to select/provide Dockerfiles?

**Member Author:** The statement was only intended to suggest that the underlying technology for applying Dockerfiles is up to the platform. E.g., BuildKit if you're using the BuildKit frontend, kaniko if you're using kpack or tekton, etc.


### App-specified Dockerfiles

A buildpack app may have a `build.Dockerfile` and/or `run.Dockerfile` in its app directory. A `run.Dockerfile` is applied to the selected runtime base image after the detection phase. A `build.Dockerfile` is applied to the build-time base image before the detection phase.

**Member:** Is this something we're specing, or is this a platform detail of Pack? I'd like to see these build/run Dockerfiles defined in project.toml.

**Member Author:** I like @jabrown85's idea of putting all Dockerfile-related functionality into an extension spec. Do you mean the locations may be overridden in project.toml, or are you thinking inline?

**Member:** I was thinking of overriding the location, but now I'm interested in inline too.


Both Dockerfiles must accept `base_image` and `build_id` args. The `base_image` arg allows the lifecycle to specify the original base image. The `build_id` arg allows the app developer to bust the cache after a certain layer and must be defaulted to `0`.

**Contributor (@jabrown85, Jun 17, 2021):** What exactly does the lifecycle pass into `base_image`? The examples below specify `LABEL io.buildpacks.image.distro=ubuntu` in a run.Dockerfile, but I would have thought those would exist on the `base_image`.

**Member Author:** The value of `base_image` is always the original image that needs to be extended. The run.Dockerfile you're referencing below would be used to create a stack from something like `ubuntu:bionic`. A command like `pack create-stack` could take run.Dockerfile and build.Dockerfile, perform a normal `docker build`, and then validate that all required fields/files are present.

**Member:** Looking at

> Note: kaniko, BuildKit, and/or the original Docker daemon may be used to apply Dockerfiles at the platform's discretion.

and also

> allows the lifecycle to specify the original base image

I wonder if maybe it's worth clarifying further how this would work. I'm assuming for the build image, the lifecycle could use kaniko during the existing build phase. But extending the run image would imply a new phase...

**Member:**

> The build_id arg allows the app developer to bust the cache after a certain layer and must be defaulted to 0.

Could you describe a bit further how this would work?

**Member Author:** See the examples below -- if you use `build_id` in a `RUN` instruction, that layer and all layers under it will never be cached due to the value changing.

**Member Author:**

> I'm assuming for the build image, the lifecycle could use kaniko during the existing build phase. But extending the run image would imply a new phase...

This is what I'm thinking as well. For pack, this phase could happen in parallel with the builder phase. Happy to add more detail.

**Comment (@cmoulliard, Jun 30, 2021):**

> kaniko

Kaniko implies running the build process in a Docker container or Kubernetes pod. This is not needed with Google JIB (https://github.com/GoogleContainerTools/jib) or buildah (https://github.com/containers/buildah).

**Member Author:** These tools all use different approaches:

- JIB doesn't require containers at all, but it's specific to JVM-based apps.
- Kaniko can be used as a library and invoked within an existing container (entirely in userspace, like JIB).
- Buildah requires either containers or fuse/chroot.

For in-cluster builds, kaniko's approach is least-privileged. For local builds, Docker/BuildKit (or buildah on Linux) all seem like good options.

Happy to remove or extend the list of suggested technologies.
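
To make the cache-busting behavior discussed above concrete, here is a sketch (assuming the platform passes a fresh `build_id`, e.g. a UUID, on each build): every layer from the first `RUN` that references `build_id` onward is rebuilt each time, while earlier layers stay cached.

```
ARG base_image
FROM ${base_image}
ARG build_id=0

# Cached across builds: does not reference build_id.
RUN apt-get update && apt-get install -y ca-certificates

# Changes with every new build_id value, invalidating this layer
# and all layers below it.
RUN echo ${build_id}
RUN apt-get update && apt-get install -y tzdata
```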


A runtime base image may indicate that it does not preserve ABI compatibility by adding the label `io.buildpacks.unsafe=true`. Rebasing an app with this label requires passing a new `--force` flag to `pack rebase`.

### Platform-specified Dockerfiles

The same Dockerfiles may be used to create new stacks or modify existing stacks outside of the app build process. For both app-specified and stack-modifying Dockerfiles, any specified labels override existing values.

Dockerfiles that are used to create a stack must create a `/cnb/stack/genpkgs` executable that, when invoked, outputs a CycloneDX-formatted list of packages in the image with PURL IDs. This executable runs after any run.Dockerfile or build.Dockerfile is applied, and its output replaces the label `io.buildpacks.sbom`. This label doubles as a Software Bill-of-Materials for the base image. In the future, this label will serve as a starting point for the application SBoM.

**Comment (@fg-j, Jun 22, 2021):** How would the validity of this binary be assured? Is this binary published by the project and included (by the platform?) in Dockerfile-extended stacks? Is it the responsibility of the Dockerfile writer to create (or validate) their own binary? I worry about a malicious binary that produces false information about the packages contained in a build/run image.

**Member Author:** The binary would be root-owned and added by the Dockerfiles that created the build-time and runtime base images. The distribution-specific logic in the binary could be implemented for common distros by the CNB project.

Given that all Dockerfiles can run as root, they must all be fully trusted. If an untrusted Dockerfile is allowed to run, it could cause the binary to produce false information without touching the binary itself (e.g., via LD_PRELOAD, or by modifying the package DB). It's up to Dockerfile authors to ensure supply chain integrity for any components they add.

**Comment:** The approach proposed here, which I suppose will rely on `docker build` or `podman build`, will only work locally with Docker, since Docker uses the root user by default. It will not work at all if the image is built in a pod (Kubernetes, OpenShift), where a random or fixed non-root UID is used.

**Member:** From what I understand, this might not be necessary, as the buildpacks lifecycle could use kaniko to execute the Dockerfile in the context of the build or run image (see #167 (comment)). @sclevine is this correct?

**Member Author:** Correct, kaniko can be used for in-cluster builds. The lifecycle already has build-time phases that require in-container root.

**Comment:** Will the executable genpkgs become part of the lifecycle? Will it be called during the buildpack bin step to install a package and set the `$PATH` of the package (e.g., maven, jdk, ...)?

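As a rough illustration of the contract (the invocation below is hypothetical; this RFC does not specify how or where the platform runs the executable):

```
# Run the package-list generator inside the (extended) image and
# capture its CycloneDX output, e.g. to populate io.buildpacks.sbom:
docker run --rm my-extended-run-image /cnb/stack/genpkgs > packages.cdx.json
```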

### Examples

**Comment:** Is it possible to create, as part of a GitHub repo, a more concrete example starting from an existing buildpack and stack, showing how it could be converted into the Dockerfile-based stack layout proposed here?


run.Dockerfile used to create a runtime base image:

```
ARG base_image
FROM ${base_image}
ARG build_id=0

LABEL io.buildpacks.image.distro=ubuntu
LABEL io.buildpacks.image.version=18.04

ENV CNB_USER_ID=1234
ENV CNB_GROUP_ID=1235

RUN groupadd cnb --gid ${CNB_GROUP_ID} && \
  useradd --uid ${CNB_USER_ID} --gid ${CNB_GROUP_ID} -m -s /bin/bash cnb

USER ${CNB_USER_ID}:${CNB_GROUP_ID}

COPY genpkgs /cnb/stack/genpkgs
```

**Contributor:** Is this supposed to be a plain Dockerfile with `FROM ubuntu:18.04`?

**Member Author:** No, but that would work also. The idea is to use the same format for all stack-related Dockerfiles (creating, pre-build extension, at-build extension).

On the `LABEL` lines:

**Member Author:** These could be derived automatically from /etc/os-release when present.

**Member:** Who would be responsible for adding this label? Would the lifecycle add it to the exported image?

**Member Author:** `pack create-stack` could add it if not set already.

run.Dockerfile present in an app directory that always installs the latest version of curl:
```
ARG base_image
FROM ${base_image}
ARG build_id=0

RUN echo ${build_id}

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
```

Unsafe run.Dockerfile present in an app directory:
```
ARG base_image
FROM ${base_image}
ARG build_id=0

LABEL io.buildpacks.unsafe=true

RUN curl -L https://example.com/mypkg-install | sh
```

**Member:** Is there a case for adding this label any time the run image is extended?

**Member Author:** Good catch! Flipped the label to `io.buildpacks.rebasable`. Not sure if it should be inherited though.

# Drawbacks
[drawbacks]: #drawbacks

- Involves breaking changes.
- Buildpacks cannot install OS packages directly, only select runtime base images.

# Alternatives
[alternatives]: #alternatives

- Stackpacks
- Keep stacks & mixins, but implement "Dockerfiles"
- Ditch stacks & mixins, but skip "Dockerfiles"

**Member:** I think there's also a variant of stackpacks where we aren't so strict as the current RFC. All the complexity came in when we tried to put guardrails around them and ensure rebase always worked. The Dockerfiles in this proposal could be easily replaced with a stackpack that's just a bin/detect and bin/build and no guarantees about rebase.

**Member Author:** Will add this as an alternative.

I think our mistake is larger than trying to preserve rebase though. As mentioned in #167 (comment), I think stackpacks leave us on the hook to solve a hard problem. Even if we don't break rebase, how are we going to ensure that packages stay up-to-date? I'd rather implement the existing solution (and associated set of user expectations) first.

**Member:** Sure, I agree the problem is larger (rebase is just an example here).

My original vision for stackpacks was something dead simple: a type of buildpack that runs as root. I don't think that's much different from a Dockerfile. If we try to attach a bunch of guardrails/constraints/etc., we'll probably end up in the same spot.

That said, I think the original very simple stackpacks concept could co-exist with the Dockerfile mechanism.

# Unresolved Questions
[unresolved-questions]: #unresolved-questions

- Should we use the build plan to allow buildpacks to specify package requirements? This allows, e.g., a requirement for "python" to be satisfied by either a larger runtime base image or by a buildpack. Opinion: no, too complex and difficult to match package names and plan entry names, e.g., python2.7 vs. python2 vs. python.
- Should packages be determined during the detect or build phase? Opinion: detect phase, so that a runtime base image's app-specified Dockerfiles may be applied in parallel to the buildpack build process.

**Member:** Agree. This is where stackpacks got messy. I don't think stackpacks themselves were the problem, but rather all the stuff like this that we tacked on.

**Member:** IIUC the current proposal doesn't shut the door to doing something like that in the future, right? If so, maybe we could revisit this question when we find that we need it.

# Spec. Changes (OPTIONAL)
[spec-changes]: #spec-changes

This RFC requires extensive changes to all specifications.