The Stackage data flow

The Stackage project is really built on top of a number of different subcomponents. This page covers how they fit together. The Stackage data flow diagram gives a good bird's-eye view:

Inputs

There are three inputs into the data flow:

Hackage is the upstream repository of all available open source Haskell packages that are part of our ecosystem. Hackage provides both cabal file metadata (via the 00-index.tar file) and tarballs of the individual packages.
build-constraints.yaml is the primary Stackage input file. This is where package maintainers can add packages to the Stackage package set. This also defines upper bounds, skipped tests, and a few other pieces of metadata.
stackage-content is a Github repository containing static file content served from stackage.org

Travis

For various reasons, we leverage Travis CI for running some processes. In particular:

all-cabal-files clones all cabal files from Hackage's 00-index.tar file into a Git repository without any modification
all-cabal-hashes is mostly the same, but also includes cryptographic hashes of the package tarballs for more secure download (as leveraged by Stack. It is powered by all-cabal-hashes-tool
all-cabal-packages uses hackage-mirror to populate the hackage.fpcomplete.com mirror of Hackage, which provides S3-backed high availability hosting of all package tarballs
all-cabal-metadata uses all-cabal-metadata-tool to query extra metadata from Hackage about packages and put them into YAML files. As we'll see later, this avoids the need to make a lot of costly calls to Hackage APIs

Travis does not currently provide a means of running jobs on a regular basis. Therefore, we have a simple cron job on the Stackage build server that triggers each of the above builds every 30 minutes.

stackage-curator

The heart of running Stackage builds is the stackage-curator tool. We run this on a daily basis on the Stackage build server for Stackage Nightly, and on a weekly basis for LTS Haskell. The build process is highly automated and leverages Docker quite a bit.

stackage-curator needs to know about the most recent versions of all packages, their tarball contents, and some metadata, all of which it gets from the Travis-generated sources mentioned in the previous section. In addition, it needs to know about build constraints, which can come from one of two places:

When doing an LTS Haskell minor version bump (e.g., building lts-5.13), it grabs the previous version (e.g., lts-5.12) and converts the previous package set into constraints. For example, if lts-5.12 contains the package foo-5.6.7, this will be converted into the constraint foo >= 5.6.7 && < 5.7.
When doing a Stackage Nightly build or LTS Haskell major version bump (e.g., building lts-6.0), it grabs the latest version of the build-constraints.yaml file.

By combining these constraints with the current package data, stackage-curator can generate a build plan and check it. (As an aside, this build plan generation and checking also occurs every time you make a pull request to the stackage repo.) If there are version bounds problems, one of the Stackage curators will open up a Github issue and will add upper bounds, temporarily block a package, or some other corrective action.

Once a valid build plan is found, stackage-curator will build all packages, build docs, and run test suites. Assuming that all succeeds, it generates some artifacts:

Uploads the build plan as a YAML file to either stackage-nightly or lts-haskell
Uploads the generated Haddock docs and a package index (containing all used .cabal files) to haddock.stackage.org.

stackage-server-cron

On the Stackage build server, we run the stackage-server-cron executable regularly, which generates:

A SQLite database containing information on snapshots, the packages they contain, Hackage metadata about packages, and a bit more. This database is uploaded to S3.
A Hoogle database for each snapshot, which is also uploaded to S3

stackage-server

The software running stackage.org is a relatively simple Yesod web application. It pulls data from the stackage-content repo, the SQLite database, the Hoogle databases, and the build plans for Stackage Nightly and LTS Haskell. It doesn't generate anything important of its own except for a user interface.

Stack

Stack takes advantage of many of the pieces listed above as well:

It by default uses the all-cabal-hashes repo for getting package metadata, and downloads package contents from the hackage.fpcomplete.com mirror (using the hashes in the repo for verification)
There are some metadata files in stackage-content which contain information on, for example, where to download GHC tarballs from to make stack setup work
Stack downloads the raw build plans for Stackage Nightly and LTS Haskell from the Github repo and uses them when deciding which packages to build for a given stack.yaml file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DATA-FLOW.md

DATA-FLOW.md

The Stackage data flow

Inputs

Travis

stackage-curator

stackage-server-cron

stackage-server

Stack

Files

DATA-FLOW.md

Latest commit

History

DATA-FLOW.md

File metadata and controls

The Stackage data flow

Inputs

Travis

stackage-curator

stackage-server-cron

stackage-server

Stack