doc(pkg): Explain package management #10950

Open
wants to merge 9 commits into
base: main
from
1 change: 1 addition & 0 deletions doc/explanation/index.rst
@@ -9,6 +9,7 @@ the rest of the OCaml ecosystem.

preprocessing
ocaml-ecosystem
package-management
opam-integration
bootstrap
mental-model
226 changes: 226 additions & 0 deletions doc/explanation/package-management.md
@@ -0,0 +1,226 @@
# How Package Management Works

This document gives an explanation on how the new package management
feature introduced in Dune works under the hood. It requires a bit of
familiarity with how opam repositories work and how Dune builds packages. Thus
it is aimed at people who want to understand how the feature works, not how it
is used.

Review comment:

Suggested change
- This document gives an explanation on how the new package management
- feature introduced in Dune works under the hood. It requires a bit of
- familiarity with how opam repositories work and how Dune builds packages. Thus
- it is aimed at people who want to understand how the feature works, not how it
- is used.
+ This document explains how Dune's package management works
+ under the hood. It requires a bit of familiarity with how opam
+ repositories work and how Dune builds packages. Thus it is aimed at people
+ who want to understand how the feature works, not how it is used.

A bit terser, and without the "new" of "new package management feature", as these qualifiers tend to get stale.


For a tour on how to apply package management to a project, refer to the
{doc}`/tutorials/dune-package-management/index` tutorial.

## Motivation

A core part of modern programming is using existing code to save time.
The OCaml package ecosystem has quite a long history with many projects
building upon each other over many years. A significant step forward was the
creation of the OCaml Package Manager, opam, along with the establishment of a
public package repository which made it a lot more feasible to share code
between people and projects.

Over time, best practices have evolved, and while opam has incorporated some
changes, it couldn't adopt all the modern workflows due to its existing user base and constraints.

Dune Package Management attempts to take the parts of the opam ecosystem that
have stood the test of time and couple them with a modern workflow. Some of the
improvements include:

* Automatic package repository updates
* Easily reproducible dependencies
* All package dependencies declared in a single file that is kept in-sync
* Per-project dependencies

Review comment (Member):

I don't see what's new in that list from what opam can already do (apart from automatic package repository updates, which can be seen as quite bad from a reproducibility point of view if you don't have a lockfile).

Instead of comparing opam vs. dune pkg, I would stress the design principle. A random list of stuff:

  • making the OCaml environment setup trivial;
  • promoting a lockfile-first approach to address reproducibility use cases;
  • unifying the configuration files (and removing hidden global states managed by a CLI tool);
  • improving cross-packages and vendoring workflows;
    and much more.

Review comment:

Maybe the central change could be explained separately from the consequences of that change. Say:

The central change in dune's package management is the idea that all information necessary to build a repository lives in the repository, not in unversioned state like opam switches, whether global or repo-local. This is on par with what happens in other language ecosystems, and has the following beneficial properties:

  • excellent support for reproducible builds
  • building a project using this tooling is just dune build, no knowledge necessary
  • etc

Review comment (Member):

Here's a few principles that I've had in mind when designing this feature:

  1. No global state visible to the user
  2. All package configuration must be done via config file updates (dune-project or dune-workspace). No stateful commands such as opam pin.
  3. "Automatic package repository updates" also applies to pins. For example, we re-fetch branches & tags.
  4. Package builds can only access packages which are listed as dependencies.
  5. Users do not need to learn yet another file format to configure their workspace. Everything should be doable via workspaces.
  6. Build plans are independently versioned so that they can be interpreted in the same way between different versions of dune


Dune plays well with the existing OCaml ecosystem and does not introduce a new
type of packages. Rather, it uses the same package repository and Dune packages stay
installable with opam.

## Package Management in a Project

This section describes what happens in a Dune project using the package
management feature.

## Dependency Selection

The first step is to determine which packages need to be installed.
Traditionally this has been defined in the `depends` field of a project's opam
file(s).

For some time, Dune has also supported {doc}`opam file generation
</howto/opam-file-generation>` by specifying the package dependencies in the
`dune-project`. Outside of this feature, Dune did not use the `depends` stanza.

The package management feature changes this, as Dune now determines the list of
packages to install from the `depends` stanza in the `dune-project` file. This
allows projects to completely omit generation of `.opam` files, as long as they
use Dune for package management. Thus all dependencies are only declared in one
file.
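
As a minimal sketch of the above, a `dune-project` might declare its dependencies as follows. The package name `my_project`, the listed dependencies, their version bounds, and the `(lang dune 3.17)` line are illustrative assumptions, not taken from any particular project.

```dune
(lang dune 3.17)

(package
 (name my_project) ; hypothetical package name
 (depends
  ; illustrative dependencies and version constraints
  (ocaml (>= 4.14))
  (fmt (>= 0.9))))
```

With package management enabled, the `depends` field here is what Dune reads to determine the packages to install.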

Review comment (Member):

A quick remark: if you are not generating .opam files anymore, then your package is no longer pinnable by opam. There's an issue to fix this in the opam tracker (ie. allow on-the-fly generating of opam files).

Review comment on lines +53 to +57:

Suggested change
- The package management feature changes this, as Dune now determines the list of
- packages to install from the `depends` stanza in the `dune-project` file. This
- allows projects to completely omit generation of `.opam` files, as long as they
- use Dune for package management. Thus all dependencies are only declared in one
- file.
+ Dune with package management instead computes the list of
+ packages to install from the `depends` stanza in the `dune-project` file. This
+ allows projects to completely omit generation of `.opam` files, as long as they
+ use Dune for package management. Thus all dependencies are only declared in one
+ file.

I don't think the last sentence is right though.
Dependencies are declared both in dune files and in dune-project. And usually the opam file would be generated from the dune-project regardless, so the duplication is not necessarily user-facing.


For compatibility with a larger amount of existing projects, Dune will also
collect dependencies from `.opam` files in the project. So while recommended,
there is no obligation to switch to declaring dependencies in the
`dune-project`. Likewise the generation of `.opam` files will still work.

Review comment on lines +59 to +62:

This seems like user-facing doc, doesn't it?

Review comment (Collaborator Author):

What do you mean?

Review comment:

I mean the user docs pointed to by https://dune.ci.dev/ only say that the dependencies are specified in the dune-project file. If the opam file is possible as well, maybe the user docs is where this information should live.


## Locking

To go from a project's set of dependency constraints to a set of installed
packages and versions, there needs to be a step to determine the right packages
and their versions to be installed.

In `opam`, this process happens as part of `opam install`, which combines finding
a solution that satisfies the given constraints and installing it into one step.
Dune, on the other hand, separates the two: first a solution is found, and
then the packages are installed.

The idea of finding a solution and recording it for later is popular in other
programming language package managers like NPM and is usually called locking.

:::{note}
`opam` also supports creating lock files. However, these are not as central to
the opam workflow as they are in the case of package management in Dune, which
always requires a set of locked packages.
:::

In the most general sense, a package lock is just a set of specific packages and
their versions to be installed.

A Dune lock file extends this to a directory with files that describe the
dependencies. Each file includes the package's name and version. Unlike in many
other package managers, the files include a lot of other information as well,
such as the location of the source archives to download (since there is no
central location for all archives), the build instructions (since each package
can use its own way of building), and additional metadata like the system
packages it depends upon.

The information is stored in a directory (`dune.lock` by default) as separate
files, because that makes them easier to manage in source control as it
leads to fewer potential merge conflicts and simplifies review processes.
Storing additional files like patches is also more elegant this way.
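
To make the description above more concrete, the following is a rough sketch of what a single package file inside the lock directory might contain. It is an illustration under assumptions rather than a verbatim lock file: the package (`fmt`), its version, the placeholder URL, the placeholder checksum, and the exact set of fields are all hypothetical, and the real format may differ between Dune versions.

```dune
; Hypothetical file: dune.lock/fmt.pkg (name and contents are illustrative)
(version 0.9.0)

; Build instructions recorded from the package metadata
(build
 (run dune build -p %{pkg-self:name} -j %{jobs}))

; Other locked packages this one depends on
(depends ocaml dune)

; Where the source archive is fetched from (placeholder URL and checksum)
(source
 (fetch
  (url https://example.org/fmt-0.9.0.tbz)
  (checksum sha256=<checksum>)))
```

The point of the sketch is that the version, source location, and build instructions all travel with the lock directory, so they can be reviewed and versioned alongside the project.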

Review comment:

Suggested change
- The information is stored in a directory (`dune.lock` by default) as separate
- files, because that makes them easier to manage in source control as it
- leads to fewer potential merge conflicts and simplifies review processes.
- Storing additional files like patches are also more elegant this way.
+ The information is stored in a directory (`dune.lock` by default) as separate
+ files, to reduce potential merge conflicts and simplify code review.
+ Storing additional files like patches is also more elegant this way.

Terser suggestion.


### Package Repository Management

To find a valid solution that allows a project to be built, it is necessary to
know what packages exist, what versions of these packages exist, and what other
packages these depend on, etc.

In opam, this information is tracked in a central repository called
[`opam-repository`](https://github.com/ocaml/opam-repository), which contains all
the metadata for published packages.

It is managed using Git, and opam typically uses a snapshot to find the
dependencies when searching for a solution that satisfies the constraints.

Likewise, Dune uses the same repository; however, instead of file snapshots,
it uses the Git repository directly. In fact, Dune maintains a shared
internal cache containing all Git repositories that projects use. The
advantage is that updates to the metadata are very fast because only the newest
revisions have to be retrieved. The downside is that for the creation of the
cache, the entire repository has to be cloned first.

Review comment:

Wait, so if I depend on, say, js_of_ocaml, dune clones the whole js_of_ocaml repository? It doesn't grab the release url?
Or are you talking only about metadata repositories?

Review comment (Collaborator, @maiste, Oct 1, 2024):

It's the metadata repository here: the opam-repository and the overlays (pretty small one).

Review comment (Collaborator Author, @Leonidas-from-XIV, Oct 2, 2024):

This is in the "package repository management" section, so I hoped it was clear that it is about the opam metadata.

It will normally use the js_of_ocaml release tarball if that's what your lockfile refers to. But if you use a pin to specify the jsoo git repo it will also add that repo to the git revision cache (thus multiple projects doing this will not have to clone jsoo twice).


Because fast updates are prioritized, whenever Dune needs to determine the
available packages, it will update the repository first. Thus each locking
process by default will use the newest set of packages available.

However, it is also possible to pin the package repositories to specific
revisions to get a reproducible solution. Because the repositories are managed
with Git, any previous revision can be selected by specifying a commit hash.

Review comment on lines +129 to +131:

This wording is confusing, I'm not sure what this is saying. Maybe this?

Suggested change
- However, it is also possible to specify specific revisions of the repositories,
- to get a reproducible solution. Due to using Git, any previous revision of the
- repository can be used by specifying a commit hash.
+ Instead of specifying a package from the metadata repository, it is also possible to specify a git url + git hash.

Or is it talking about metadata repositories, maybe?

Review comment (Collaborator):

Here it is in the context of opam-repository so metadata repository or "package repository".


Dune defines two repositories by default:

* `upstream` refers to the default branch of `opam-repository`, which
contains all the publicly released packages.
* `overlay` refers to
[opam-overlays](https://github.com/ocaml-dune/opam-overlays), which defines
packages patched to work with package management. The long-term goal is to
have as few packages as possible in this repository as more and more packages
work within Dune Package Management upstream. Check the
[compatibility](#compatibility) section for details.
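
As a hedged sketch of how a pinned repository revision could be configured, a `dune-workspace` file might look like the following. The repository name `pinned-upstream` is hypothetical, `<commit-hash>` is a placeholder to be replaced with a real commit of `opam-repository`, and the `(lang dune 3.17)` line is an assumption about the Dune version in use.

```dune
(lang dune 3.17)

; Hypothetical repository definition: opam-repository fixed at one commit
(repository
 (name pinned-upstream)
 (url "git+https://github.com/ocaml/opam-repository.git#<commit-hash>"))

; Use the default overlay together with the pinned repository
; instead of the moving default branch
(lock_dir
 (repositories overlay pinned-upstream))
```

With such a configuration, repeated locking would consult the same repository revision each time rather than the newest one.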

### Solving

After Dune has retrieved the constraints and the set of possible packages, it is
necessary to determine which packages and versions should be selected for the
package lock.

To do so, Dune uses
[`opam-0install-solver`](https://github.com/ocaml-opam/opam-0install-solver),
which is a variant of the `0install` solver to find solutions for opam packages.

Contrary to opam, the Dune solver always starts from a blank slate. It
assumes nothing is installed and everything needs to be installed. This has the
advantage that solving is now simpler, and previous solver solutions don't interfere
with the current one. Thus, given the same inputs, it should always have the same
outputs; no state is held between the solver runs.

This can lead to more packages being installed (as opam won't install new package versions
by default if the existing versions satisfy the
constraints), but it avoids interference from already installed packages that
could lead to different solutions.

After solving is done, the solution gets written into the lock directory with
all the metadata necessary to build and install the packages. From this
point on, there is no need to access any package repositories.

:::{note}
Solving and locking do not download the package sources. These are downloaded
in a later step.
:::

## Building

When building, Dune will read the information from the lock directory and set
up rules to use the packages. Check {doc}`/explanation/mental-model` for
details about rules.

The rules that the package management sets up include:

* Fetch rules to download the sources as well as any additional sources like
patches and unpack them
* Build rules to execute the build instructions stored in the lock directory
* Install rules to put the artifacts that were built into the appropriate
Dune-managed folders

Creating these processes as rules means that they will only be executed on
demand, so if the project has already downloaded the sources, it does not
need to download them again. Likewise, if packages are installed, they stay
installed.

Review comment:

Suggested change
- Creating these processes as rules mean that they will only be executed on
- demand, so if the project has already downloaded the sources, it does not
- need to download them again. Likewise, if packages are installed, they stay
- installed.
+ Executing these steps as build rules allows them to be run on-demand and cached, even across projects. So building a fraction of a project requires building only the necessary dependencies. And if two repositories have some dependencies in common, their common dependencies will only be downloaded and built once, not twice.

I think this is true, considering the shared-cache is enabled, right? Because that seems to contradict the next paragraph below.


The results of the rules are stored in the project's `_build` directory
and managed automatically by Dune. Thus, when cleaning the build directory, the
installed packages are cleaned as well and will be reinstalled at the next
build.

When building the user's project, the installed packages are added to the
necessary search paths, so user code can use the dependencies without any
additional configuration.

(compatibility)=
## Packaging for Dune Compatibility

Dune can build and install most packages as dependencies, even if they are not
built with Dune themselves. Dune will execute the build instructions from the
lock file, very similar to opam.

However, packages must meet certain requirements to be compatible with Dune.

The most important one is that the packages must not use absolute paths to
refer to files. That means, they cannot read the path they are being built or
installed into and expect this path to be stable. Dune builds packages in a
sandbox location, and after the build has finished, it moves the files to the
actual destination.

Review comment (Collaborator):

Suggested change
- refer to files. That means, they cannot read the path they are being built or
- installed into and expect this path to be stable. Dune builds packages in a
+ refer to files. This means they cannot rely on the path they are being built or installed in, as it may not remain consistent. Dune builds packages in a

Is this still accurate? If so, it improves readability and clarity. If not, let's work on clarifying it together.

Review comment (Collaborator Author):

I've updated it a bit to remove the commas. I don't think "consistent" is the right word in the context as that would imply the path is "inconsistent" but that doesn't make that much sense. I've paraphrased it to be "stay the same" which might be simpler to understand. WDYT?

Review comment (Member):

Maybe a good place to explain the difference between Dune and Opam sandboxing models?

Review comment (Collaborator Author):

The main difference is that we build in a different path than the one we install to. As far as I know the fact that we don't wrap things in bwrap/sandbox-exec are not design decisions, just something we haven't implemented yet. It is good idea and shouldn't be too difficult, especially with opam having made sure that packages generally work in its sandbox.


There are two reasons for this: it enables building without disturbing the
current state, and it allows caching artifacts across projects.

In many cases, these restrictions can be sidestepped by using relative
paths, as Dune guarantees that packages installed into different sections are
installed in a way where their relative location stays the same.

A minor difference is that Dune does not support packages installing themselves
into the standard library, thus being available without having to be declared a
dependency.

For this reason, the `overlay` repository exists, which contains packages where
the upstream packages are incompatible with Dune package management but were
patched to work in Dune.