UX for constraining build-time and runtime dependencies #29

Open
rgommers opened this issue Jan 21, 2022 · 79 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@rgommers
Contributor

There is a practical UX problem with pyproject.toml that we need to solve. A short description would be:

  • Any project that builds against the numpy C API has to deal with API and ABI compatibility (see the numpy docs on that for details).
  • As a consequence, the dependency specification for local development is different from the one needed for uploading an sdist to PyPI:
    • for local development, you want loose dependencies, e.g. numpy >= 1.19.5 (both a build and runtime dep)
    • in the sdist released to PyPI you want: oldest-supported-numpy (a build dep metapackage which contains == pins per platform) and numpy>='buildtime_dep',<1.25.0 (the upper bound should be N+3, where N is the current minor version)

That is obviously a complex set of dependencies to express in pyproject.toml, and we can't have two sets in there at the same time. The solution @FFY00 and I just discussed is:

  1. On the main branch, use the loose development dependencies (numpy>=1.19.5)
  2. In the CI job(s) that produce artifacts for release to PyPI, ensure that the lowest supported dependencies (which may be platform-dependent) are installed.
  3. In the release branch, replace the dependencies with the build-time deps needed for uploading to PyPI.
  4. Have support in mesonpy for the various kinds of pinnings people need for releases and API/ABI compatibility.

A rough sketch of how (4) could look:

[build-system]
requires = [
  "numpy >= 1.19.5",  # replace with (e.g.) `oldest-supported-numpy` in a release branch
]

[tool.mesonpy]
runtime_pinnings = [
  # if the build version detected is 1.20.3, replace with `>=1.20.3` (while preserving the upper bound,
  # so the end result would be for example `>=1.20.3,<1.25.0`)
  "numpy = 'pin_compatible'",
  "pythran = 'bugfix_only'",  # if build version detected is 0.10.0, override the build dependency with `'>=0.10.0,<0.11.0'`
  "torch = 'exact'",  # runtime version must equal build-time version (PyTorch has no ABI stability guarantees)
]
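
For concreteness, here is a rough illustration (hypothetical helper code, not part of meson-python) of how a backend could turn the version detected at build time into a runtime requirement for each of these strategies:

# Hypothetical sketch of deriving a runtime requirement from the build-time version.
from packaging.version import Version

def runtime_requirement(name, build_version, strategy, upper_bound=None):
    v = Version(build_version)
    if strategy == "exact":           # e.g. torch: runtime version must equal the build-time version
        return f"{name}=={v}"
    if strategy == "bugfix_only":     # e.g. pythran: only allow later bugfix releases
        return f"{name}>={v},<{v.major}.{v.minor + 1}.0"
    if strategy == "pin_compatible":  # e.g. numpy: lower bound from the build, keep the upper bound
        requirement = f"{name}>={v}"
        return f"{requirement},<{upper_bound}" if upper_bound else requirement
    raise ValueError(f"unknown strategy: {strategy}")

print(runtime_requirement("numpy", "1.20.3", "pin_compatible", upper_bound="1.25.0"))
# -> numpy>=1.20.3,<1.25.0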

Note that this feature is needed for other packages than just NumPy (basically every package offering a C/C++ API has to deal with this), but NumPy alone is important enough: every package containing even a single Cython extension that uses NumPy is affected. However, it's probably still a bit too specific to want native support in pyproject.toml via a standard that all tools must implement. Hence the choice to put this in mesonpy.

@rgommers changed the title from "UX for pinning build-time dependencies" to "UX for constraining build-time and runtime dependencies" Jan 21, 2022
@FFY00
Member

FFY00 commented Jan 24, 2022

I have let this sink in a bit more and thought about it.

The only slight worry I have with this approach is that the wheels will not necessarily be reproducible. I don't think it's a big issue, as anyone serious about reproducible wheels should be reconstructing the original build environment anyway [1], and this won't be an issue there. Still, I feel very slightly uneasy about the built artifacts being highly dependent on the environment that ends up being resolved by PEP 517 builders. Just something worth noting IMO.

Another thing: I think we should add a local tag to the version (e.g. 1.2.3+numpy3.2.1) to easily identify the constraints coming from the build dependencies.


The main design decision here would be how to specify the locked version; I would really like to gather some feedback from other people on this.

A similar problem, and its solution, can be found at https://www.python.org/dev/peps/pep-0440/#compatible-release.

My initial proposal would be:

...

[tool.mesonpy.pin-build-dependencies]
numpy = '*.*.*'

And the tricky part, the patterns:

  1. Simple approach:
    Use a placeholder character, like * (e.g. *.*.*); another option would be ^ (e.g. ^.^.^).

  2. Full-on regex:
    Allow users to specify regex patterns, which will be run against the version string of the dependency (e.g. ^[0-9]*.[0-9]*.[0-9]).

All the examples would pin the first three version numbers (e.g. for 1.2.3.4.5 we would lock 1.2.3).

Anyway, none of this is trivial to implement, and there isn't much difference in difficulty between them either; regex might be a bit easier? So I think we should go with what's easier for users, unless that proves problematic to implement.

That said, it is worth pointing out that if we are constraining the dependencies on the fly, we need to put dependencies in dynamic and move it to the backend config. This is not optimal, but unfortunately PEP 621 does not let us keep the dependencies setting in the project table if we mark it as dynamic.

...

[project]
...
dynamic = [
  'dependencies',
]

[tool.mesonpy]
dependencies = [
  'numpy~=3.2.1',
]

[tool.mesonpy.pin-build-dependencies]
numpy = '*.*.*'

Footnotes

  1. And there has been a push from the community towards this, with PEP 665, though it has been rejected due to it not really targeting sdists. I think the proposal might be revived sooner or later, hopefully also with sdists in mind. Anyway, I think this shows us that there are bigger needs here and that it is perfectly reasonable to not concern ourselves too much with this issue from the build system side. For reference, other packaging ecosystems/systems, like Arch Linux, will still be able to achieve full package reproducibility, so this really seems like a PyPI / PEP 517 issue.

@FFY00 added the enhancement (New feature or request) and question (Further information is requested) labels Jan 24, 2022
@rgommers
Contributor Author

The only slight worry I have with this approach is that the wheels will not necessarily be reproducible. I don't think it's a big issue, as anyone serious about reproducible wheels should be reconstructing the original build environment anyway [1], and this won't be an issue there. Still, I feel very slightly uneasy about the built artifacts being highly dependent on the environment that ends up being resolved by PEP 517 builders. Just something worth noting IMO.

Agreed, need to have a reproducible environment for building. I think this topic is something we're going to have to teach people a bit about anyway (I've started some write-ups). It is absolutely untrue that PEP 517 will give you reproducible builds when you have compiled extensions. The illusion that they are reproducible is more harmful than helpful imho.

Another thing: I think we should add a local tag to the version (e.g. 1.2.3+numpy3.2.1) to easily identify the constraints coming from the build dependencies.

Hmm, I'm not sure I like messing with version strings, that seems unjustified (and what do you get when you have say five such dependencies?). A project like SciPy will already have a complex version string - from the latest nightly: scipy-1.9.0.dev0+1215.a0e129f-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

My initial proposal would be:

Interesting - yes there's a lot of design flexibility here. I have no clear opinion yet. Note that conda-forge implements this in a reasonable form; looking at its pin_compatible and run_exports may be useful: https://conda-forge.org/docs/maintainer/pinning_deps.html. At first sight, it seems easier to understand than regexes (and carries more semantic meaning, e.g. ABI compatibility).

That said, it is worth pointing out that if we are constraining the dependencies on the fly, we need to put dependencies in dynamic and move it to the backend config. This is not optimal, but unfortunately PEP 621 does not let us keep the dependencies setting in the project table if we mark it as dynamic.

Ah, good point. That may be a problem - aren't other tools supposed to be reading that dependencies table? I'm not sure what the implications would be exactly.

@rgommers
Contributor Author

A similar problem, and its solution, can be found at https://www.python.org/dev/peps/pep-0440/#compatible-release.

I looked at this again, and want to point out that ~= is quite a bit simpler than what we need. Using more operators doesn't seem ideal to me; that will be more or less incomprehensible to most devs without looking up the syntax.

Your example pyproject.toml snippet is the wrong way around (it pins the runtime dependency and then finds a compatible build dependency). The logic we want instead is to specify the version of the build dependency, and then adjust the runtime dependency dynamically (but usually not with ~=; we need to allow a larger range at runtime). I think this will be generically true for packages where ABI matters.

@pradyunsg

pradyunsg commented Sep 29, 2022

(this is a drive-by comment, sorry!)

Regarding 4, a couple of ideas...

One option is to go the route of mirroring the existing style from conda-build, where you are "calling" a function:

[tool.mesonpy]
runtime-pins-from-build-dependencies = [
  "compatible(numpy, *.*)",
  "bugfix_only(pythran)",
  "exact(torch)",
]

Alternatively, lean into structured TOML data:

[tool.mesonpy.runtime-pins-from-build-dependencies]
numpy = {strategy="compatible", constraint="*.*"}
torch = "exact"

That is obviously a complex set of dependencies to express in pyproject.toml, and we can't have two sets in there at the same time.

Well, depends on how/when you think those two should be composed. Basically, I'm imagining something like...

[tool.meson]
release-dependencies = [
   "oldest-supported-numpy",
]

(a) If it's OK to have an environment variable that is always set during release... That'll let you have dynamic dependencies via a get_requires_for_build_whatever that reads a list and tacks it on for release builds. When building the sdist, write the additional build dependency in build-system.requires, if the variable is set. (pick your opt-in/opt-out semantics)

(b) It's also reasonable to skip dynamic build dependencies and instead rely upon the build "flow" difference between pip install's consumer-facing behaviour of sources -> wheel vs build's [1] publisher-facing sources -> sdist -> wheel. When building the sdist, always write the additional build dependency into the pyproject.toml build-system.requires.

This, of course, means that you're assuming that all sdist builds are eventually-public and I reckon that's reasonable.

IIUC, a strategy for pinning runtime numpy based on the build environment, plus a way to ensure oldest-supported-numpy is added as a build dependency for release builds, will be sufficient to get the behaviours you want?

FWIW, you could even squish all of this into a single table...

[tool.mesonpy.runtime-pins-from-build-dependencies]
numpy = {strategy="compatible", constraint="*.*", release-requires=["oldest-supported-numpy"]}
torch = "exact"

Another thing: I think we should add a local tag to the version (e.g. 1.2.3+numpy3.2.1) to easily identify the constraints coming from the build dependencies.

What you're describing here feels to me like a poor man's substitute for an SBOM, for describing the build environment. There's no need to try and squish that into the local version label IMO.

It might be reasonable to just dump the relevant build environment information into a file in the sdist/wheel, and use that... instead of trying to squish things into the local version label. And, if you want the protections that a local version label provides against a PyPI upload, put a sentinel value in there when you generate the non-release variants.

Footnotes

  1. A future pip build's semantics will match build's.

@rgommers
Contributor Author

rgommers commented Oct 4, 2022

Thanks for the input @pradyunsg!

Alternatively, lean into structured TOML data:

TIL there's such a thing. I don't have a real preference here - both syntax choices work, and are about equally readable. I'm not quite sure what constraint="*.*" means though - why do we need this? Or in other words, what can that be aside from *.*?

If it's OK to have an environment variable that is always set during release...

I'd prefer to avoid that, because it's going to be one of those things people forget whenever they build outside of the release CI jobs.

(b) It's also reasonable to skip dynamic build dependencies and instead rely upon the build "flow" difference between pip install's consumer-facing behaviour of sources -> wheel vs build's [1] publisher-facing sources -> sdist -> wheel. When building the sdist, always write the additional build dependency into the pyproject.toml build-system.requires.

I like this option better. A little heretical for a PyPA/Pip dev to suggest it perhaps, but hey I'll take it:)

IIUC, a strategy for pinning runtime numpy based on the build environment, plus a way to ensure oldest-supported-numpy is added as a build dependency for release builds, will be sufficient to get the behaviours you want?

Yes, I believe it is. That said, it's an additional improvement over having to make manual changes in release branches, and it's currently of secondary importance compared to pin_compatible. So I'd probably start with implementing just numpy = {strategy="compatible", constraint="*.*"}, and then leave release-requires= for after that.

It might be reasonable to just dump the relevant build environment information into a file in the sdist/wheel,

We partly do that anyway, for example the detected BLAS/LAPACK version gets written to a generated .py file and exposed in numpy and scipy through a show_config() function. Extending that to dump the full configure output and environment information is quite useful - I was already planning to do something like that.

@pradyunsg

pradyunsg commented Oct 4, 2022

I guess I used constraint instead of min_pin out of being uninformed. The latter is a better name anyway.

https://docs.conda.io/projects/conda-build/en/latest/resources/variants.html?highlight=pin_compatible#extra-jinja2-functions

@rgommers
Contributor Author

rgommers commented Oct 5, 2022

I guess I used constraint instead of min_pin out of being uninformed. The latter is a better name anyway.

Ah, this makes sense now, thanks. And it's another choice to make. We don't just need min_pin but also upper_bound (e.g., numpy should be pinned to < 1.most_recent_minor_version + 3). This could all be done in a single expression:

numpy = {strategy="compatible", min_pin="x.x.x", upper_bound="1.27.0"}  # expressing it as max_pin="x.x+3" would be great, but conda-build doesn't allow this

or it could be done separately like I originally suggested:

[tool.mesonpy.runtime-pins-from-build-dependencies]
# note that we don't need `min_pin`, because >=1.19.5 is already specified in the build deps
numpy>='pin_compatible',<1.27.0

I think in the end all of these are roughly equivalent; it's mostly a matter of preferred syntax (what's easiest to read & parse). I suspect structured TOML is best from a parsing perspective.

@FFY00
Member

FFY00 commented Feb 22, 2023

Okay, after spending quite some time with it, I think a viable option would be to follow the normal requirement specification format, but using @ as a placeholder for elements we want to match. Additionally, we would also allow a TOML table with alternative matchers, which would provide a shortcut for common annoying matches, and allow us to implement an escape hatch for users if that is requested.

[tool.mesonpy.binary-runtime-constraints]
numpy = '>=@.@.@'  # for 1.20.3, it would add >=1.20.3 to the requirements
pythran = '~=@.@.@'  # also equivalent to ">=@.@.@,==@.@.*", for 0.10.0 it would add >=0.10.0,==0.10.*
torch = { strategy='exact' }  # this would match the exact same version

Can y'all try to break this approach?
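
For illustration, a rough sketch (assuming the packaging library; the expand helper is a made-up name, not meson-python code) of how such a placeholder could be filled in from the version detected at build time:

from packaging.version import Version

def expand(template, build_version):
    # Substitute each '@' with the next component of the release segment of the detected version.
    parts = iter(Version(build_version).release)
    return ''.join(str(next(parts)) if char == '@' else char for char in template)

print(expand('>=@.@.@', '1.20.3'))   # >=1.20.3
print(expand('~=@.@.@', '0.10.0'))   # ~=0.10.0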

@dnicolodi
Member

I've tried to understand what exactly the problem we are trying to solve is. I've read all the comments, but I'm not completely sure of the problem formulation. What is required is a way to derive the runtime dependencies from the versions of the dependencies present at the time of compilation. However, I'm not sure I understand all the details.

Let's assume we have a package A that exports some C API. Package A follows some rules for API and ABI compatibility. Let's say that releases within the same minor version (x.y.*) are ABI compatible and releases within the same major version (x.*) are API compatible. Let's assume that a package B links against the C API of package A. Package B needs to target a specific API of package A, which is defined by the major version number X. This can be expressed in the pyproject.toml like this:

[build-system]
requires = [
   'A ~= X'
]

If at the moment of compilation package A is available in version X.Y.Z, the resulting binaries of package B are compatible only with versions X.Y.* thus we would like to have the equivalent of

[project]
dependencies = [
   'A ~= X.Y'
]

encoded in the wheels. This seems reasonable. However, what if we want to release a version of B compatible with A version X.Y+1? We need to recompile it using this version of A. However, because the dependency versions are not recorded in the wheel tags, the wheels for B compatible with A versions X.Y and X.Y+1 are identical.

The only way I see to solve this is to have a specific version of B compatible with specific ABI-compatible versions of A. This requires having the same version specification in build-system.requires and in project.dependencies. What am I missing?

@FFY00
Member

FFY00 commented Feb 22, 2023

This can be expressed in the pyproject.toml like this:

[build-system]
requires = [
   'A ~= X'
]

No, it'd have to be A~=X.Y, which would turn into A>=X.Y,==X.*. In this example, you probably just want A==X.*.

If at the moment of compilation package A is available in version X.Y.Z, the resulting binaries of package B are compatible only with versions X.Y.* thus we would like to have the equivalent of

[project]
dependencies = [
   'A ~= X.Y'
]

Again, it'd need to be A~=X.Y.Z, which would translate to A>=X.Y.Z,==X.Y.*, or you can just do A==X.Y.* directly, depending on what you actually want.

This seems reasonable. However, what if we want to release a version of B compatible with A version X.Y+1?

Yes, you'd need to make sure you build in an environment with A==X.Y+1.

We need to recompile it using this version of A. However, because the dependency versions are not recorded in the wheel tags, the wheels for B compatible with A versions X.Y and X.Y+1 are identical.

The wheel names are identical if we don't add an extra identifier, but the wheels themselves would be different, and contain different metadata. Installers will refuse to install a wheel requiring A==X.Y.* in an environment with A==X.Y+1.

@dnicolodi
Member

dnicolodi commented Feb 22, 2023

No, it'd have to be A~=X.Y, which would turn into A>=X.Y,==X.*. In this example, you probably just want A==X.*.

Yes, thank you. The syntax of the ~= operator always confuses me.
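
As a refresher, this is how ~= behaves (a quick check with the packaging library, nothing meson-python specific):

>>> from packaging.specifiers import SpecifierSet
>>> from packaging.version import Version
>>> Version('1.5.0') in SpecifierSet('~=1.4')    # ~=1.4 means >=1.4, ==1.*
True
>>> Version('1.4.9') in SpecifierSet('~=1.4.2')  # ~=1.4.2 means >=1.4.2, ==1.4.*
True
>>> Version('1.5.0') in SpecifierSet('~=1.4.2')
False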

The wheel names are identical if we don't add an extra identifier, but the wheels themselves would be different, and contain different metadata. Installers will refuse to install a wheel requiring A==X.Y.* in an environment with A==X.Y+1.

Sure, but what do you do with a wheel with the same file name but different metadata? You cannot upload it to PyPI, and I think any other wheel archive would reasonably not allow replacing an existing version with a different binary. If you cannot redistribute the wheel, is this relevant only for local installations? Namely, we need to make sure that when A version X.Y+1 is installed, a compatible version of B is also installed.

@FFY00
Member

FFY00 commented Feb 22, 2023

You can add a local version identifier to differentiate it; that will give you a different wheel name, allowing you to upload it. Uploading different wheels with the same name is also something PyPI could potentially support in the future.

PyPI does not support it yet, but PEP 658 provides an API for installers to look directly at the metadata before downloading the wheel. Right now, installers will just download wheels until they find a compatible one.

@rgommers
Contributor Author

Can y'all try to break this approach?

Quick first question: what happens with version specifiers like .dev0 and rc1 when you do something like '>=@.@.@'?

@FFY00
Member

FFY00 commented Feb 22, 2023

Quick first question: what happens with version specifiers like .dev0 and rc1 when you do something like '>=@.@.@'?

Nothing. >=@.@.@ for 1.2.3.4.5 translates to >=1.2.3; if you want to match those, you can, e.g. >=@.@.@.dev@. I don't think you'd use this often, but it's there if you need it.

I haven't decided what to do when you try to match one of these modifiers and one is not present, but my initial instinct would be to error out.

@dnicolodi
Member

Distributing wheels with the same version but different content is IMHO a very bad idea. You can add an identifier to the file name, but not to the version stored in the wheel metadata. That seems an even worse idea to me. However, supporting locally built wheels is necessary.

What kind of operations on the version identifiers do we need to support? Which formats of version identifiers do we need to support?

I think we need to support all version identifier formats allowed by PEP 440, and we need to be able to substitute the build version into an arbitrary string, with the possibility to drop version identifier components at the end of the identifier and replace them with *. These operations should be enough to construct arbitrary version requirements. Finding a convenient syntax to express these operations, however, seems complicated.

@dnicolodi
Member

Nothing. >=@.@.@ for 1.2.3.4.5 translates to >=1.2.3, if you want to match those, you can, eg. >=@.@.@.dev@. I don't think you'd use this often, but it's there if you need.

The problem with this is that if you have version 1.2.3.dev0 installed and this translates into a runtime requirement >=1.2.3, you cannot install the obtained wheel, because 1.2.3.dev0 < 1.2.3. Isn't it?
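
A quick check with the packaging library illustrates the ordering in question:

>>> from packaging.version import Version
>>> Version('1.2.3.dev0') < Version('1.2.3')
True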

@FFY00
Member

FFY00 commented Feb 22, 2023

Distributing wheels with the same version but different content is IMHO a very bad idea.

I disagree; at some point you simply cannot provide all the compatibility details in the name. This is already true btw with Requires-Python, for example.

You can add an identifier to the file name, but not to the version stored in the wheel metadata.

You can add an identifier to the version; the local version identifier field (e.g. the something in 1.2.3+something) exists for this exact purpose 😅

What kind of operations on the version identifiers do we need to support? Which formats of version identifiers do we need to support?

I don't know. Something that's good enough for most people, and that provides an escape hatch for the ones for whom it isn't.

The problem with this is that if you have version 1.2.3.dev0 installed and this translates into a runtime requirement >=1.2.3, you cannot install the obtained wheel, because 1.2.3.dev0 < 1.2.3. Isn't it?

Yes, and I think that's the main issue with this approach, but I think we'll have issues with these kinds of matching semantics with other approaches too.

There are a couple of things we can do to fix this. I am struggling to decide what I think would be better, but perhaps we should just backfill the pre, post, and dev release fields based on the operator?

@dnicolodi
Member

dnicolodi commented Feb 22, 2023

You can add an identifier to the version; the local version identifier field (e.g. the something in 1.2.3+something) exists for this exact purpose 😅

Sure, but if you are touching it up outside the build tool, then you can just as well adjust the dependencies manually, and we don't need to provide infrastructure for this. And if you are updating the pyproject.toml to add the local identifier in the version field, you can again just update the dependencies manually.

Yes, and I think that's the main issue with this approach, but I think we'll have issues with these kinds of matching semantics with other approaches too.

The solution is to provide a slightly more complex language. A first attempt could be something based on the Python string formatting language, using the fields of the packaging.version.Version object:

[tool.meson-python.dynamic]
dependencies = [
  'numpy >= {v}',
  'foobar ~= {v.major}.{v.minor}',
  'insane == {v.major}.*.{v.micro}',
]

This is also easier to implement.
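
A minimal sketch of that substitution, assuming only the packaging library (the function and variable names are illustrative, not an actual meson-python API):

from packaging.version import Version

def expand_pins(templates, build_versions):
    # build_versions maps a package name to the version found in the build environment.
    requirements = []
    for template in templates:
        name = template.split()[0]
        v = Version(build_versions[name])
        requirements.append(template.format(v=v))
    return requirements

print(expand_pins(['numpy >= {v}', 'foobar ~= {v.major}.{v.minor}'],
                  {'numpy': '1.24.2', 'foobar': '2.3.1'}))
# -> ['numpy >= 1.24.2', 'foobar ~= 2.3']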

@rgommers
Contributor Author

Additionally, we would also allow a TOML table with alternative matchers, which would provide a shortcut for common annoying matches, and allow us to implement an escape hatch for users if that is requested.

You only used 'exact' as an example of a named option, so to make sure - did you think about including pin-compatible and bugfix-only? With your idea of it being split between @-style version specifiers and a table, what exactly would you do for my example from higher up:

numpy = {strategy="compatible", min_pin="x.x.x", upper_bound="1.27.0"}

@dnicolodi
Member

Let's look now at pythran, where versions seemingly have a three part release section, but we only want to match the first two. The same problem from the previous example arises, but using {v} here is not an option.

>>> match('0.10.0', '>={v.major}.{v.minor}')
Does `0.10.0` match `>=0.10` (generated from `>={v.major}.{v.minor}`)? True
>>> match('0.10.0rc1', '>={v.major}.{v.minor}')
Does `0.10.0rc1` match `>=0.10` (generated from `>={v.major}.{v.minor}`)? False

I understand the issue. I don't understand what the proper solution would be: independent of the way in which you arrive at it, what would the requirement string need to look like in this case?

@FFY00
Member

FFY00 commented Mar 6, 2023

Well, the main issue with that now is that if the upstream releases a new version with a different number of release parts, our constraint significantly changes its meaning.

>>> from packaging.version import Version
>>> from packaging.specifiers import Specifier
>>> def match(version, template, check):
...     v, c = Version(version), Version(check)
...     s = Specifier(template.format(v=v))
...     print(f'Does `{c}` match `{s}` (generated from `{template}`)? {c in s}')
...
>>> match('0.10.0', '~={v}', '0.10.1')
Does `0.10.1` match `~=0.10.0` (generated from `~={v}`)? True
>>> match('0.10.0.1', '~={v}', '0.10.1')
Does `0.10.1` match `~=0.10.0.1` (generated from `~={v}`)? False

Remember that ~=0.10.0 expands to >=0.10.0,==0.10.*, and ~=0.10.0.1 expands to >=0.10.0.1,==0.10.0.*.

@FFY00
Member

FFY00 commented Mar 6, 2023

I understand the issue. I don't understand what the proper solution would be: independent of the way in which you arrive at it, what would the requirement string need to look like in this case?

As we proposed, if the dependency is on a pre or dev version, we simply do not generate any constraint. My initial proposal was pinning it, but Ralf was worried because that would break development workflows, so we decided to just not generate any constraint. This is the behavior right now with any build backend anyway, so it shouldn't be a big issue.

@dnicolodi
Member

Sure, but that's what the user asked for. How would you fix that using any other form of templating? You cannot know what the user really meant. I understand there are limitations, but I don't see how you lift these limitations with a different form of templating. The only thing you can do is adapt the expansion of the templating to work in some special cases, but it would necessarily break in others.

@FFY00
Member

FFY00 commented Mar 6, 2023

See the reply I just posted.

@rgommers
Contributor Author

rgommers commented Mar 6, 2023

What's wrong with ~= {v} when you have dev or pre releases?

That doesn't say what the ~= should apply to (e.g., the second or third version component).

Also, you can turn that question around: any project may have dev or pre versions, now or in the future. How can you then use anything beyond {v}?

@dnicolodi
Member

That doesn't say what the ~= should apply to (e.g., the second or third version component).

It applies to the last component.

Also, you can turn that question around: any project may have dev or pre versions, now or in the future. How can you then use anything beyond {v}?

Use it to match what? ~= does not work with dev or pre releases, but this is hardly an issue with the templating solution I'm proposing. Please bring it up with the PEP 440 authors.

@FFY00
Member

FFY00 commented Mar 6, 2023

Use it to match what? ~= does not work with dev or pre releases, but this is hardly an issue with the templating solution I'm proposing. Please bring it up with the PEP 440 authors.

This is by design: dev releases aren't meant to be compared, and pre-releases are only supposed to match when they're directly compared against another pre-release, otherwise 0.10.1rc1 would match >=0.10 (some resolvers optionally allow enabling this as extra functionality).

IMO we should cut our losses and just not generate constraints for dev or pre releases (we would output a message mentioning this to the user, though).

@rgommers
Contributor Author

rgommers commented Mar 6, 2023

What "same version" means depends on the ABI stability promise of individual projects. I can think of only two projects for which this mechanism is going to be used: NumPy and SciPy. What are the policies for these? Can we have concrete and precise examples of the pins these projects would like to put in place?

See https://numpy.org/devdocs/dev/depending_on_numpy.html#adding-a-dependency-on-numpy for NumPy. SciPy does have a C API, mainly cython_blas, cython_lapack and cython_special, but there's no specific policy beyond "don't break backwards compatibility".

There are a lot more projects to which this applies, I'm sure - pretty much anything with a C or C++ API, I'd think:

  • PyTorch is an example of a project that needs exact pinning (even bug-fix releases aren't guaranteed to be stable, although they mostly are in practice).
  • Any project with a strategy like CPython's is bugfix-only (of course we don't need it for CPython itself, because that is implicitly special-cased already).
  • Scikit-learn and other projects that ship .pxd files for Cython support also expose a Cython (and hence C) API.
  • Any native libraries already shipped on PyPI - things like Intel MKL. There's a growing interest in shipping non-Python libraries on PyPI; BLAS, LAPACK, OpenMP, Arrow, etc. Recently the RobotPy author announced he is doing this.
  • Dynamic linking to CUDA a la https://github.com/NVIDIA/cuQuantum/tree/main/extra/demo_build_with_wheels.

I'm sure there's more.

@dnicolodi
Member

As we proposed, if the dependency is on a pre or dev version, we simply do not generate any constraint

I think this is probably the only sane thing to do, given the constraints. I thought you had an issue with the templating form suggested, not with whether or not to generate requirements. A possible addition to consider is the use of markers to constrain the generation of the pins under some conditions. For example:

build-time-pins = [
  'pythran ~= {v}; is_stablerelease', # or some alternative spelling of this
]

packaging.version.Version has is_devrelease, is_postrelease, is_prerelease attributes and a

@property
def is_stablerelease(self):
    return not self.is_devrelease and not self.is_prerelease

could be added.

@rgommers
Contributor Author

rgommers commented Mar 6, 2023

A possible addition to consider is the use of markers to constrain the generation of the pins on some conditions.

I think it should be invisible/automatic rather than done via markers. This is pretty much always needed, and it's too easy to forget.

@dnicolodi
Member

Having the requirement emitted would make sense if the requirement is == {v}.

@dnicolodi
Member

dnicolodi commented Mar 6, 2023

Use it to match what? ~= does not work with dev or pre releases, but this is hardly an issue with the templating solution I'm proposing. Please bring it up with the PEP 440 authors.

This is by design: dev releases aren't meant to be compared, and pre-releases are only supposed to match when they're directly compared against another pre-release, otherwise 0.10.1rc1 would match >=0.10 (some resolvers optionally allow enabling this as extra functionality).

For which resolvers is 0.10.1rc1 >= 0.10 false by default? In my experience, pip resolves that requirement as true by default. I think all other resolvers around are designed to be compatible with pip. Also, I would expect that to be true, otherwise how can you test release candidates? I mean, we even used that in the meson-python CI when we tested against Meson 1.0rc1, and we didn't have to do anything special to enable it (other than pip install meson==1.0rc1).

I had a look in the pip source code. AFAICT, pre-releases are always allowed to match in pip.

@FFY00
Member

FFY00 commented Mar 6, 2023

Having the requirement emitted would make sense if the requirement is == {v}.

Then so it would if it was >= {v}, but this breaks the current workflow. I'll let Ralf expand on that if necessary.


For which resolvers is 0.10.1rc1 >= 0.10 false by default? In my experience, pip resolves that requirement as true by default. I think all other resolvers around are designed to be compatible with pip. Also, I would expect that to be true, otherwise how can you test release candidates? I mean, we even used that in the meson-python CI when we tested against Meson 1.0rc1, and we didn't have to do anything special to enable it (other than pip install meson==1.0rc1).

I had a look in the pip source code. AFAICT, pre-releases are always allowed to match in pip.

All of them, this is the default behavior mandated by the PEP.

From https://peps.python.org/pep-0440/#handling-of-pre-releases

By default, dependency resolution tools SHOULD:

  • accept already installed pre-releases for all version specifiers
  • accept remotely available pre-releases for version specifiers where there is no final or post release that satisfies the version specifier
  • exclude all other pre-releases from consideration

But I realize maybe I wasn't very clear, pre-releases are accepted if they are already installed, but by default they are not allowed when installing.

Doing anything other than pinning isn't very meaningful anyway, as you likely won't want the same constraint for pre-releases. I don't think this is a big deal, as people should not be publishing wheels built against pre-releases.
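
A quick check of the default behaviour with the packaging library:

>>> from packaging.specifiers import SpecifierSet
>>> from packaging.version import Version
>>> spec = SpecifierSet('>=0.10')
>>> spec.contains(Version('0.10.1rc1'))                    # pre-releases are excluded by default
False
>>> spec.contains(Version('0.10.1rc1'), prereleases=True)  # explicit opt-in
True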


I thought you had an issue with the templating form suggested, not with whether or not to generate requirements.

Well, both, kinda 😅. They are intertwined; the constraint generation for dev and pre-releases was definitely the main point, but I also have some thoughts about your syntax which I did not go into, to avoid bikeshedding, as it is tied to the first point.

From my reply above:

Considering only the design, I think there are two main key points regarding your proposal, 1) what happens on dev and pre-release versions, and 2) the syntax. I think 2) is dependent on 1), so let's just keep things simple and just look at 1).


I think we are now all in agreement that skipping constraints on dev and pre releases is at least a reasonable way to fix the issues with constraints against such versions.

Adopting that into our design, it means we now only need to match the release section of the version, which can also simplify the syntax required.

I don't want to come off harsh here, but considering that, I think the @.@.@ syntax is more suitable here for the following reasons:

  • It's very intuitive, for both reading and writing
    • It does not require you to have knowledge of any API to use
  • It does not require the usage of str.format

I am happy to discuss alternative syntax, my main point here really is that we can avoid str.format.

@FFY00
Member

FFY00 commented Mar 10, 2023

@henryiii, I know this is a bit [1] involved, so it's totally fine if you don't want to or can't get into it right now, but this is something we would maybe like to turn into a PEP later, so it'd be good to hear your feedback as a scikit-build maintainer.

Having to move the dependencies field to a tool-specific field sucks, so a PEP standardizing this could help with that. IMO it could also make sense to relax PEP 621's requirement, to allow this case.

Footnotes

  1. ...that's an understatement

@FFY00
Member

FFY00 commented Mar 11, 2023

I don't want to come off harsh here, but considering that, I think the @.@.@ syntax is more suitable here for the following reasons:

I would like to add something here:

  • It prevents the usage of literals (as in, the user cannot do numpy == 1.24.2, which is something that should go into the normal dependency field)
    • This is something we can try to detect with the {v} approach, but it will add more complexity

Also, I opened https://discuss.python.org/t/relaxing-or-clarifying-pep-621-requirements-regarding-dynamic-dependencies/24752 regarding allowing us to keep project.dependencies.

@rgommers
Contributor Author

Also, I opened https://discuss.python.org/t/relaxing-or-clarifying-pep-621-requirements-regarding-dynamic-dependencies/24752 regarding allowing us to keep project.dependencies.

Thanks! I commented there. This will be important to settle. The PEPs here are very unhelpful. What I had in mind was simply taking the existing requires = and dependencies = fields, leaving those unchanged, and then adding an extra constraint on top. And having that constraint in a tool-specific section first, and then standardizing it later. Having to move everything to the tool-specific section is super annoying, and will break a bunch of other stuff, like tools that introspect data (https://libraries.io/, GitHub dependency graph, etc.) or map dependencies to those of other packaging systems (pypi2rpm, Grayskull, etc.).

We could still consider ignoring the PEP's assumption/requirement if we can't think of any practical downside of doing so. This is already being done today by any ad-hoc approach to applying these constraints, and I'm not aware of problems with that.

@rgommers
Contributor Author

rgommers commented Mar 22, 2023

I've tested @dnicolodi's implementation in gh-319 with SciPy. It worked smoothly and built a wheel with the below diff on SciPy's pyproject.toml:

diff --git a/pyproject.toml b/pyproject.toml
index 8f776b7d3..d828076cb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -63,11 +63,6 @@ maintainers = [
 # release branches, see:
 #     https://scipy.github.io/devdocs/dev/core-dev/index.html#version-ranges-for-numpy-and-other-dependencies
 requires-python = ">=3.9"
-dependencies = [
-    # TODO: update to "pin-compatible" once possible, see
-    # https://github.com/FFY00/meson-python/issues/29
-    "numpy>=1.21.6",
-]
 readme = "README.rst"
 classifiers = [
     "Development Status :: 5 - Production/Stable",
@@ -88,7 +83,7 @@ classifiers = [
     "Operating System :: Unix",
     "Operating System :: MacOS",
 ]
-dynamic = ['version']
+dynamic = ['version', 'dependencies']
 
 [project.optional-dependencies]
 test = [
@@ -131,6 +126,15 @@ source = "https://github.com/scipy/scipy"
 download = "https://github.com/scipy/scipy/releases"
 tracker = "https://github.com/scipy/scipy/issues"
 
+[tool.meson-python]
+# base dependencies, used for the sdist
+dependencies = ["numpy>=1.21.6"]
+
+# additional requirements based on the versions of the dependencies
+# used during the build of the wheels, used for the wheels
+build-time-pins = ['numpy >= {v.major}.{v.minor}']
+

Unfortunately I do have a semantics question to raise, so let's deal with that before discussing syntax. The metadata generated is:

Requires-Dist: numpy>=1.21.6,>=1.24

Which makes sense given the implementation .... but isn't quite what we were aiming for I suspect (right?). Ideally we want to replace >=1.21.6 with the tighter constraint, rather than appending to it. I'm not quite sure that the current output is valid and will not cause other packaging tools to choke on that. Or maybe it's fine.

I can also see an upside though - it makes quite explicit what is happening there, and there won't be issues if the installed version is too old (EDIT: numpy>=1.21.6,>=1.20.0 would work as expected - namely error out - while numpy>=1.20.0 would not). The PEP 508 grammar may guarantee that this works.

Thoughts?

@dnicolodi
Member

The metadata generated is:

Requires-Dist: numpy>=1.21.6,>=1.24

Which makes sense given the implementation .... but isn't quite what we were aiming for I suspect (right?). Ideally we want to replace >=1.21.6 with the tighter constraint, rather than appending to it. I'm not quite sure that the current output is valid and will not cause other packaging tools to choke on that. Or maybe it's fine.

PEP 508 is maybe not the most informative. PEP 440 is a bit nicer to read. Anyhow, my reading of these PEPs is that the comma separating expressions in version specifiers is to be read as an "AND", see https://peps.python.org/pep-0440/#version-specifiers. Therefore, things like >=1.21.6,>=1.24 are valid.
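
A quick sanity check with the packaging library (illustrative only):

>>> from packaging.specifiers import SpecifierSet
>>> from packaging.version import Version
>>> spec = SpecifierSet('>=1.21.6,>=1.24')
>>> Version('1.22.0') in spec   # fails the tighter of the two bounds
False
>>> Version('1.24.1') in spec
True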

The reason why I decided to implement things like this is simplicity: the static dependency specifiers can be arbitrarily complex and replacing only one part of them is not trivial. It would require doing some algebra with version numbers, and I would happily avoid that, unless strictly required.

I cannot guarantee that there is no tool out there with a bug that prevents it from interpreting these requirements in the right way, but that would indeed be a bug, and I expect it to be very unlikely.

numpy>=1.21.6,>=1.20.0 would work as expected - namely error out - while numpy>=1.20.0 would not

I'm not sure I understand. Why would the first form error out? A 1.21.6 version would satisfy both inequalities. I don't see why there should be a preferred ordering for the inequalities. Is there a typo somewhere?

@rgommers
Contributor Author

The reason why I decided to implement things like this is simplicity:

Makes sense - I agree with the choice. Any potential surprise can be addressed with docs.

I'm not sure I understand. Why would the first form error out? A 1.21.6 version would satisfy both inequalities. I don't see why there should be a preferred ordering for the inequalities.

No, there's no typo or ordering. What I meant there was that the sdist requirement is >=1.21.6 and then at build time 1.20.0 is detected. If the logic were "replace the existing >= constraint with a new >=build_time_version constraint", there'd be a silent problem. In other words, yes, the algebra with version numbers would be complex :)

@rgommers
Contributor Author

So okay, we're all good semantics-wise then I think. Before I try to go through this whole thread again regarding syntax or test/review in more detail, let me check my understanding of the state of things:

  • The discussions on semantics have converged, and @dnicolodi and @FFY00 you're essentially on the same page there?
  • The key thing here is syntax: {v.major}.{v.minor} vs. @.@ (and variations of these things). Correct?

@dnicolodi
Member

I don't want to come off harsh here, but considering that, I think the @.@.@ syntax is more suitable here for the following reasons:

* It's very intuitive, for both reading and writing

I don't find it intuitive. Actually, I still haven't understood how you say "the complete version identifier of the installed package" with this syntax: it cannot be @ as this would be the major number only, and it cannot be @.@.@ as this would not work with packages that have more or fewer than three version identifier components.

  * It does not require you to have knowledge of any API to use

API? The only thing the user needs to know is that there is an object with some fields. Also, the @.@.@ syntax needs to be documented and learnt. I don't see how reusing the concepts from another well-documented package and string formatting, which is familiar to all Python users, is worse than coming up with a whole new dedicated syntax.

* It does not require the usage of `str.format`
  
  * Does not leak object attributes (eg. `{x.imag}` where `x` is an `int`)
  * Does not allow the [Python format string syntax](https://docs.python.org/3/library/string.html#formatstrings) (eg. `{x!r}`)

There are infinite ways for users to screw up metadata syntax, and even more in which version requirements can be wrong. I don't think these are relevant concerns. Especially considering that we are discussing an advanced feature and not something used by inexperienced programmers.

  * minor (not really an issue right now, but good to avoid if we can): Avoids unnecessary thread-unsafe code (see the note in https://docs.python.org/3/library/stdtypes.html#str.format)

The problem with thread safety is limited to the n format specifier and to some platforms: just don't use it. And f-strings have exactly the same issue when using the n format specifier; there aren't two different implementations of number formatting routines for str.format() and f-strings.

* It prevents the usage of literals (as in, the user cannot do `numpy == 1.24.2`, which is something that should go into the normal dependency field)

Previously in the discussion you suggested allowing users to provide, via an entry point, a function to call to generate the build-time version requirements. Now you are concerned that users may provide a nonsensical string. How do the two things go together? I think that a user-provided function has many more ways to do the wrong thing than the very simple substitution function I'm proposing. Also, how is it more difficult to prevent the user from using nonsensical strings with string formatting substitution than with the proposed @.@ placeholder-based substitution?

Your objections to the string formatting approach are essentially that the user may use it incorrectly. I don't think that this should be the main concern. Eliminating this concern requires writing a non-trivial parser for the @.@ placeholder-based substitutions. I don't think the effort required to make that work is worth the, IMO, very small advantages.

I find the @.@-based syntax the opposite of intuitive.

If you don't want to use simple string formatting, we could consider using template strings: https://docs.python.org/3/library/string.html#template-strings

@rgommers
Contributor Author

@dnicolodi @FFY00 you both presented your arguments, and I think the differences are fairly minor compared to the similarities and the benefits of getting this merged. You clearly each have your preferences, but in the end we have to go with a single syntax. How about we let other maintainers choose their preferred syntax? I can spend a bit of time summarizing.

And I also think @henryiii's opinion is important if scikit-build-core is going to support this feature (and I hope it will). It'd be great if we ended up with happy users of the same syntax in two build backends; that will make future standardization way easier.

@pradyunsg

Is this still something being worked on? Trying to figure out what stuff I can trim from my "keep an eye out for this" list. :)

@dnicolodi
Member

@pradyunsg There is a prototype in #319, but I have the feeling there isn't yet consensus on what the syntax for this feature should look like.

@rgommers
Contributor Author

I'm still interested in this (but, long TODO list, and this isn't quite near the top). Also, the PEP that @henryiii posted a draft version for on Discourse, which included "dependency narrowing", would be useful to have, because that would make use of this feature quite a bit easier (no need to move anything to a meson-python tool-specific section in pyproject.toml).

@tobiasdiez

Since it seems hard to come up with a syntax that covers all edge cases, what about the following very general approach inspired by setuptools' dynamic metadata?

From a user's perspective, it would look something like:

[project]
dynamic = [
  "dependencies",
  "version"
]

[tool.meson-python.dynamic]
dependencies = { attr = "myapp.helper.dependencies" }
version = { attr = "myapp.helper.version"}

and then in myapp.helper.dependencies you can implement whatever dynamic behavior you would like. This would, for example, also make it possible to load dependencies from other files or have them depend on certain CI variables, etc.
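
For illustration, the referenced helper module might then look something like this (hypothetical; the module and attribute names come from the snippet above, and the environment variable is made up):

# myapp/helper.py -- hypothetical module referenced by the `attr` directives above
import os

version = '1.2.3'

# Dependencies could be computed however the project needs, e.g. based on a
# CI variable set only for release builds (illustrative only).
if os.environ.get('RELEASE_BUILD'):
    dependencies = ['numpy>=1.24,<1.27']  # tighter pins for release artifacts
else:
    dependencies = ['numpy>=1.21.6']      # loose pins for local development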

@dnicolodi
Member

@tobiasdiez Please note that the attr directive allowed by setuptools does not execute Python code: it parses the module code into an abstract syntax tree and extracts constant values. This does not solve the problem being addressed here: deriving the installation dependencies from the dependencies present at build time. Solving this problem requires two things: a way to retrieve the versions of the dependencies present at build time, and a way to execute some logic. The attr directive in setuptools does not address either of these requirements.
