Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boost stdlib development #83

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Conversation

proux01
Copy link

@proux01 proux01 commented Feb 6, 2024

@andres-erbsen
Copy link

andres-erbsen commented Feb 6, 2024

Could you please say more about the advantages of the proposed system over the current one -- why should we do what you say under "Motivation"? We clearly have different intuitions here, and to make a good decision we'll need to understand why we each want what we want. I have opined here.

@ejgallego
Copy link
Member

Thanks for the CEP @proux01 , a few comments:

  • I think that the CEP covers two orthogonal set of tasks, namely:
  • I think a critical aspect to this CEP is integration with tooling (build systems, package managers, doc, search tools, etc...) I think we have pretty good evidence that such a granular split of the stdlib has the potential to impact users pretty negatively unless tooling can compensate the added difficulty.

IMHO I would thus do the removal of the stdlib status right now (tho the plan is missing some technical details), and I'd try to study / develop more the usability aspects of the stdlib split.

@proux01
Copy link
Author

proux01 commented Feb 15, 2024

* I think that the CEP covers two orthogonal set of tasks, namely:
  
  * splitting the stdlib into several packages
  * removing the stdlib's special status (c.f. [[build] Turn the Standard Library into a regular library coq#16091](https://github.com/coq/coq/issues/16091) )
    IMHO it would make sense to decouple these two tasks, as IMHO they do seem to have very different granularity.

I agree those are two quite diferent things. But they are not fully orthogonal either. For instance, doing the second thing implies separating the prelude (+ probably Setoid and the like) which is already a package split. So doing both things at the same time may actually save some effort.

* I think a critical aspect to this CEP is integration with tooling (build systems, package managers, doc, search tools, etc...) I think we have pretty good evidence that such a granular split of the stdlib has the potential to impact users pretty negatively unless tooling can compensate the added difficulty.

Could you please elaborate? We already have working package managers (Opam and Nix). My experience with mathcomp is that keeping both a make-all target and separate packages allows both comfort for developers (who can just type make to build everything withut worrying about the package split) and users (who can easily use only part of the, large, library). Tests in CI ensure that we don't break any of the two builds accidentally. I guess this setting is working well enough for mathcomp, as I can't remember any complaint about it. Do you see any fundamental difference between the stdlib and mathcomp that would turn this wrong? (at least my small prototype in the CEP convinces me that the mathcomp setting is replicable for the stdlib)

IMHO I would thus do the removal of the stdlib status right now (tho the plan is missing some technical details),

Please elaborate on those "missing technical details", that's the whole point of CEPs

and I'd try to study / develop more the usability aspects of the stdlib split.

What are precisely the points you think deserve further study / development?

@silene
Copy link

silene commented Feb 15, 2024

A point that troubles me is that the files PArray, Sint63, Uint63, FloatAxioms are moved outside of Coq. In my opinion, files that describe the computational behavior of Coq's kernel should be tightly coupled to Coq's kernel. We cannot have an independent user library be the authoritative source of the reduction rules of Coq's builtin symbols.

I am not saying that these files should be kept in Coq in their entirety. But all the Primitive and Axiom declarations they contain should stay. The downside is that part of ZArith has to be kept in Coq too, so that the axioms can be expressed. But it would just be the Definition declarations of ZArith; no lemmas are needed. (This is not that different from what you did with FloatAxioms; only some definitions were moved from Flocq to Coq; all the theorems, especially those about real numbers, stayed in Flocq.)

@andres-erbsen
Copy link

@proux01 would you be willing to engage with me on this topic, in a medium of your choice? I see the value of some aspects of this CEP (e.g., cleaning up the dependencies within stdlib and spinning out obscure functionality) but leaving users overall at the mercy of an ecosystem of small independent packages evokes the horrors of npm & leftpad in my imagination. I figure that's not what you want either, but it's really hard for me to see what considerations would outweigh the costs of not having one standard definition of the integers and similarly basic concepts. 🙏 Thanks!

@proux01
Copy link
Author

proux01 commented Feb 19, 2024

A point that troubles me is that the files PArray, Sint63, Uint63, FloatAxioms are moved outside of Coq. In my opinion, files that describe the computational behavior of Coq's kernel should be tightly coupled to Coq's kernel. We cannot have an independent user library be the authoritative source of the reduction rules of Coq's builtin symbols.

Good point

I am not saying that these files should be kept in Coq in their entirety. But all the Primitive and Axiom declarations they contain should stay. The downside is that part of ZArith has to be kept in Coq too, so that the axioms can be expressed. But it would just be the Definition declarations of ZArith; no lemmas are needed.

Indeed, I updated the CEP / prototype accordingly. This adds 15 files / 2.1 kloc (almost half of them for the ZArith definitions).

@proux01
Copy link
Author

proux01 commented Feb 20, 2024

would you be willing to engage with me on this topic, in a medium of your choice?

@andres-erbsen we can have a lengthier discussion on Zulip for instance

@proux01
Copy link
Author

proux01 commented Feb 20, 2024

This CEP should be discussed during a forthcoming Coq Call: https://github.com/coq/coq/wiki/Coq-Call-2024-03-05

@andres-erbsen
Copy link

andres-erbsen commented Feb 27, 2024

Could we have this coq-call discussion the week after, on the 12th? I'm unsure whether I'll be able to make it on the 5th, but I would very much like to participate (and ideally get in some one-on-one iteration with @proux01 first as well).

@andres-erbsen
Copy link

andres-erbsen commented Mar 1, 2024

Had a productive discussion with @proux01; notes below.

Between the two of us, there seems to be agreement that increased clarity of dependency structure within stdlib would be valuable (this is Emilio's first point above, let's call it the "build split"). In particular, there is a common experience where contributing to stdlib is stifled because it is unclear what other files each particular file can import. @proux01 has already done the technical work required to make this happen, and while the dependency tree is gnarly, it is better than nothing, and capturing it in code would be an important improvement to stdlib maintenance. There also seems to be low-hanging fruit in terms of improving the developer experience further, for example by providing build targets that only build subsets of stdlib for intermediate feedback during prototyping. And I think it is independently valuable to identify "orphan" parts of stdlib that are not depended on internally, such as Sets and Reals, so they can be considered for spin-out.

As for removing stdlib from Coq (let's call it the "distribution split"), @proux01 is much more optimistic about the immediate user-facing impact than I am. This disagreement reflects different attitudes towards third-party packaging of coq and stdlib, and whether we consider the experience of users who rely on these packages to be in part a responsibility of coq developers. Specifically, I argue that if we are going to make a drastic change in packaging (going from 1 package to 20+), this should be coordinated closely with the maintainers of downstream packages; making this change without their approval is asking for trouble for all users further downstream. Similar issues to what I expect already happened with the build-system changes a few years ago; I am just trying to urge us to do better now. I personally rely exclusively on linux-distribution packages for Coq when I am not building from git, and I would guess almost all people I work with do the same; I do read on the community survey that opam is twice as popular but it is unambiguously not the only game in town. Thus I would like to see a clear, package-maintainer-approved plan for the distribution split, with a focus on minimizing disruption to users who want the entire stdlib. I personally consider the possible complications from the distribution split not to be worth the likely benefits, but if it is to happen, I am willing to help out to make sure it goes smoothly. As a contributor to Arch and Alpine Coq packages, I would prefer if the many-way split was preceded by a smaller-scale canary (perhaps just spinning out Reals).

One technical point that came up is the relationship between plugins and theory they support: on one hand, plugins are tightly coupled with Coq due to the use of the ML API, so it would make sense to keep them with Coq. @proux01 expects changes to plugin-gallina interfaces to be much less common, so putting e.g. lia and Z into separate repositories would work fine. I am less confident of this based on work I've followed, but perhaps @coq/micromega-maintainers can offer us an opinion from the close up?

As for non-technical aspects of specifically removing the special status of stdlib, I agree with this goal only for the less-frequently-used parts of the stdlib. I think it is in the interest of the community for very commonly used theory such as list, Z, bool, wf, ring, lia, and etc to continue to be distinguished and maintained centrally, by a subteam of coq developers. (The community survey agrees that these are the most popular parts).

Both of us also agree that there is long-term value in curating a cohesive, non-duplicative core of coq libraries and that this CEP isn't really about solving that. This is a much more ambitious and distant goal than figuring out how the current stdlib theories should be maintained. We agree that attracting more contributors would be valuable, and that we don't really have a promising plan for how to do that. Perhaps I should do a CEP on the more general topic of library-ecosystem stewardship; hopefully it can be relatively independent from this one though.

@SkySkimmer
Copy link
Contributor

This disagreement reflects different attitudes towards third-party packages, and whether we consider the experience of users who rely on them to be in part a responsibility of coq developers.

By "3rd party packages" do you mean "coq packaged by someone" like https://packages.debian.org/stable/math/coq or "package using coq" like mathcomp etc?

@andres-erbsen
Copy link

I meant the first; debian and etc.

@proux01
Copy link
Author

proux01 commented Mar 1, 2024

@SnarkBoojum in case you haven't noticed it, you may be interested by the discussion above, as a Debian package maintainer

@proux01
Copy link
Author

proux01 commented Mar 3, 2024

Two other points that came up:

  • We agree that the current situation when it comes to handle PRs to the stdlib is not great. I do hope that having a dedicated repo, not mixing stdlib PRs with Coq ones, could help. @andres-erbsen doesn't share my optimism.

  • we currently have

    • coq-core
    • coq-stdlib

    that will become

    1. coq-core
    2. prelude, setoid, defs and axioms for primitive types spec
    3. remaining of current stdlib

    this PR offers to keep coq-stdlib for 2. and use a new name (for instance "roots") for 3., but that's of course part of what should be settled by the CEP.

@CohenCyril
Copy link

Both of us also agree that there is long-term value in curating a cohesive, non-duplicative core of coq libraries and that this CEP isn't really about solving that.

While I agree this CEP is not about solving that, I believe that the possibility of developing such a cohesive library is strongly conditioned by the right choices happening here.

@SnarkBoojum
Copy link

@proux01 Thanks for pinging me, I indeed didn't follow this ticket.

On the Debian side, the coq source package already provides five different binary packages, so a split in different upstreams wouldn't be painful: bin:libcoq-stdlib would be provided by src:coq-stdlib instead of src:coq.

Since I packaged quite a few Coq-related packages, I have the impression that there are:

  1. some parts of the stdlib are experiments and not used outside -- they could be dropped or kept aside ;
  2. that other parts are incomplete but useful hence a few projects have supplements that could be merged ;
  3. that some things are just absent hence a few projects provide supplements that could be merged.

Notice the "could be merged" is easily said: the various downstreams almost certainly didn't complement in coherent ways with one another, so deciding what and how to merge will probably mean everyone will have some work to cope with the merge.

But still, after the initial period where everyone will be unhappy because things get broken everywhere, having a more fluid stdlib will mean people will actually contribute to it instead of adding their own ideas on their own, so that means less duplication on the whole.

That looks pretty positive.

Comment on lines 15 to 17
As part of the promotion effort around the Coq Platform, we would like to
ensure that the stdlib does not enjoy special status and that Coq can be
used without it. We should consider stdlib components as libraries to
Copy link
Member

@herbelin herbelin Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, a better formulation would be the following:

"We would like to ensure that any contribution, any library of the community can be integrated to the standard library."

That is, the point is not to get rid of a standard library. The community does need a standard library. The point is what we put in the "standard" library. In particular, the question is that MathComp and coq-stdpp should join "the" standard library (the one we want).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how do you do that "join" in practice? For a basic example, let's consider ssrnat.v:

  • how do we define <=?
  • what naming convention?
  • do we need to offer a transition path to users?

Without such details, sentences like "MathComp and coq-stdpp should join the standard library" unfortunately just remain wishful thinking.

This is a CEP, it's all about technical details that can be implemented. It's good to offer alternatives, but they should be precise enough that the alternatives can be precisely compared.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that I had skipped this question, for which I had a proposal.

But how do you do that "join" in practice? For a basic example, let's consider ssrnat.v:

* how do we define `<=`?

Arith has both <= and <=?. So, we already have a system of notations for talking both of a propositional <= and a computational <=, so:

  • assuming that everyone agrees with using the notation <= for propositional and <=? for computational, we can easily provide in the same library theorems talking about <= and theorems talking about <=? (with names such as, respectively, le_trans and leb_trans/leq_trans, possibly providing several synonymous built according to various popular naming policies);
  • assuming that some users may want to see only the notation <= for the computational <=, providing an option or an optionally importable module that set <= to be the default notation for the computational definition (as in MathComp).
* what naming convention?

Considering that changing habits is in general very difficult, I'm not optimistic that adopting one naming convention that suits everyone would be easy. However, providing synonymous would be easy. E.g. we could provide naming modes and every user could then activate their favorite naming convention that the nametab would take care of.

* do we need to offer a transition path to users?

Regarding propositional and computational <=, I don't see a need for a transition path since they already cohabit as of today. (But if a consensus emerges on how to write the propositional and computational "<=" and about default naming conventions for theorems about them, why not to propose them as unique "standard" conventions, rather than supporting synonymous.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Trocq, we can also reasonably consider that any lemma referring to one or the other version of <= could eventuallly be used in the context of the other version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for induction schemes, whatever definition is used, I believe that we should provide different schemes, and not only the one canonically deriving from the inductive definition of current Peano.le (even, if maybe, some work can be done to improve the current panorama, see what Require Import Arith. Search le "_ind" gives for instance, showing e.g. always-satisfied hypotheses Morphisms.Proper (Morphisms.respectful eq iff) A).

Copy link

@silene silene Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The funny thing about Nat.le and Nat.leb is that we can presumably get a consensus about the definition of Nat.leb (after all, there are not that many computational ways of comparing two unary integers, except for deciding which argument is the recursor). But I am not even sure the propositional definition of Nat.le is consensual. For example, it could just as well have been defined as

Inductive le : nat -> nat -> Prop :=
  | le_O : forall n, le O n
  | le_S : forall m n, le m n -> le (S m) (S n).

Or following Peano's definition, as

Inductive le (n : nat) : nat -> Prop := le_n : forall k, le n (k + n).

And I can think of a few other propositional definitions, which all have their own benefits and downsides.

Copy link

@andres-erbsen andres-erbsen Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am optimistic we can converge here, as long as we are willing to separate the defining property (or multiple) and definitional computational behavior. I think what Pierre is getting at is that having Nat.le compute like Nat.leb would be nice, and I agree. That's not the "definition" I would want for abstract proofs, but that's fine -- we can expose an if-and-only-if lemma as the conceptual definition. IMO the same approach extends to e.g. which inductive structure to use for defining integers, and to many other cases, but not to types that are intended to be done structural recursion over while they appear in indices of other inductives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I can think of a few other propositional definitions, which all have their own benefits and downsides.

I have even a development where we switch between 2 definitions of le depending on needs (one in SProp extended with the Yoneda trick for dependent occurrences, one inductively-designed to get a specific induction principle: Inductive le : nat -> nat -> Type := le_refl n: n <= n | le_down {n p}: S p <= n -> p <= n). I don't think several definitions are a problem as soon as the motivation for each of them is clear, and that they are interchangeable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , I also consider multiple induction principle for one type to be a pretty good experience with induction .. using ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason to standardize is to avoid the low-level overhead of switching between definitions to use different stdlib lemmas in a single proof (this is currently a pain with Bezout, rel_prime, gcd, and Zis_gcd).

@proux01
Copy link
Author

proux01 commented Mar 6, 2024

I updated the CEP following yesterday's discussion.

Copy link
Member

@jfehrle jfehrle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts.

text/083-boost-stdlib-dev/text.md Show resolved Hide resolved
text/083-boost-stdlib-dev/text.md Show resolved Hide resolved
text/083-boost-stdlib-dev/text.md Show resolved Hide resolved
text/083-boost-stdlib-dev/text.md Outdated Show resolved Hide resolved
text/083-boost-stdlib-dev/text.md Outdated Show resolved Hide resolved
@JasonGross
Copy link
Member

We have a bunch of submodules, some with commits on top of the master branch. Keeping these up to date is a lot of manual rebasing etc. We looked around for existing solutions to this issue but it seems the best solution is to just write some scripts to do this. The amount of work makes it quite difficult to keep the entire repo up to date.

The way I'd solve this is by having two submodules for each development, one that tracks upstream and one that has a branch to be merged into upstream (a folder of patches to apply might work just as well). Dependabot can update the upstream submodules, and CI should merge in the changes / apply the patches before building. Would this work?

@proux01
Copy link
Author

proux01 commented Mar 29, 2024

We improved the motivation section of the CEP.

Let's proceed with the discussion during April 23rd call: https://github.com/coq/coq/wiki/Coq-Call-2024-04-23
Meanwhile, it would be great if the alternative offers could be elaborated in the "Alternatives" section of the CEP, in order to structure the discussion.

@proux01
Copy link
Author

proux01 commented Mar 29, 2024

the comprehensive collection of theory that is expected to be useful to most Coq users and on whose design there is rough consensus

I guess we all agree that's what a standard library should be. Unfortunately, the current stdlib is full of things that don't make consensus, or even gather a strong consensus against them.

pervasive algebraic hierarchies are out because we have not managed to converge on a generally accepted solution

BTW note that if we are strict on this, it does actually exclude quite a lot. Even the way the theory on nat is built in the current stdlib is a kind of algebraic hierarchy construction after all.

not deprecation

The CEP offer to move the stdlib to its own repo, so that it can be kept / evolved in better conditions than it currently endures. Deprecation means intent to remove, that's a totally different thing.

The invariant here should be that stdlib contains the generally recommended solutions, and in cases we know we have failed to maintain this invariant, we should fix it.

Again, we all agree on that definition of an ideal stdlib and should aim in that direction. But the current stdlib is so far from that objective that the best way to reach it, without breaking everything, is far from obvious. Meanwhile, let's first try to improve the current situation by reaching a state were just moving into that direction seems more realistic.

I see the costs of this kind of fragmentation as very high, so perhaps Reals is the only part of stdlib that I would currently consider deserving of separate distribution.

This is a straw man, the CEP reads "Allow maintainers to extract stdlib components [...] in case they wish to do so.". It's about enabling it in case it would make sense, not splitting everything upfront (which I agree would certainly be very bad).

I would love to use these features, but when we tried to use them in coqutil, they were not ready. Stdlib rigidity is not what is keeping us back here. On the contrary, whether we are able to use a feature in stdlib is a good litmus test for whether that feature is ready for general adoption.

We should also understand that developers of new features don't necessarily have time / motivation to adapt an entire library. Having a more delimited prelude only to adapt could also help the development of such new Coq features, by enabling their adoptions first in basic libraries like the stdlib then in their reverse dependencies. If the development of the stdlib is stuck like currently, that could indeed be blocking the feature spread.

We should do an open call for maintainers

Yes, we should. Right now,

IMHO, that would be the worst thing to do. If I were such a new maintainer, I would certainly feel more frightened by the current discussion, than welcomed. Resolving that discussion and setting a more welcoming development environment seems to me like a strong prerequisite to a successful call for new stdlib maintainers.

Specifically on whether to split the repo (coq+stdlib) or not, the only concrete argument in favor of splitting is that contributors could then subscribe to notifications on stdlib only. I propose instead implementing a bot that CCs them on stdlib PRs.

Well it's much more than that: not having all the frightening OCaml code around, the review process that may be too heavy...

Instead, I would like to see the following:

Yes, please add a concrete proposal in the "Alternatives" section of the CEP so that all alternatives can be efficiently discussed.

This shall not rely on one being in France

[off topic]
Sure, are we doing anything really bad wrt that point currently? Of course, for historical reasons, most Coq developers are french people and that reflects on the few physical meetings for practical reasons. But besides that, it seems to me that we make our best to make all public conversations happen in our best-possible English and be as welcoming as possible to non-french people. Is there anything you feel we could improve?
[/off topic]

coming to Coq calls

Yeah, I guess stdlib maintainers could set their own set of calls if they wish so.
And I agree that needing to attend a Coq call to just ask for a PR review, is a pretty terrific situation.

or any specific affiliation

Sure, are we doing anything really bad about this currently?

An overlay "future" library (a la python) that makes stdlib of coq version n available on coq version n-1 whenever possible, with a streamlined process (or even a bot?) for simultaneously contributing to both.

Just decoupling stdlib releases from Coq ones would make it much easier for each version of the stdlib to support multiple versions of Coq, just like most regular libraries do. That would be one of the advantages of the dualrepo approach.

A CI policy restricting external dependencies of CI targets to help developers build it more easily locally

This is quite orthogonal to the current CEP (maybe you should open a CEP about CI to discuss that kind of things).
[off topic]
But TBH, as a Coq developper, the current depth of the dependency tree in the CI doesn't feel particularly painful. The long compilation times of some of the developments and having to struggle with things like submodules feels much worse (hope the recent developments will improve things on that point, I haven't experienced them first-hand yet ;) ). The lack of clarity of the CI contract about some points also doesn't help and can make us feel more enslaved than helped by the CI.
[/off topic]

@andres-erbsen
Copy link

andres-erbsen commented Mar 30, 2024

Quick clarifying responses below; I am working on putting together something more structured and comprehensive.

nat is built in the current stdlib is a kind of algebraic hierarchy construction after all

But it's not pervasive -- most users don't even know it's there!

Deprecation means intent to remove, that's a totally different thing.

We should also understand that developers of new features don't necessarily have time / motivation to adapt an entire library

Which one of these do you mean? It really does feel like the goal of this CEP is to get the current stdlib out of the way and make way for its replacement, as opposed to reinvigorate it. (And see coq/coq#18870 and coq/coq#18871 for fresh samples of what new Coq features are stuck on -- basic tactics, not the theory you are proposing to (re)move here.)

Sure, are we doing anything really bad wrt [favoring France] currently?

I sure feel left out, and sometimes mystified by responses from the Coq team. I have been suggested that this is because I am not present in the in-person meetings.

We improved the motivation section of the CEP.

Who is "we"?

current depth of the dependency tree in the CI doesn't feel particularly painful

It's not the depth that is the issue, it is dependencies on random opam and other packages.

I don't think we should burden the stdlib with support for multiple Coq versions as we are already struggling to keep up with features present in one.

@herbelin
Copy link
Member

I guess we all agree that's what a standard library should be. Unfortunately, the current stdlib is full of things that don't make consensus, or even gather a strong consensus against them.

Could we try to list somewhere examples of what are consensual and non-consensual aspects of libraries (e.g. as an appendix of the CEP, or on a wiki page, complementing e.g., for the current stdlib, this wiki page)?

Also, how easy or complicate would it be to address what does not make consensus in the current libraries? (For instance, I've heard questions about heterogeneity, in naming, in order of arguments, in style; I've heard questions about using bool or Prop for decidable propositions; I've heard questions about having only le or also ge, about notations, about documentation, about complexity of use, about policies, about reusability, ... I don't feel all that unsurmountable.)

If the main objective is to join the best of all our efforts into something that is usable and satisfactory to all users, let's do it.

@proux01
Copy link
Author

proux01 commented Apr 9, 2024

[off topic] (but I don't know where to put this)

I sure feel left out, and sometimes mystified by responses from the Coq team.

Can you elaborate? (although I understand that precising a feeling can be difficult)

I have been suggested that this is because I am not present in the in-person meetings.

That's pretty unclear to me. I'm myself not present to that many in-person events and don't feel excluded. We should also acknowledge that you are part of the dev team of fiat_crypto, which presents many differences with the remaining of the Coq ecosystem, and some long lasting tensions, in particular around backward compatibility and CI. I'd expect this to play a much more important role than location.

It's not the depth that is the issue, it is dependencies on random opam and other packages.

Interesting, I've never felt really impeded by that. You setup your system once and for all, whichever way you want (no obligation to use opam). It's not like you would need to recompile everything everytime or things would be constantly changing at a fast pace.
[/off topic]

I don't think we should burden the stdlib with support for multiple Coq versions as we are already struggling to keep up with features present in one.

I'm not sure it would be that hard / much extra work. Anyway, I was just mentioning it because you suggested a solution that looked more labor intensive.

[off topic] (the discussion about how to evolve the stdlib is a very interesting one, and one that we should have, but it goes way beyond the objectives of the current CEP)

Also, how easy or complicate would it be to address what does not make consensus in the current libraries? (For instance, I've heard questions about [...] I don't feel all that unsurmountable.)

I agree none of that is insurmontable, it's rather the amount of such small things, accumulated over the stdlib history, that make it a huge endeavour.
[/off topic]

@andres-erbsen
Copy link

My issue with this CEP is that it presumes a conclusion to the more general discussion about stdlib -- that we don't want it to be as central and distinguished as it is now. Changing the way stdlib is compiled to enforce internal dependencies is not controversial IMO.

Wdyt about moving the very-off-topic discussion to a zulip thread, maybe "future of stdlib maintainership"?

To our knowledge the monorepo approach has exactly one advantage:
it is easier to backport changes from one package to an upstream package.
It also has one major drawback: it makes it harder to have the following dependencies:
`coq-roots-A <- coq-extralib-B <- coq-roots-C`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an important alternative to consider here is performing the proposed logical-level refactoring in the current repository, without the reimagining the distribution model or conceptual role of the standard library. It would still achieve the concrete benefits for users who only want to import a part of it, retain the refactoring that has already been performed for this CEP, but minimize impact on users from further refactoring between the hard-won component boundaries. If individual components do diverge or want to add external dependencies as foreseen here, they could be spun out to their own packages as needed.

I also have opinions about the motivating vision for the future of stdlib, but to keep this discussion here focused on the component structure, I intend to post a whole another CEP for that in time for the scheduled Coq-call slot.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please fill the "Alternatives" section of the CEP with your proposal and what you perceive are its pros and cons.

I also have opinions about the motivating vision for the future of stdlib, but to keep this discussion here focused on the component structure, I intend to post a whole another CEP for that in time for the scheduled Coq-call slot.

Thanks, that will definitely be the next discussion to have once the current one gets resolved, and a CEP is probably a good way to drive the discussion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And by the way, no hurry to create the next CEP before discussing the current one in Coq Call. Those are two different discussions.

Copy link

@andres-erbsen andres-erbsen Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I add something to the alternatives section?

If these are different discussions, would you be willing to remove the nontechnical decisions about the future of stdlib from this CEP? Otherwise the proposed discussion schedule feels a little bit backwards.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I add something to the alternatives section?

I'd expect you, as a maintainer, to be able to simply push to the branch of the CEP. If not, please ping me so that I give you proper rights.

If these are different discussions, would you be willing to remove the nontechnical decisions about the future of stdlib from this CEP? Otherwise the proposed discussion schedule feels a little bit backwards.

Sure, those belong to a further CEP. The goal of the current one is only to unstuck the development of the current stdlib and "Enabling the development of an evolved or different official "stdlib"". Which one of the latter options we want to choose is outside the scope of the current CEP, we should only keep all doors open.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub thinks that I do not have permissions to edit coq/ceps or your branch.

Copy link

@andres-erbsen andres-erbsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on the narrowest interpretation of this CEP follow within. Overall I am in favor of structuring the build of stdlib along the lines of the proposed.

* Some compatibility files with older versions. Namespace `Coq.Compat`.
* 5 files, 0.1 kloc
* <details><summary>contains:</summary> Compat/AdmitAxiom.v,
Compat/Coq817.v, Compat/Coq819.v, Compat/Coq818.v,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compat files do not have a stable dependency list: in each release, they could overlay a different part of the stdlib. Perhaps they should depend on everything.

text/083-boost-stdlib-dev/text.md Show resolved Hide resolved
* Nullstellensatz tactic nsatz. Namespace `Coq.nsatz`.
* 2 files, 0.6 kloc
* <details><summary>contains:</summary> nsatz/Nsatz.v,
nsatz/NsatzTactic.v</details>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NsatzTactic and the associated plugin do not depend on reals, and Nsatz should be a part of reals instead of nsatz.

@coq/nsatz-maintainers do you have an opinion on whether Gallina and ML code for nsatz should stay in the same repository/package?

Wellfounded/Transitive_Closure.v, Wellfounded/Well_Ordering.v,
Wellfounded/Inverse_Image.v, Wellfounded/Disjoint_Union.v,
Wellfounded/Lexicographic_Product.v,
Wellfounded/Lexicographic_Exponentiation.v</details>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC Lexicographic_Exponentiation is the only part of Wellfounded that depends on list. It's not an issue right now, but it seems entirely plausible that we might want a list-library function to be defined by well-founded recursion -- an example of how splitting at this granularity can be awkward.

* Namespace `Coq.rtauto`
* 2 files, 0.8 kloc
* <details><summary>contains:</summary> rtauto/Bintree.v,
rtauto/Rtauto.v</details>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is Rtauto used? (I thought micromega depended on it, but apparently not.)

@coq/rtauto-maintainers do you have an opinion on whether Gallina and ML code for rtauto should be in the same repo/package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are tauto and rtauto going to be merged some day?

* 1 file, 0.0 kloc
* <details><summary>contains:</summary>
theories/Arith/Arith.v</detail>
* coq-roots-lia

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coq/micromega-maintainers do you have an opinion on whether the lia and zify plugins should be in the same repo/package as the certificate-checker and Add Zify calls, or does it not matter?

theories/setoid_ring/Algebra_syntax.v,
theories/setoid_ring/InitialRing.v, theories/setoid_ring/Cring.v,
theories/setoid_ring/Ncring_tac.v,
theories/setoid_ring/Ring_tac.v, theories/setoid_ring/ArithRing.v,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coq/ring-maintainers do you have an opinion on whether Gallina and ML code for ring should be in the same package/repo or not?

theories/Numbers/HexadecimalZ.v,
theories/Numbers/HexadecimalFacts.v,
theories/Numbers/HexadecimalPos.v, theories/btauto/Algebra.v,
theories/btauto/Reflect.v, theories/btauto/Btauto.v</details>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does btauto depend on ZArith, or is it only that ZArith extras depend on btauto?

@coq/btauto-maintainers do you have an opinion on whether Gallina and ML code for btauto should be in the same package/repo?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btauto is using ring and lia, so they do depend on zarith, at least indirectly.

* The Coq standard library, extraction of micromega
* 1 file, 0.1 kloc
* <details><summary>contains:</summary>
micromega/MExtraction.v</details>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on comments in the file, maybe mircomega extraction should be in test suite rather than its own package?

theories/QArith/Qpower.v, theories/QArith/Qcanon.v,
theories/setoid_ring/Field_theory.v,
theories/setoid_ring/Field_tac.v, theories/setoid_ring/Rings_Q.v,
theories/setoid_ring/Field.v</details>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unfortunate that field comes bundled with Q. Needn't do anything about it now, just another case where package the package boundaries are not really good for external consumption even though they would be immediately useful internally.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a fundamental reason for that: field is a reflexive tactic that uses Q for its computations.


# Alternatives

The technical changes proposed here can stand on their own merits, and can be adopted without ratifying the perspective presented for motivating them. Further, each of the three splits discussed under [Packages](https://github.com/proux01/ceps/blob/boost-stdlib-dev/text/083-boost-stdlib-dev/text.md#packages) could be performed without the other ones, with or without the broader vision that led to them.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence there reads "each requiring the previous one", which seems contradictory?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it does. Could you explain are the requirements because of which 2 cannot happen without 1 and 3 cannot happen without 2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, 2. in https://github.com/proux01/ceps/blob/boost-stdlib-dev/text/083-boost-stdlib-dev/text.md#packages is exposing the split so it need to exist in the first place (that is 1.) and 3. is repo split so hard to do without exposing that split (2.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, in that sense, yes. Technically it would be psosible to use a monobuild but distribute parts of it, IMO that'd be asking for trouble though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which kind of troubles? In my experience it works perfectly fine (we are successfully doing it for mathcomp for as long as I can remember).
I can only see some troubles if you want to change the component boundaries but even that is perfectly manageable, since, by definition, it happens at the exact same time for all packages that only depend from other packages of the exact same version.

text/083-boost-stdlib-dev/text.md Show resolved Hide resolved
text/083-boost-stdlib-dev/text.md Outdated Show resolved Hide resolved
text/083-boost-stdlib-dev/text.md Outdated Show resolved Hide resolved
2. The standandard library could be moved to its own repository without changing its user-facing relationship with Coq and non-stdlib packaes.
3. Specific parts of the standard library could be spun out to their own repositories without changing the rest.

Out of these, 1. has the advantage that it retains most of the technical refactoring done for this CEP, and achieves some of the technical goals listed here such as allowing users to `Require` only a known subset, while minimizing disruption to users in the face of multiple possible futures for the standard library more generally. In particular, keeping the boundaries between subparts of the standard library internal to Coq as opposed to exposing them to users would simplify later changing them.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[quoting precisely because the line is way too long to attach comments to a specific part]

In particular, keeping the boundaries between subparts of the standard library internal to Coq as opposed to exposing them to users would simplify later changing them.

That's a good argument against exposing the package split (point 2. in https://github.com/proux01/ceps/blob/boost-stdlib-dev/text/083-boost-stdlib-dev/text.md#packages ).

My feeling is that you are strongly opposed to the (somewhat orthogonal) dualrepo choice. What are the concrete
arguments behind that position, I can see

  • monorepo is a way to strongly guarantee no additional dependency can be added

are there any other?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key advantage of the monorepo is the one that you already listed in the CEP: it drastically simplifies the logistics of making a change that affects multiple parts (compared to these parts being in separate repositories). A special case of this is that a monorepo makes it easy to ensure internal compatibility by testing every PR in a single CI. Making changes that (conceptually simultaneously) touch coqc, stdlib, and CI developments is a key part of my vision for how to evolve this ecosystem, so I am indeed setting a higher bar for splitting the repo.

While tooling for different workflows within a monorepo can go a long way, I am yet to see a concept for smooth and reliable coordination of changes across different repositories. Overlays as in Coq are very useful given the multirepo reality of Coq CI but they have merge races and pull-request toil (and delays), pinned submodules as in fiat-crypto act as a monorepo with an extra step when everything is moving smoothly and cause an effective freeze of the subrepo code otherwise, maintaining a stable interface across versions like Coq currently strives for is already very limiting on what kind of changes Coq developers can make and I worry it would be even more limiting if the stdlib plugins became a part of that standard interface. Maintaining compatibility in a multirepo environment is also a constant chore in BoringSSL which I sometimes contribute to: there is a porting-duty rotation between the maintainers, and big changes are still tested with supported users before merging anywhere. Basically, every arrangement where a library actually supports its users requires adapting these users to changes at some point, and being able to do this atomically in a single commit in the single repository is the least painful mechanism I have seen, by far.

For a monorepo to make sense, the maintainers of that repository have to have working baseline agreement about what to do with it and mutual trust that only changes that are in line with this agreement are merged. IMO the required consistency bar for is rather low, but it is not zero -- this is why I focus on process questions in #86. Agreeing to disagree can still work well, and it is not hard to imagine e.g. separate agreed-upon and perhaps even coqbot-enforced rules for different CODEOWNERS. On the other hand, treating some piece as an external dependency with an independent release schedule and compatibility story seems to inherently come with human toil for each coupled change, so I would explore this option only when synchronized development is known to be substantially at odds with the goals of one project or another. I get the sense that you probably do have a vision for stdlib that is at odds with it remaining a part of Coq, but I have only a vague sense for what it is, so it cannot convince me to give up the monorepo.

And as a user of coqc+plugins+stdlib, I count them as "one dependency" exactly because I trust the maintainers of these parts to coordinate effectively enough between themselves that I never have to think about compat between them. (Taking this view to a silly extreme, Debian is "one dependency" too, but it is not the same dependency as Coq, and trying to work with both of them is indeed constraining for fiat-crypto).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. So the root of our disagreement seems to be a strong "philosophical" opinion about monorepo vs. distributed developments with version management. Some people tend to think that repository boundaries should reflect human team boundaries, whereas you'd prefer something wider.

[off topic but maybe interesting]
About the CI of Coq: it seems we also have a quite different approach of it. You seem to see it as some approximation of your monorepo dream. My dream is rather an open world where everyone can maintain its own Coq development without any need to be in the CI, only by easy adaptation to each new Coq / other-dependency release, following its changelog and warning / errors.

You seem to expect a lot from the CI, whereas IMHO, the CI is only a kind of expansive test suite and developers of developments in the CI should only expect:

  • less Coq bugs on their dev since CI should catch them early;
  • getting involved early in discussions for breaking changes when such changes are needed.

in exchange of having to constantly remain compatible with master.
(Plugins are a particular case where we commit to provide overlays, due to the unstability of the API.)
[/off topic but maybe intresting]

More precised answer to detailled points:

Overlays as in Coq [...] have merge races

This is theoretically true. However, in my experience, the few merge races I did experience in Coq where never related to overlays. So I wonder how practical the issue is.

and pull-request toil (and delays)

Right, although again the comment mostly make sense when both sides of the overlay are done by the same "team". Those costs are also a good way to get an idea of how, and at which cost, external devs can be adapted.

pinned submodules as in fiat-crypto act as a monorepo

Ironically enough, as a Coq developper, my feeling is that those are responsible for a large part of the "overlay toil". Again, only my personal experience, not a fact.

I get the sense that you probably do have a vision for stdlib that is at odds with it remaining a part of Coq

If by "part of Coq" you mean "same repo", then the main technical point of this CEP is the dualrepo choice, so yes of course. If you mean something else, please clarify.

In any case, it seems we mostly share the same vision of an ideal stdlib. It also seems that the current stdlib should remain usable by all its current users (whether in CI or not) without crazy amounts of effort. Whether the two goals are compatible remains unclear to me.

Copy link

@andres-erbsen andres-erbsen Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, in my experience, the few merge races I did experience in Coq where never related to overlays.

Okay, maybe it's not specifically overlays at fault here. But the loose coupliing of CI developments is definitely a major cause behind CI failures: 4 out of 5 last non-cancellation failures look like an atomically-merge-when-tests-pass policy would have prevented them: https://github.com/coq/coq/runs/23898721987, https://github.com/coq/coq/runs/23874454446, https://github.com/coq/coq/runs/23874454446, https://github.com/coq/coq/runs/23874782784, https://github.com/coq/coq/runs/23874833864

Those costs are also a good way to get an idea of how, and at which cost, external devs can be adapted.

For stdlib changes, IME the costs of actually adapting the developments are a tiny fraction of the coordination overhead of getting the right person to look at the change and click merge because it is too boring to look at. I think it has happened to me a grand total of once that feedback from a adaptation PR led to a change in the stdlib work.

Ironically enough, as a Coq developper, my feeling is that those are responsible for a large part of the "overlay toil". Again, only my personal experience, not a fact.

I mean, I'm not saying the submodules are great, just the alternative would be fiat-crypto being red as often as other large multirepo developments. And the particularly slow and painful cases I've witnessed were exactly because some non-coqdev-related codebase-synchronization issues needed to be resolved before the coqdev patch could be accepted.

Whether the two goals are compatible remains unclear to me.

Which goals are you referring to?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 out of 5 last non-cancellation failures [...]

Those are not "real failures" but the consequence of synchronous overlay merge being indeed non atomic. Synchronous overlays are only a module thing. Since modules would remain in Coq repo, the issue wouldn't affect the library repo. So we can either consider the issue orthogonal to the current CEP or even the CEP being an improvement here, since it releases the library from all issues linked to synchronous overlays.

I'm not saying the submodules are great, just the alternative would be fiat-crypto being red

[off topic]
I really have the feeling that coq-util / bedrock2 / rupicola / fiat-crypto / ... should live in a single repo.
From my outsider point of view, their team seem to have a pretty strong intersection.
[/off topic]

Whether the two goals are compatible remains unclear to me.

Which goals are you referring to?

The ones from the previous sentence:

  • evolve the current stdlib to make it truly "standard" / state of the art
  • not dramatically breaking every uses

Anyway, that's definitely beyond the scope of the current CEP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing anything interesting in that commit history, what am I missing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the history doesn't really tell a story on its own, and I don't remember well enough to recount it play-by-play. But my takeaway was that a change (making nsatz good to use for custom fields) that would have one commit (or perhaps two for external compat) in a monorepo dragged out over years. This is not purely a technical point on the monorepo/dualrepo choice, as another important factor was slowness and uncertainty of contributing to Coq regardless of synchronization, but nevertheless, it is a negative experience with maintaining Ltac code that wraps an ML plugin in Coq.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by fundamental. Synchronous overlays are needed for ML code because the ML API is unstable.

In the context of this PR, maybe we should check (both in Coq and the library) that it is properly documented that synchronous overlays are only acceptable for ML code, not for anything else (meaning synchronous overlays would be unacceptable in the library repo).

[off topic]
I agree that due to this, maintaining a ML plugin is bound to be, to some extent, an uncomfortable experience, forcing you to either be compatible with a reduced set of Coq versions or have some mechanism for conditional compilation of multiple plugin versions.
[/off topic]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synchronous overlays are needed for ML code because the ML API is unstable

meaning synchronous overlays would be unacceptable in the library repo

I mean that the the Ltac/Gallina APIs for stdlib plugins are no better than the ML ones (it's not about the language!), so freezing them by leaving them at the repository boundary seems unwise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, as already discussed, having that interface more exposed absolutely doesn't mean freezing it, only that more care must be taken when modifying it.

It also has one major constraint (considered an advantage by some and a drawback by others): it makes it harder to have the following dependencies:
`coq-roots-A <- coq-extralib-B <- coq-roots-C`.

The most concrete technical advantage of moving stdlib to its own repo, the ability to subscribe to stdlib changes, could also be achieved using a coqbot feature that @-s a specific team of stdlib subscribers.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on which objectives this does or does not fullfil? I can see as fullfiled:

  • enabling stdlib contributors to get only notifications about the library (and not the other Coq PRs)

and as not fullfilled:

  • not having lot of OCaml code in the library repo
  • not practically needing to compile Coq master to contribute to the library
  • avoiding the heavy reviewing process of Coq
  • keeping all doors open with respect to "Enabling the development of an evolved or different official "stdlib""
  • "Being sure that users are properly discouraged to rely on parts of the stdlib that are considered subpar, especially when recommended alternatives are available through the Coq Platform."
  • enabling the use of Coq without the stdlib (for instance to run the test-suite)
  • enabling more freedom on the release process of the library

Copy link

@andres-erbsen andres-erbsen Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this list of goals, somehow it seems much clearer to me than the version in the CEP.

  • not having lot of OCaml code in the library repo

I do not understand why you find this desirable, and I am worried about it. The stdlib plugins are one of the strongest advantages of Coq IMO, and I want to make sure we don't make improving them harder.

  • not practically needing to compile Coq master to contribute to the library

I had not realized this was a goal, I agree it would be nice to have. Yet compiling Coq is such a tiny fraction of the pain of compiling the CI and its dependencies that I am not sure that we should make any decisions based on this without a vision for CI.

  • avoiding the heavy reviewing process of Coq

I agree that the unavailability of reviewers is a problem, but I also find code review valuable. I there anything else you definitely want to see changed?

  • keeping all doors open with respect to "Enabling the development of an evolved or different official "stdlib""

I don't see how this needs any action, the doors are already open.

  • "Being sure that users are properly discouraged to rely on parts of the stdlib that are considered subpar, especially when recommended alternatives are available through the Coq Platform."

We have warnings for that, and we can and have removed content that we don't recommend. I don't think we need another solution.

  • enabling the use of Coq without the stdlib (for instance to run the test-suite)

I agree that this is desirable, but the test suite is such a small fraction of the CI that I am not sure whether this matters (and the dune-based plan seems more appealing to me as long as coq itself is built with dune).

  • enabling more freedom on the release process of the library

Again, I am worried that exercising this freedom would put stress on the plugins that I use Coq for, and on developing the stdlib itself. (more here)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this list of goals, somehow it seems much clearer to me than the version in the CEP.

It's mostly just an extract of it though.

  • not having lot of OCaml code in the library repo

I do not understand why you find this desirable, and I am worried about it.

Developing libraries in Coq and developing Coq itself are two diferent things, done in diferent languages. We don't want to make it look like mastering OCaml should be anyhow needed to contribute to the library.

The stdlib plugins are one of the strongest advantages of Coq IMO,

OCaml plugins are relatively unstable due to the ever changing Coq API. I'm not sure we should promote their development that much. The future is probably more in the use of dedicated metalanguages like metacoq, Ltac2 or elpi.

and I want to make sure we don't make improving them harder.

What makes plugin development hard is the unstable, more or less documented, API, much more than their interaction with the library. So keeping them in Coq repo shouldn't be much trouble.

  • not practically needing to compile Coq master to contribute to the library

I had not realized this was a goal, I agree it would be nice to have. Yet compiling Coq is such a tiny fraction of the pain of compiling the CI and its dependencies that I am not sure that we should make any decisions based on this without a vision for CI.

That's the point fo view of a poweruser. Imagine you're a newcomer and want to add some list lemma, you certainly don't want to compile Coq master and you'd very probably don't have to run any of the CI targets locally since it would just be green online.

  • avoiding the heavy reviewing process of Coq

I agree that the unavailability of reviewers is a problem, but I also find code review valuable. I there anything else you definitely want to see changed?

I'm not saying we should do it. But I also remember merging a few of your PRs where clearly having you merge it yourself would only have gained time and saved me an empty review.

  • keeping all doors open with respect to "Enabling the development of an evolved or different official "stdlib""

I don't see how this needs any action, the doors are already open.

No, imagine we choose to go for the "different official "stdlib"" path (I'm not saying we should, I don't know), then we would still have to move the current library to keep maintaining it, so we would be back to the beginning, with that door remaining closed.

  • "Being sure that users are properly discouraged to rely on parts of the stdlib that are considered subpar, especially when recommended alternatives are available through the Coq Platform."

We have warnings for that, and we can and have removed content that we don't recommend. I don't think we need another solution.

That's an opinion. I tend to think that displaying packages, to a proper library, on an equal footing with the alternatives, would help, compared to one of the alternatives being forced the default.

  • enabling the use of Coq without the stdlib (for instance to run the test-suite)

I agree that this is desirable, but the test suite is such a small fraction of the CI that I am not sure whether this matters (and the dune-based plan seems more appealing to me as long as coq itself is built with dune).

I agree it's a more minor point and there are other solutions to this. Still, it would be welcome (consider for instance working on the printing of Coq, compiling the stdlib every time is a pure waste of time).

  • enabling more freedom on the release process of the library

Again, I am worried that exercising this freedom would put stress on the plugins that I use Coq for, and on developing the stdlib itself.

Most of, maybe nearly all, the changes to the plugin come from API adaptations rather than library adaptations, so I'm not really worried about that requiring great deals of effort. At the very worst, we could imagine schemes were the plugin would have to run alternative code for two different versions of the library, but that need will likely arise very rarely.

Copy link

@andres-erbsen andres-erbsen Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to make it look like mastering OCaml should be anyhow needed to contribute to the library.

Do you think there is a reason why this couldn't be straightforwardly clarified through documentation?

The future is probably more in the use of dedicated metalanguages like metacoq, Ltac2 or elpi.

Maybe, but I don't see how these new languages would help specifically the stdlib plugins. E.g. elpi has "Lambda" in the name, and lia/field/nsatz do not manipulate any lambdas (it's been a while since I played with it though). Sure, there is some maintenance for with the ML api getting new fancier universes and so on, but constructing application nodes surely isn't changing so unstable that it needs to be avoided. I also expect multiplicative costs with both number of repositories and number of languages, so I would expect we'd do a CEP and consider alternatives for adding any of these to the standard library.

(And wouldn't brand-new domain-specific languages be even more intimidating than OCaml?)

What makes plugin development hard is the unstable, more or less documented, API, much more than their interaction with the library. So keeping them in Coq repo shouldn't be much trouble.

I agree this is a problem at least for some plugins, but it is not the problem that I was trying to point out. Each plugin here exists as a combination of a certificate generator written in ML (perhaps later in another language) and a certificate checker written in Gallina. The interface between them isn't designed for wider consumption, and I have found the documentation to be wrong when I have tried relying on it. Having that interface straddle repositories with separate release schedules would require essentially any improvement to it to have a deprecation delay. This seems quite important to avoid.

Imagine you're a newcomer and want to add some list lemma, you certainly don't want to compile Coq master and you'd very probably don't have to run any of the CI targets locally since it would just be green online.

Sadly, no. It is especially the simple stuff for which a large part of the effort is multirepo coordination overhead. Often it's just because morally equivalent lemmas with the same name already exist downstream, differing perhaps in names of binders only. nth_error was basically as easy as it gets, and I don't want to imagine doing it with web-interface-driven CI.

No, imagine we choose to go for the "different official "stdlib"" path (I'm not saying we should, I don't know), then we would still have to move the current library to keep maintaining it, so we would be back to the beginning, with that door remaining closed.

Even a well-resourced effort for a total makeover could happen in the same repo. I think what you are saying is that if we want to deprecate the stdlib without porting users, moving it to a separate repo would be the way to do it. Perhaps that's right, but I don't want us to take that path, and making way for it isn't compelling as motivation for this CEP.

I tend to think that displaying packages, to a proper library, on an equal footing with the alternatives, would help, compared to one of the alternatives being forced the default.

Yeah, we disagree here. IMO it is on us as stdlib maintainers to pick a default and own it. I am especially wary of throwing out old theory/plugins/languages without a replacement that is proven on the same tasks.

At the very worst, we could imagine schemes were the plugin would have to run alternative code for two different versions of the library, but that need will likely arise very rarely.

How do you imagine this would work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to make it look like mastering OCaml should be anyhow needed to contribute to the library.

Do you think there is a reason why this couldn't be straightforwardly clarified through documentation?

In my opinion, documentation can dampen the feeling but not completely remove it. Of course, that's a matter of personnal taste.

Sure, there is some maintenance for with the ML api getting new fancier universes and so on, but constructing application nodes surely isn't changing so unstable that it needs to be avoided.

That makes sense in theory. In practice, plugins like paramcoq tend to get quite a lot of overlays.

I also expect multiplicative costs with both number of repositories and number of languages,

I'd rather expect the number of metalanguages to stabilize to only a few. Hard to predict the future but it doesn't seem to be blowing up anyway.

I would expect we'd do a CEP and consider alternatives for adding any of these to the standard library.

Sure, the current plugin are OCaml plugins and we won't change that overnight. I was just pointing out that we probably shouldn't promote the development of new OCaml plugins and that over time, they might become more and more rare / dedicated to only a few very few specific use cases.

(And wouldn't brand-new domain-specific languages be even more intimidating than OCaml?)

Yes and no. The thing is that it's not OCaml itself but use of OCaml to manipulate Coq terms / environments. And specific languages, with good quotations mechanisms for instance, can go a long way in simplifying that.

Each plugin here exists as a combination of a certificate generator written in ML (perhaps later in another language) and a certificate checker written in Gallina. The interface between them isn't designed for wider consumption, and I have found the documentation to be wrong when I have tried relying on it.

Making the interface more palatable could also be seen as an advantage and could help improve documentation / reusability. In my own experience (again, only a personal experience), making a few changes to the micromega plugin in order to make it easily and effectively reusable in a foreign library proved pretty easy and stable enough to not require any change over many Coq versions. So I'm not really worried here, quite the contrary.

Imagine you're a newcomer and want to add some list lemma, you certainly don't want to compile Coq master and you'd very probably don't have to run any of the CI targets locally since it would just be green online.

Sadly, no. It is especially the simple stuff for which a large part of the effort is multirepo coordination overhead.

Ok, our experience in library development seem to fundamentally differ here. My own experience is that adding new things is pretty easy and usually not breaking reverse dependencies in CI. Anyway, my opinion is rather that if you are breaking anything in the CI, it's a relatively advanced change that requires some care, to the point that it's worth the effort of running things locally, to understand how you can minimize the disruption.

Often it's just because morally equivalent lemmas with the same name already exist downstream, differing perhaps in names of binders only.

Sounds like the usual backporting chore. Monorepo including stdlib and reverse dependencies would help making this faster but I don't really see this kind of thing happening between Coq itself and the stdlib, at least not in a way that inherently differs from any other Coq use.

No, imagine we choose to go for the "different official "stdlib"" path (I'm not saying we should, I don't kn
ow), then we would still have to move the current library to keep maintaining it, so we would be back to the beg
inning, with that door remaining closed.

I don't want us to take that path

I understand you are strongly against this possibility, but you cannot deny that not enabling it is keeping doors closed.

I tend to think that displaying packages, to a proper library, on an equal footing with the alternatives, would help, compared to one of the alternatives being forced the default.

Yeah, we disagree here. IMO it is on us as stdlib maintainers to pick a default and own it. I am especially wary of throwing out old theory/plugins/languages without a replacement that is proven on the same tasks.

It's not about "throwing out" anything, it's only about helping users to be free to make informed choices. Coq is an open source software, not a closed proprietary thing.

At the very worst, we could imagine schemes were the plugin would have to run alternative code for two different versions of the library, but that need will likely arise very rarely.

How do you imagine this would work?

OCaml code like "if (new_gallina_constant is present) then new_feature" and, later, when dependencies ensure that "new_gallina_constant" is prensent we can remove the if. Again the need for this is probably quite rare so the small overhead seems largely acceptable (the if might even be a necessary thing just to ensure that the proper library has been required anyway).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins like paramcoq tend to get quite a lot of overlays

Paramcoq seems vastly more complex in its use of the Coq API than nsatz and micromega to me.

making a few changes to the micromega plugin in order to make it easily and effectively reusable in a foreign library proved pretty easy and stable enough to not require any change over many Coq versions. So I'm not really worried here, quite the contrary.

My different experience is also with micromega: coq/coq#16862, coq/coq#18730 (comment).

[on deprecating stdlib] I understand you are strongly against this possibility, but you cannot deny that not enabling it is keeping doors closed.

But I will deny it: nothing prevents one from making a PR to deprecate any fraction of the stdlib where it sits today. Or is the difference about who would review that PR and how we would expect them to respond?

"if (new_gallina_constant is present) then new_feature"

Thanks, I agree this could work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[on deprecating stdlib]

I fail to understand why you keep bringing "deprecating stdlib" whereas there is no such thing in the CEP.

@andres-erbsen
Copy link

andres-erbsen commented Apr 17, 2024

I think the remaining disagreement we have is only tangentially about monorepo vs dualrepo. Yes, I have had consistently good experiences with monorepos compared to other setups, but if I happened to start working on coq in a circumstance very similar today except that stdlib and coqc were in different repos already, that wouldn't be on my shortlist of things to change.

Perhaps much closer to the crux is what I brought up in the question of motivation I raised in the very first comment in this PR: whether the 7 points in this comment are actually effective and important to boost stdlib development. At least in the very narrow textual sense, two of them (5 and 6) are just not about helping stdlib, two are about unclear future changes (4 and 7), another two just don't seem important to me (1 and 2), and the remaining one (3) is so complicated that I wrote #86 to discuss it. So it's just hard for me to get behind any large-scale change based on this motivation, be dualrepo or whatever.

@proux01
Copy link
Author

proux01 commented Apr 22, 2024

I think the remaining disagreement we have is only tangentially about monorepo vs dualrepo.

Yet this is one of the core technical points of this CEP and my understanding is that you strongly oppose it.

I wrote #86 to discuss it. So it's just hard for me to get behind any large-scale change based on this motivat
ion, be dualrepo or whatever.

My understanding is that many things here and there relate around leaving more freedom to library maintainers. The dualrepo choice seems to be a perfect way to achieve this.

@andres-erbsen
Copy link

I do see how dualrepo would make it easier to achieve my wishes in #86. And if that's the only way, I can see myself being in favor of dualrepo despite still expecting substantial technical downsides. But was the desire for process changes really the reason you created this CEP?

@jfehrle
Copy link
Member

jfehrle commented Sep 22, 2024

Responding to @andres-erbsen's request in #19530, here are my current thoughts. They may evolve. I've not reviewed all the comments here--they are quite lengthy.

Reference manual My initial concern was that the PR drops a lot of material from the Sphinx documentation and makes undesirable changes at the detail level. Moving that material into a stdlib repo seems inappropriate, such as descriptions of commands that are implemented in Coq, not in the stdlib.

I expect many (most?) of those changes are not necessary to put stdlib into its own repo. I didn't try to review all the doc changes line by line--I think that might overwhelm the github review mechanism and be distracting to those who are not so interested in documentation. At a minimum, the doc changes should be a separate PR to facilitate review, and there's no need to do that immediately. Frankly, though, documentation is a secondary concern.

Maintenance My impression is that there's a desire to make sweeping changes in how stdlib is organized, many of which may require changes to libraries and projects depending on stdlib. At this time we don't know the specific changes and how extensive the they will be. Nor do we know how long it will take each dependent project to adapt to the changes (both calendar time and developer time).

Future reviews of changes to stdlib need to consider this maintenance burden carefully as well as considering who will update the dependent projects (and whether the projects will get updated at all). Will the stdlib repo run the same set of CI tests as Coq? Probably we need something like the overlays we have to adapt CI projects, probably often created by the stdlib maintainers.

Coq platform Over time, different projects may require different versions of stdlib, which may be incompatible. It seems to me that this will make it much more difficult to release new versions of Coq platform. I would like to hear @MSoegtropIMC's thoughts on this proposal.

User/developer experience I would like to understand how the following use cases will work (and see working code in the PR that addresses any weak points, which I would want to test before the PR is merged):

  • Running coqc, coqide, vsCoq2, etc: requiring the user to specify -R <directory> Stdlib on the command line seems awkward. How do users discover what the directory should be?
  • Dune commands: does dune exec -- coqide and other commands need to have -R ... on it? Seems awkward.
  • If you modify Coq source and recompile, currently the .vo files in the library include a checksum of the Coq source. There should still be an easy way to compile both with one command.
  • Presumably opam would be the preferred way to install stdlib if you don't plan to modify its source, but it should be downloaded elsewhere with git if you want to modify stdlib. (Users should not directly modify opam directories, right?) How do you switch this? For repos you may modify locally, will you need to sync each git repo separately?
  • Deprecations: currently these are associated with Coq releases. Probably there will be deprecations related to stdlib versions, so some adjustment may be needed.
  • CoqIDE (and VsCoq2?) let you call up the documentation from the menu. How does this work when there are multiple sphinx documents?

User perceptions IMO users are more willing to make changes that are well-justified, i.e. they fix problems or make new things possible. Renaming things to make them more "logical" may not always be appreciated (not sure if much renaming is planned). Something stdlib maintainers should keep in mind.

Reversibility If creating a separate stdlib repo is indeed tentative, the PR should be made readily reversible. If it is not readily reversible, then that calls for more caution and more thorough review. It would help a lot to have a short summary of the main changes--sometimes it's hard to infer what they are from the code. Also splitting this into multiple PRs would facilitate review.

Final thoughts Including multiple versions of stdlib in a single Coq Platform seems like a recipe for trouble. And the problems won't show up for months. Also I'm concerned about the increased maintenance burden. I'm a bit more doubtful about the concept now than before. My other points are more tactical, i.e. about how stdlib is split off rather than whether the concept is good.

@andres-erbsen
Copy link

I added "What's cooking with stdlib?" to Tuesday's Coq Call. I am also curious about a lot of the same questions as Jim raises, and more generally. I would especially like to hear about what has already been developed and what bigger picture this work was driven by.

@proux01
Copy link
Author

proux01 commented Sep 23, 2024

The call has been moved to October 8th due to unavailability of a large part of the team coq/coq#19562 (comment)
In any case, nothing is "cooking with" the stdlib, the PR coq/coq#19530 is only the implementation of what has been agreed months ago (the time it took me to find enough time to devote to the implementation effort) following the discussion here and in related calls. But we can certainly discuss the implementation details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.