Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Towards a more flexible and accurate model for grammar levels and associativity #71

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
146 changes: 146 additions & 0 deletions text/camlp5-associativity.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
- Title: Towards a more flexible and accurate model for grammar levels and associativity

- Drivers: Hugo Herbelin

----

# Summary

The Camlp5/gramlib model for associativity of levels in extensible grammar has a couple of issues. We reflect how to solve them.

# Detailed description of the current problems

## Non associativity

Camlp5 let us believe that non associativity is possible and we let Coq users believe in turn that it was the case too. However, non associativity is *not* a feature of Camlp5. As said in the section [Associativity](https://camlp5.readthedocs.io/en/latest/grammars.html?highlight=associativity#extensible-grammars) of the Camlp5 manual, the keyword `NONA` behaves as `LEFTA`. And indeed, there is no mechanism to support nor emulate non associativity of infix or postfix operators in Camlp5.

Also, non associativity (equivalently right associativity) is inactive on postfix operators (e.g., a rule such as `[ RIGHTA [ t = SELF; "!" -> ... ] ]` accepts parsing `x!!`. This is for the same reason as non-associativity is not supported (one would need to call continuations only once and not arbitrarily recursively many often, see remark at end of camlp5/camlp5#100).

For another reason, non associativity (equivalently left associativity) is not emulable on prefix operators (e.g., a rule such as `[ LEFTA [ "?"; t = SELF -> ... ] ]`, or `[ LEFTA [ "?"; t = NEXT -> ... ] ]`, accepts parsing `??x`); this times it is because of a fallback which replace `SELF` by `SELF TOP` if `SELF` has failed.

Note that the non support for non associativity in the parser is currently a cause of asymmetry with the printer, since the printer respects non associativity.

## Each Camlp5 level is in practice two levels

Indeed, a Camlp5 level is made of two blocks:
- a block of rules of the form `SELF; suffix`
- the block of rules not of this form

In the first block, `SELF` is implemented by a call to the 2nd block of rules. As a consequence, if an infix notation is put at Coq's lower level 0, it still succeeds to parse its argument by finding closed subterms in the second block of level 0, such as an ident, or `( term )`.

This results in bug reports such as coq/coq#15336 where the parser parses the notation but the printer does not print it.

## Mixing associativity in the same level is technically possible and would solve the issue of combining developments that set different associativities to the same level

In practice, a level can mix different associativities. If a rule of the form `SELF; "+"; NEXT -> ...` is put in a right-associative level, it parses `+` in a left associative way. And if a rule of the form `SELF; "+"; SELF LEVEL "foo" -> ...` is put in the left-associative level `"foo"`, then, `+` is parsed in a right associative way. This is because the *only* effect of the `RIGHTA`/`LEFTA` keywords at some level, say `"foo"`, is to give a *default* interpretation to the occurrences of `SELF` on the right-hand side of rules (that is such `SELF` is then interpreted as `NEXT` for `LEFTA` and as `SELF LEVEL "foo"` for `RIGHTA`). In particular, we can decide the associativity per rule by just not using `SELF` on right-hand sides.

Note that in this case, rules with right associativity (i.e. ending with `SELF LEVEL "foo"`) yield precedence (i.e. bind wider) than rules with left associativity.

See e.g. coq/coq#17859 and camlp5/camlp5#100 for more details.

Note that an example of conflicts on setting different associativities to the same level is told to be when combining Iris and MathComp.

## Camlp5 associativity shown with `Print Grammar constr` leads to confusion

The `RIGHTA`/`LEFTA` status of a level is thus indicative, and not constraining. And, actually, Coq do override it by setting itself the right-hand side of rules to `NEXT` or `SELF LEVEL "foo"`.

When no official associativity is assigned to a level, Camlp5 decides it is left associative while the Coq table decides (confusingly) that it is right associative. Since the associativity is only a default value for rules which can be overriden per rule, one confusingly sees a level reported as left associative while Coq allows to add only right associative rules to the level (see coq/coq#12373).

Also, the current Camlp5 implementation does not treat associativity in the presence of `LIST1` and `LIST0` on the border of a rule. E.g., in `...; LIST1 SELF -> ...`, `SELF` is interpreted as `SELF TOP` and thus does not follow associativity. This is visible e.g. in the `"choice"; LIST1 rewstrategy` rule for `rewstrategy`.

## Camlp5/gramlib does not fully respect levels

When no rule matches, there is a recovery mechanism (see coq/coq#9008) made of 4 fallbacks that bypasses the level specified by the user.

For instance, `True \/ forall x, x = 0` is tolerated even though `forall` is at level 200 and `/\` at level 80. Or `x~0%Z` is tolerated even though `%Z` expects its arguments to be at level 1.

# Proposal

We propose to go towards a new model:
- that respects levels
- that respects no associativity
- is more flexible about mixing different associativities in the same level
- remains mostly compatible with the existing situation

# Solutions

## Removal of the recovery mechanism

Removing the recovery mechanism breaks quite common situations such as `True \/ forall x, x = 0`, or `2^-n`, or `x~0%Z`.

One could try to automatically adapt the scripts but I believe that the simplest is to move `forall` (and all `binder_constr` rules spanning until the end of the sentence) to a lower level, such as 10, just above the level of application.

The drawback is that we need to modify the printer so that it knows how to print such cycle made of a rule at some level (here 10) which is lower than the level on the right-hand side of the rule (here 200). This is however already made possible by PR #17875.

For `-`, there are several solutions, such as:
- to set it at level 10 with arguments at level 40 (the level of `*`) so that `2 ^ - x * y` is `2 ^ - (x * y)` but `2 ^ - x + y` is `(2 ^ - x) + y` (??)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing 2 ^ - x * y as 2 ^ (- (x * y)) seems to be quite surprising. I would expect the consensus among mathematicians to be (2 ^ (- x)) * y.

In other words, there is no reason for the argument of prefix "-" to be at level 40. It should be at the current level (i.e., 35 if the argument of infix "^". In fact, the notion of levels for a notation that starts with a terminal and ends with a non-terminal does not make sense in the first place. It should not have an intrinsic level, it should always use the ambient level.

Copy link
Member Author

@herbelin herbelin Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused by the idea that - x * y was - (x * y) by default in mathematics, but that's the same as (- x) * y, so it does not have to be the former. And actually, most programming languages seem to make prefix - binding tightly (so morally at 10 for Coq).

For the example I mentioned, the level of x in - x do not matter. The whole notation could be at 10 with x at 10 too.

What do you mean by "It should not have an intrinsic level, it should always use the ambient level.". That is, in the following examples, how would you associate (or fail):

  • x * - y
  • x ^ - y
  • x + - y
  • f x - y
  • - f * x
  • - f ^ x
  • - f + x
  • - f x

Adding a link: https://en.wikipedia.org/wiki/Order_of_operations#Unary_minus_sign

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually, most programming languages seem to make prefix - binding tightly (so morally at 10 for Coq).

That does not mean that it is a good idea. In fact, the very first sentence of the Wikipedia page you cite says that - 3 ^ 2 is interpreted as -9 in mathematics. And the only "languages" it mentions as binding tightly are Excel, PlanMaker, and bc.

So, having unary minus at level 10 would be awful. And having it at level 35 is already painful. I hate having to write - (x * y). It should have been at level 45.

  • x * - y
  • x ^ - y
  • x + - y

All three of them associate as x ! (- y).

  • f x - y

Do you have an infix minus in the context? If not, then parsing can be recovered as f x (- y).

But the interesting case is actually f x - y z, which would be parsed as f x (- y) z (assuming again that there is no infix minus, otherwise the point is moot).

  • - f * x
  • - f ^ x
  • - f + x
  • - f x

You did not say what the ambient level is, so it depends. Let us assume that those expressions all look like y + - f ! x (so, ambient level is 50 left associative). Then they would respectively parse as

  • y + (- (f * x))
  • y + (- (f ^ x))
  • (y + (- f)) + x
  • y + (- (f x))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm tired, the first 3 questions were pointless. I meant the minus sign in the front)

I don't see what conclusion to draw otherwise. And we are somehow constrained by what is currently done, which is:

Require Import ZArith. Context (x y :Z) (f : Z -> Z) (g : Z). Set Printing Parentheses. Open Scope Z_scope.
Check  x * - f y. (* x * (- (f y)) *) (* due to recovery *)
Check  x ^ - f y. (* x ^ (- (f y)) *) (* due to recovery *)
Check  x + - f y. (* x + (- (f  y) *) (* due to recovery *)
Check  f x - y.   (* (f  x) - y *)
Check  - g * x.   (* (- g) * x *)
Check  - g ^ x.   (* - (g ^  x) *)
Check  - g + x.   (* (- g) ^  x *)
Check  - f x.     (* - (f x) *)
Check  - x * y.   (* (- x) * y *)
Check  - x ^ y.   (* - (x ^ y) *)
Check  - x + y.   (* (- x) + y *)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, experimenting changing associativity of:

  • - x * y from (- x) * y to - (x * y) and
  • - x / y from (- x) / y to - (x / y)

requires several changes in stdlib, starting with:

  • Ring_theory.Ropp_mul_l : -(x * y) == -x * y (also in Ncring.v) and Ring_theory.ARopp_mul_l : -(x * y) == (-x) * y
  • BinInt.Zopp_mult_distr_l : - (n * m) = - n * m
  • Zdivfloor.div_opp_opp : b~=0 -> -a/-b == a/b
  • Field_theory.rdiv5 : - (a / b) == - a / b
  • Uint63.opp_spec : φ (- x) = - φ x mod wB
  • -a÷b in Zquote.v
  • - Qnum x * Zpos (Qden y) in QArith_base.v
  • non exhaustive list...

- to change scripts and add parentheses

For `x~0%Z`, there are hopefully few occurrences.

## Granting non associativity

In our experience, the changes are relatively small in the standard library:
- it impacts the family of notations `=`, `=?`, `<`, `<?`, etc. at level 70
- one solution is to add parentheses in a couple of expressions of the form `x =? y = z` (namely so that it is `(x =? y) = z`)
- other solutions would be to move `=?` to a lower level (and higher precedence) but I did not investigate the impact
- it impacts `mod`, currently set non associative at the left-associative level 40 and this would either require to move it to another level, or to set it left-associative, or to accept different associativities in the same level

## Fixing the associativity bugs and supporting different associativities in the same level

There are basically two directions:
- to renounce to Camlp5 support for levels, that is, to manage manually the inclusion of levels and the associativity of operators using distinct entries
- to fix the problems of levels and associativity in Camlp5

The first direction would a priori require more work and more support of Coq to manage itself the levels and associativity. On the other side, an advantage is that it would then probably be easier to experiment with different parsing engines.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that it does not get lost, here is an LL(1) grammar I posted on Zulip that supports having multiple (non) associative operators at the same level, but causes a parsing error if the user tries to mix them in ambiguous way in a single expression, e.g., x opR y opL z. (L, N, R respectively stand for left-, non-, right-associative.)

term42:
  [ x = term41; y = term42_no_left -> y x
  | x = term41; y = term42_no_right -> y x
  | x = term41; "opN"; y = term41 -> OpN(x,y)
  | x = term41 -> x ]
term42_no_left:
  [ "opR";  x' = term41; y = term42_no_left -> fun x => OpR (x, y x') ]
term42_no_right:
  [ "opL";  x' = term41; y = term42_no_right -> fun x => y (opL  (x, x'))
  | "opL'"; x' = term41; y = term42_no_right -> fun x => y (opL' (x, x')) ]


The second direction would have the advantage to be benefital to all users of Camlp5 in general. Moreover, I already implemented some pieces of it, so that would be more satisfactory for me.

For the second direction, what needs to be done is:
- to support deactivating the recovery mechanism: this is done in coq/coq#17876
- to support non-associativity: this is done in coq/coq#17876, and I roughly see how to extend it with a model of per-rule associativity
- to fix a small asymmetry in the definition of `NEXT` level in Camlp5 (see camlp5/camlp5#100)
- to support per-rule associativity: I already have a branch started for it

Altogether, in the resulting model, each Camlp5 level is made of five levels of increasing precedences in this "natural" order:
- the level of right-associative infix operators (starting with `SELF` and ending with `SELF`)
- the common level of left-associative infix and postfix operators (starting with `SELF` and with repeatable continuation)
- the common level of non-associative infix and postfix operators (starting with `SELF` and with a non-repeatable continuation)
- the level of right-associative prefix operators (starting with a terminal and ending with `SELF`)
- the common level of closed expressions and non-associative prefix operators (starting with a terminal and w/o recursivity on the right)

even if other orders can be considered too.

All in all, the proposal would:
- Solve printing bugs: coq/coq#15336, via the modified coq/coq#15341
- Solve oddities: recovery mechanism, coq/coq#12373 and coq/coq#15781
- Support combining theories with incompatible associativity at the same level
- Provide non associativity

## Appendix

### A summary of inconsistencies between the parser and printer

- when the underlying grammar is not stratified as in the case of `rewstrategy` that PR coq/coq#17832 fixes, or if `binder_constr` or `-` were moved at lower levels
- when a rule is declared non-associative (since the parser does not respect non-associativity)
- when an infix is put at level 0 (see coq/coq#15536) or more generally when a left-associative rule refers to the non-self-starting subset of a rule

### A summary of changes to be done

Regarding PRs:
- renounce to unmerged [#15341](https://github.com/coq/coq/pull/15341) that is, do not forbid infix operators at level 0 and inform instead the printer that a level is made of sublevels
- renounce to unmerged [#17126](https://github.com/coq/coq/pull/17126), that is, no need any more to move to level 1 the prefix/postfix notations at level 0

In the Coq wrapper to Camlp5:
- stop forbidding to declare at some level an operator whose associativity is different from the one already registered for a level
- continue implementing associativity of operators by using either `NEXT` or `SELF LEVEL foo` without further qualms of not relying on the default associativity mechanism (or better: implement per-rule associativity in Camlp5)
- continue accepting that a level host rules referring to an arbitrary higher level on its right-hand side (to support `binder_constr` or `-` at level 10)

In the mlg files and their reflection in `egramcoq.ml`:
- make the levels in `egramcoq.ml` consistent with `g_constr.mlg` (for `term` at levels `0`, `8` and `9`), so that if a notation open on both sides and without explicit associativity at level 8 or 9 works at it does currently

In the Coq reference manual:
- make explicit that different associativities can coexist in the same level
- make explicit that the associativity of a level is only used as a default
- make clearer that the associativity written by `Print Grammar` is only to assign a default associativity for rules mentioning `SELF` on their right-hand side
- document that `ARGUMENT EXTEND` is in practice set to `LEFTA`