What exactly token streams are passed to procedural macros 1.2 #50038

Closed
petrochenkov opened this issue Apr 18, 2018 · 17 comments
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@petrochenkov
Contributor

petrochenkov commented Apr 18, 2018

This is an issue that needs to be resolved before stabilization of "Macros 1.2".

Procedural macros that we are going to stabilize currently have two flavors - proc_macro and proc_macro_attribute.


proc_macro macros have signature fn(TokenStream) -> TokenStream and can be invoked with "bang" forms like this:

my::proc::macro!( TOKEN_STREAM )
my::proc::macro![ TOKEN_STREAM ]
my::proc::macro! { TOKEN_STREAM }

Only the TOKEN_STREAM part is passed to the macro as TokenStream, the delimiters (brackets) are NOT passed.

Why this is bad:

  • The macro doesn't know what delimiters it was invoked with.
    It was a part of Macro 2.0 promise to give macros control over delimiters in their invocations, so e.g. vec-like macros could require square brackets like vec![1, 2, 3] and reject other brackets.
    We should not prevent this kind of control being implemented in the future.

Why this is good:

  • Brackets are mostly not a part of the "useful payload" for the macro; they are there so macro invocations can be parsed unambiguously in the many contexts in which they can appear - expressions, types, blocks, modules, etc, etc, etc.
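To make the trade-off concrete, here is a minimal sketch in plain Rust (it does not use the real `proc_macro` API). The `Delimiter` enum and the helpers `split_invocation` and `payload_only` are hypothetical names invented for illustration, and the model works on strings rather than real token streams, so it ignores nesting. It only shows that all three invocation forms hand the macro the same payload, which is why the macro cannot tell them apart.

```rust
/// The three delimiter kinds a bang macro can be invoked with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delimiter {
    Parenthesis,
    Bracket,
    Brace,
}

/// Hypothetical helper: split the text that follows `name!` into the
/// delimiter kind and the payload between the delimiters.
/// Returns `None` if the text is not wrapped in a matching pair.
fn split_invocation(args: &str) -> Option<(Delimiter, &str)> {
    let args = args.trim();
    let (delim, close) = match args.chars().next()? {
        '(' => (Delimiter::Parenthesis, ')'),
        '[' => (Delimiter::Bracket, ']'),
        '{' => (Delimiter::Brace, '}'),
        _ => return None,
    };
    if args.len() < 2 || !args.ends_with(close) {
        return None;
    }
    // Both delimiters are single-byte ASCII, so slicing one byte off
    // each end stays on character boundaries.
    Some((delim, args[1..args.len() - 1].trim()))
}

/// Models what a `fn(TokenStream) -> TokenStream` macro receives:
/// the payload only, with the delimiter information discarded.
fn payload_only(args: &str) -> Option<&str> {
    split_invocation(args).map(|(_, payload)| payload)
}
```

In this model, `payload_only("(1, 2, 3)")` and `payload_only("[1, 2, 3]")` return the same value, while `split_invocation` still has the delimiter available if a future interface wanted to expose it.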

proc_macro_attribute macros have signature fn(TokenStream, TokenStream) -> TokenStream and can be invoked with "attribute" forms like this:

#[my::proc::macro TOKEN_STREAM] TARGET
#![my::proc::macro TOKEN_STREAM] TARGET

TARGET is a trait/impl/foreign item or a statement, and it's passed to the macro as the second TokenStream argument, but we are not interested in it right now.

The TOKEN_STREAM part is passed to the macro as the first TokenStream argument, nothing is ignored.

Why this is bad:

  • It's not clear where the path ends and where the token stream starts.
    Something like #[a::b :: + -] seems to match the grammar, but is rejected right now because paths are always parsed greedily, so :: is interpreted as a path separator rather than a part of the token stream.
    Annoying questions arise with generic arguments in paths like #[a<>::b::c<u8>]. Technically this is a syntactically valid path: c having type arguments is rather a semantic error, and the empty <> after the module a is not an error at all. But right now this attribute is interpreted as #[a /* <- PATH | TOKEN_STREAM -> */ <>::b::c<u8>].
    Ideally we'd like to avoid these questions completely and have an unambiguous delimiter.
  • It's not clear where the token stream ends.
    With plain #[attr TOKEN_STREAM] it's pretty clear - the stream ends before the ] (in this sense the situation is simpler than with bang macros), but things start breaking when other macros appear.
    macro m($meta1: meta, $meta2: meta) { ... }
    
    // No way to determine where the first attribute ends and the second attribute starts
    m!( a::b::c x , y , z , d::e::f u , v , w )
    So with this attribute syntax we can't support meta anymore!
  • It's not consistent with proc_macro macros. m!(a, b, c) does not include parentheses into the token stream, but #[m(a, b, c)] does.
  • I'm not actually sure people intend to stabilize, right now, an attribute syntax that has suddenly expanded from the traditional forms (#[attr], #[attr(list)], #[attr = literal]) to being nearly unlimited (i.e. something like #[a::b::c e f + c ,,, ;_:] being legal).
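The path/token-stream boundary problem from the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: `Tok`, `lex`, and `split_path_greedy` are hypothetical names, the lexer is far simpler than rustc's, and generic arguments are deliberately not treated as part of a path, which mirrors the behaviour described above rather than the real parser.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Tok {
    Ident(String),
    PathSep,     // `::`
    Punct(char), // any other non-whitespace character
}

/// Toy lexer: identifiers, `::`, and single-character punctuation.
fn lex(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_alphanumeric() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            toks.push(Tok::Ident(chars[start..i].iter().collect()));
        } else if c == ':' && i + 1 < chars.len() && chars[i + 1] == ':' {
            toks.push(Tok::PathSep);
            i += 2;
        } else {
            toks.push(Tok::Punct(c));
            i += 1;
        }
    }
    toks
}

/// Greedy split of attribute contents into (path segments, remaining tokens).
/// Every `::` is claimed by the path, so a `::` that is not followed by an
/// identifier is an error instead of becoming part of the token stream.
fn split_path_greedy(toks: &[Tok]) -> Result<(Vec<String>, &[Tok]), String> {
    let mut segments = Vec::new();
    let mut i = match toks.first() {
        Some(Tok::Ident(name)) => {
            segments.push(name.clone());
            1
        }
        _ => return Err("expected identifier at start of path".to_string()),
    };
    while let Some(Tok::PathSep) = toks.get(i) {
        match toks.get(i + 1) {
            Some(Tok::Ident(name)) => {
                segments.push(name.clone());
                i += 2;
            }
            _ => return Err("expected identifier after `::`".to_string()),
        }
    }
    Ok((segments, &toks[i..]))
}
```

With this model, `a::b :: + -` is rejected, and `a<>::b::c<u8>` splits into the path `a` followed by a token stream starting at `<`. A mandatory delimiter after the path removes the need to guess.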

Proposed solution:

  • Stabilize proc_macro as is for "Macros 1.2".

  • In the future extend the set of proc_macro plugin interfaces with one more signature fn(TokenStream, Delimiter) -> TokenStream that allows controlling delimiters used in macro invocations.

  • In the future possibly support bang macro invocations without delimiters for symmetry with attributes and because they may be legitimately useful (let x = MACRO_CONST!;, see https://internals.rust-lang.org/t/idea-elide-parens-brackets-on-unparametrized-macros/6527) (the Delimiter argument is Delimiter::None in this case).

  • Restrict attribute syntax accepted by proc_macro_attribute for "Macros 1.2" to

    // Symmetric with bang macro invocations
    #[my::proc::macro(TOKEN_STREAM)]
    #[my::proc::macro[TOKEN_STREAM]]
    #[my::proc::macro { TOKEN_STREAM }]
    // Additionally
    #[my::proc::macro]
    #[my::proc::macro = TOKEN_TREE]

    Or, more radically, do not stabilize the = syntax for procedural macros 1.2.
    This is not a fundamental restriction - arbitrary token streams still can be placed inside the brackets (#[a::b::c(e f + c ,,, ;_:)]).

  • The token stream passed to the macro DOES NOT include the delimiters.

  • In the future extend the set of proc_macro_attribute plugin interfaces with one more signature fn(TokenStream, TokenStream, Delimiter) -> TokenStream that allows controlling delimiters used in macro invocations (the delimiter is Delimiter::None for both #[attr] and #[attr = tt] forms but they are still discernable by the token stream being empty or not).
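The five permitted attribute forms and the "delimiters are not passed" rule can be sketched as a small classifier. This is a string-based model, not the compiler's implementation: `attribute_input` is a hypothetical helper that receives the text following the attribute path, it does not check nesting, and it does not verify that the `=` form is followed by exactly one token tree.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delimiter {
    Parenthesis,
    Bracket,
    Brace,
    None,
}

/// Hypothetical model: `rest` is whatever follows the path inside `#[...]`.
/// Returns the delimiter that was used and the tokens the macro would
/// receive (delimiters stripped), or an error for any other syntax.
fn attribute_input(rest: &str) -> Result<(Delimiter, &str), String> {
    let rest = rest.trim();
    // `#[attr]`: no delimiter, empty token stream.
    if rest.is_empty() {
        return Ok((Delimiter::None, ""));
    }
    // `#[attr = tt]`: no delimiter, non-empty token stream.
    if let Some(tt) = rest.strip_prefix('=') {
        let tt = tt.trim();
        if tt.is_empty() {
            return Err("expected a token tree after `=`".to_string());
        }
        return Ok((Delimiter::None, tt));
    }
    // `#[attr(...)]`, `#[attr[...]]`, `#[attr { ... }]`.
    let (delim, close) = match rest.chars().next() {
        Some('(') => (Delimiter::Parenthesis, ')'),
        Some('[') => (Delimiter::Bracket, ']'),
        Some('{') => (Delimiter::Brace, '}'),
        _ => return Err("unsupported attribute syntax".to_string()),
    };
    if rest.len() < 2 || !rest.ends_with(close) {
        return Err("unclosed delimiter".to_string());
    }
    Ok((delim, rest[1..rest.len() - 1].trim()))
}
```

Note that in this model `#[attr]` and `#[attr()]` both yield an empty token stream and differ only in the reported delimiter, while `#[attr]` and `#[attr = tt]` share `Delimiter::None` and differ only in whether the stream is empty.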

@petrochenkov
Contributor Author

In the future extend the set of ... plugin interfaces with one more signature ... that allows controlling delimiters used in macro invocations

Alternatives:

  • Change the signatures for proc_macro and proc_macro_attribute to include Delimiter before stabilization.
  • Do not change signatures, include delimiters into the token stream for proc_macro before stabilization.

@kennytm kennytm added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-macros-2.0 Area: Declarative macros 2.0 (#39412) labels Apr 18, 2018
@abonander
Contributor

I don't know if this deserves its own issue, but I think there should be a way for a proc_macro_attribute to ask what kind of AST node (crate, item, statement, or expression) its input represents.

  • attributes that only accept one kind could assert equality and error immediately instead of attempting to parse their expected kind
  • attributes that accept multiple kinds won't have to guess at what node kind they should attempt to parse
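A rough sketch of what such a query could enable, written as plain Rust rather than against any real compiler interface. `NodeKind` and `expect_kind` are hypothetical names; nothing like them is confirmed to exist in `proc_macro`.

```rust
/// Hypothetical description of what an attribute macro was applied to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeKind {
    Crate,
    Item,
    Statement,
    Expression,
}

/// Fail early if the attribute was applied to a node kind it does not
/// support, instead of attempting to parse the input and guessing.
fn expect_kind(attr: &str, actual: NodeKind, accepted: &[NodeKind]) -> Result<NodeKind, String> {
    if accepted.contains(&actual) {
        Ok(actual)
    } else {
        Err(format!(
            "`#[{}]` cannot be applied to {:?}; expected one of {:?}",
            attr, actual, accepted
        ))
    }
}
```

An attribute accepting several kinds could then `match` on the returned `NodeKind` to pick the right parser.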

@abonander
Contributor

Actually #47786 is probably a better place to suggest that one.

@alexcrichton
Member

Thanks for bringing these issues up @petrochenkov! Your proposed solution sounds pretty good to me, but I wanted to clarify a point or two as well.

For #[proc_macro] I'd be fine either requiring Delimiter today or adding a second signature down the road. I think I'd slightly prefer to have both options in the long run as most authors probably won't mind too much about what Delimiter is used, so I'd probably err on the side of leaving it as-is and possibly adding support for a new signature later on.

For #[proc_macro_attribute] I think it's a great idea to limit the syntax you can possibly work with today. The whitelisted syntaxes you proposed above sound good to me, and do you also think we should limit paths to just one element? (aka disallow #[foo::bar]).

I wanted to clarify, though, are you thinking the delimiter is dropped from the token stream going into #[proc_macro_attribute] as well? If we do that I think we would be required to stabilize and only support a signature that takes a Delimiter (to differentiate #[foo] and #[foo()]). I agree though that in these worlds removing the #[foo = bar] custom attribute is probably the best, and I don't think it'd be too hard to come up with alternate syntaxes for users today doing things like #[foo(baz = bar)].

@abonander
Contributor

@alexcrichton Absolute paths in attributes allow them to work at the crate root where they otherwise won't resolve due to scoping rules (#41430, attributes resolve in the parent module but the crate root has no parent). So unless we want to change the inner attribute form to resolve in the current module instead of the parent, absolute paths are the only way to call attributes at the crate root.

@alexcrichton
Member

@abonander ah true yeah, but the first pass of stabilization of Macros 1.2 won't stabilize attributes on modules (or crates), only bare items like functions, structs, impls, traits, etc.

@abonander
Contributor

abonander commented Apr 18, 2018

@alexcrichton we're not currently feature gating attribute invocations on modules or at the crate root so that needs to be its own issue. It would be a bit more complex as we'd have to wait until the attribute resolves to a #[proc_macro_attribute] before emitting a feature gate error.

@alexcrichton
Member

Oh sure yeah when I say only allow one element that's just for now, we'd still, I'd imagine, allow absolute paths and more-than-one-element paths behind a feature gate.

@abonander
Contributor

Absolute paths in attributes are already feature gated, actually. Would #[feature(proc_macro)] just imply that feature gate like it does now with use_extern_macros?

@alexcrichton
Member

Perhaps yeah, I might be more of a fan of finer-grained feature gates after the next round of stabilization, but either way is fine.

@petrochenkov
Contributor Author

@alexcrichton

I think I'd slightly prefer to have both options in the long run as most authors probably won't mind too much about what Delimiter is used, so I'd probably err on the side of leaving it as-is and possibly adding support for a new signature later on.

Yeah, I'm not sure which is better either; I lean toward leaving things as is for now and introducing a separate signature later.

do you also think we should limit paths to just one element? (aka disallow #[foo::bar]).

Yes (#35896 (comment)), but that falls more under the "macro modularisation" issue, so I didn't mention it again.
(If by limiting you mean not stabilizing multi-segment paths rather than "unimplementing" them).

I wanted to clarify, though, are you thinking the delimiter is dropped from the token stream going into #[proc_macro_attribute] as well?

Yes.

If we do that I think we would be required to stabilize and only support a signature that takes a Delimiter (to differentiate #[foo] and #[foo()]).

Differentiating between #[foo] and #[foo()] is equivalent to differentiating between foo!() and foo![], so I think we can certainly live without it and it's not required to introduce the signature with Delimiter immediately.
But if this differentiation is deemed sufficiently important, then we should implement/stabilize the Delimiter signature sooner rather than later for both proc_macro and proc_macro_attribute.

@petrochenkov
Contributor Author

a signature that takes a Delimiter

One more alternative is to keep the delimiter in CURRENT_SESS and extract it from there on demand like we do, for example, with Span::call_site.
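As a rough illustration of that alternative, the delimiter can be modelled as ambient per-thread state that a macro queries on demand, so the macro signature stays unchanged. Everything here is a stand-in: `CURRENT_DELIMITER`, `with_delimiter`, `current_delimiter`, and `vec_like` are hypothetical names, and a `thread_local!` is only an analogy for the compiler's session state, not how `CURRENT_SESS` is actually implemented.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Delimiter {
    Parenthesis,
    Bracket,
    Brace,
    None,
}

thread_local! {
    // Stand-in for per-expansion session state; `Option::None` means
    // "no macro expansion is in progress on this thread".
    static CURRENT_DELIMITER: Cell<Option<Delimiter>> = Cell::new(None);
}

/// Driver side: record the delimiter, run the expansion, then restore the
/// previous value so nested expansions see the right delimiter.
/// (This sketch does not restore the value if `expand` panics.)
fn with_delimiter<R>(delim: Delimiter, expand: impl FnOnce() -> R) -> R {
    let previous = CURRENT_DELIMITER.with(|slot| slot.replace(Some(delim)));
    let result = expand();
    CURRENT_DELIMITER.with(|slot| slot.set(previous));
    result
}

/// Macro side: on-demand accessor. Panics if no expansion is in progress.
fn current_delimiter() -> Delimiter {
    CURRENT_DELIMITER
        .with(|slot| slot.get())
        .expect("current_delimiter() called outside of a macro expansion")
}

/// A `vec!`-like macro body that keeps the one-argument shape but still
/// insists on square brackets.
fn vec_like(input: &str) -> Result<String, String> {
    match current_delimiter() {
        Delimiter::Bracket => Ok(format!("vec_from([{}])", input)),
        other => Err(format!("expected square brackets, found {:?}", other)),
    }
}
```

The design choice is that macros which do not care about delimiters never see them, while macros that do care pay only for a function call.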

@alexcrichton
Member

Ok that sounds pretty compelling to me! I think it's definitely clear that one work item here is:

  • Add a new feature gate (not proc_macro) for macro invocations that have more than one path element

When I was thinking that we'd require the Delimiter, that was mostly to be forward compatible in case we ever allow #[foo = bar], but I think we can basically take the route of having a global Delimiter or adding a new signature/attribute for that situation. In light of that I think there's another clear work item here:

  • Add a new feature gate (not proc_macro) for attribute invocations not of the form #[path ( .. )]
  • Change the input TokenStream to an attribute to remove the delimiters from the token stream.

@petrochenkov do you think there's more work items though we need to close this out?

@petrochenkov
Contributor Author

@alexcrichton

do you think there's more work items though we need to close this out?

No, the listed three items seem to cover everything.

Except that I planned to outright prohibit attribute syntaxes not matching 5 forms listed in "Proposed solution" in #50038 (comment), some of those 5 forms can be kept unstable on top of that though.

@alexcrichton
Member

Except that I planned to outright prohibit attribute syntaxes not matching 5 forms listed

Sounds fine by me!

@alexcrichton
Member

I'm working on these changes and I should have a PR to post soon

alexcrichton added a commit to alexcrichton/rust that referenced this issue Apr 21, 2018
This commit starts to lay some groundwork for the stabilization of custom
attribute invocations and general procedural macros. It applies a number of
changes discussed on [internals] as well as a [recent issue][issue], namely:

* The path used to specify a custom attribute must be of length one and cannot
  be a global path. This'll help future-proof us against any ambiguities and
  give us more time to settle the precise syntax. In the meantime though a bare
  identifier can be used and imported to invoke a custom attribute macro. A new
  feature gate, `proc_macro_path_invoc`, was added to gate multi-segment paths
  and absolute paths.

* The set of items which can be annotated by a custom procedural attribute has
  been restricted. Statements, expressions, and modules are disallowed behind
  two new feature gates: `proc_macro_expr` and `proc_macro_mod`.

* The input to procedural macro attributes has been restricted and adjusted.
  Today an invocation like `#[foo(bar)]` will receive `(bar)` as the input token
  stream, but after this PR it will only receive `bar` (the delimiters were
  removed). Invocations like `#[foo]` are still allowed and will be invoked in
  the same way as `#[foo()]`. This is a **breaking change** for all nightly
  users as the syntax coming in to procedural macros will be tweaked slightly.

* Procedural macros (`foo!()` style) can only be expanded to item-like items by
  default. A separate feature gate, `proc_macro_non_items`, is required to
  expand to items like expressions, statements, etc.

Closes rust-lang#50038

[internals]: https://internals.rust-lang.org/t/help-stabilize-a-subset-of-macros-2-0/7252
[issue]: rust-lang#50038
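
The gating described in the first two bullets of the commit message can be summarised with a small sketch. The gate names are taken from the commit message; `Target` and `required_gates` are hypothetical names, and the mapping of both statements and expressions to `proc_macro_expr` is an assumption, since the message lists the two gates without spelling out which target kinds each one covers.

```rust
/// What a custom attribute is applied to (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    Item,
    Statement,
    Expression,
    Module,
}

/// Returns the feature gates an attribute invocation would need under the
/// rules in the commit message, for an attribute path written as text.
fn required_gates(path: &str, target: Target) -> Vec<&'static str> {
    let mut gates = Vec::new();
    // Multi-segment and absolute paths both contain `::`.
    if path.contains("::") {
        gates.push("proc_macro_path_invoc");
    }
    match target {
        Target::Item => {}
        // Assumption: statements and expressions share one gate.
        Target::Statement | Target::Expression => gates.push("proc_macro_expr"),
        Target::Module => gates.push("proc_macro_mod"),
    }
    gates
}
```

For example, `required_gates("foo", Target::Item)` is empty, which corresponds to the subset intended for stabilization: a single-identifier attribute on a bare item.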
bors added a commit that referenced this issue Apr 21, 2018
…nkov

rustc: Tweak custom attribute capabilities


4 participants