RFC: Reserved prefixes in the 2021 edition #3101

Merged: nikomatsakis merged 8 commits into rust-lang:master on May 6, 2021

bstrie
Contributor

@bstrie bstrie commented Apr 1, 2021

Beginning with the 2021 edition, reserve the syntax ident#foo, ident"foo", ident'f', and ident#123, so that later language changes can give these forms a meaning without breaking existing code.
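
To make the effect of the reservation concrete, here is a small sketch using a token-counting macro. The macro name `count_tts` is made up for illustration. The calls that compile below behave the same in every edition; the commented-out calls are the ones this RFC affects (two tokens in 2018, a lexer error in 2021).

```rust
// Counts the token trees passed to it (hypothetical helper for illustration).
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // Prefixes the language already defines lex as a single token.
    assert_eq!(count_tts!(b"foo"), 1);
    assert_eq!(count_tts!(r#"foo"#), 1);
    assert_eq!(count_tts!(r#async), 1);

    // With whitespace, an identifier and what follows are separate tokens
    // in every edition.
    assert_eq!(count_tts!(z "foo"), 2);
    assert_eq!(count_tts!(k # foo), 3);

    // Without whitespace, the 2018 edition still sees separate tokens,
    // while the 2021 edition rejects the input during lexing:
    // count_tts!(z"foo");
    // count_tts!(k#foo);

    println!("all token counts matched");
}
```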

Rendered

@bstrie bstrie mentioned this pull request Apr 1, 2021
@scottmcm scottmcm added T-lang Relevant to the language team, which will review and decide on the RFC. I-nominated labels Apr 1, 2021
bstrie and others added 3 commits April 1, 2021 19:56
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
Co-authored-by: Josh Triplett <josh@joshtriplett.org>
@m-ou-se
Member

m-ou-se commented Apr 2, 2021

This looks great. :)

Contributor

@nikomatsakis nikomatsakis left a comment

+1 on the idea; we also discussed this at our most recent @rust-lang/lang meeting and folks were in favor then.

Left a few nits.

@programmerjake
Member

how about also adding ident#number, and ident#'char'? seems like it might be useful for future user-defined literals

@bstrie
Contributor Author

bstrie commented Apr 3, 2021

Ah yes, I totally forgot about char literals; if we go through with this for string literals, it makes sense to go through with it for chars as well. And I'd propose to reserve ident'foo', for symmetry with b'foo'; no need for the intervening # in this case. For integers we would still need the #; I left numeric literals out only to avoid increasing the scope without cause (at least for string literals, there are pre-proposals in mind). That said, there is precedent in other languages; e.g. Ada has foo#num# where foo is the radix of the literal.

@nikomatsakis
Contributor

I feel like reserving character literals and numbers seems reasonable.

@bstrie
Contributor Author

bstrie commented Apr 6, 2021

I've pushed changes based on the feedback here and on Zulip. Highlights:

  1. Reserve prefixes on char literals and numeric literals as well.
  2. The use of an unknown prefix is now a pre-tokenization error, rather than a post-tokenization error.
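
A tokenizer-level check of this kind can be sketched roughly as follows. This is a toy illustration, not rustc's lexer: `check_prefix` and `PrefixCheck` are hypothetical names, and the set of known prefixes (`b`, `r`, `br`) is an assumption based on the prefixes that existed when this RFC was discussed.

```rust
/// Result of inspecting the start of `src` for an identifier glued to `"`, `'`, or `#`.
#[derive(Debug, PartialEq)]
enum PrefixCheck {
    /// No identifier immediately followed by a quote or `#`.
    NoPrefix,
    /// A prefix that is already defined, e.g. `b"..."` or `r#"..."#`.
    Known,
    /// Any other prefix: rejected before any tokens are produced.
    Reserved,
}

fn check_prefix(src: &str) -> PrefixCheck {
    // An identifier must start with a letter or underscore.
    match src.chars().next() {
        Some(c) if c.is_alphabetic() || c == '_' => {}
        _ => return PrefixCheck::NoPrefix,
    }
    // Find the byte offset where the identifier ends.
    let end = src
        .char_indices()
        .find(|&(_, c)| !(c.is_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(src.len());
    let (ident, rest) = src.split_at(end);
    // Assumed list of already-defined prefixes for each following character.
    let known: &[&str] = match rest.chars().next() {
        Some('"') => &["b", "r", "br"],
        Some('\'') => &["b"],
        Some('#') => &["r", "br"],
        _ => return PrefixCheck::NoPrefix,
    };
    if known.contains(&ident) {
        PrefixCheck::Known
    } else {
        PrefixCheck::Reserved
    }
}

fn main() {
    for src in ["b\"foo\"", "r#async", "k#foo", "f\"x\"", "z'c'", "n#123", "foo + 1"] {
        println!("{:<10} -> {:?}", src, check_prefix(src));
    }
}
```

Because the check runs on raw source text, a reserved prefix is rejected even inside a macro invocation, which is what distinguishes a pre-tokenization error from a post-tokenization one.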

@programmerjake
Member

I think ident#"... and ident#'... where the ... is any arbitrary text (and possibly ident'ident and ident'number) should also be reserved (except we should keep raw strings and other stuff that's already defined, of course).

@bstrie
Contributor Author

bstrie commented Apr 6, 2021

Can you explain in more detail what you'd like to reserve and why?

@programmerjake
Member

programmerjake commented Apr 6, 2021

ident#"... would be reserved so we could add raw string variants of whatever new strings may be added in the future, e.g. fr#"raw format string"# could be shorthand for format_args!(r#"raw format string"#).
similarly for ident#'....

now that I'm typing this, I realized we should also add ident##... to support fr##"my f-string"##.

Though maybe we should just reserve everything of the form ident#... -- that makes the rule for # much easier to explain and implement.

All of the above doesn't mean discarding reservations for things without # though...

Obviously the pre-existing syntax (such as r#async or r#"abc"#) is still valid.
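
For reference, a short sketch of the pre-existing forms mentioned here, which remain valid in every edition. The function names are made up for illustration, and `format!` with a raw string stands in for what a hypothetical `fr#"..."#` literal might abbreviate; no such literal exists today.

```rust
/// Raw identifier: lets a keyword such as `async` be used as a name.
fn raw_identifier_demo() -> i32 {
    let r#async = 41;
    r#async + 1
}

/// Raw string used as a format string, so the inner quotes need no escaping.
fn raw_format_demo(name: &str) -> String {
    format!(r#"hello, "{}""#, name)
}

fn main() {
    // Raw strings may use any number of `#`s, as long as both sides match.
    let one = r#"say "hi""#;
    let two = r##"contains "# inside"##;
    // Raw byte strings combine both existing prefixes.
    let bytes = br#"raw "bytes""#;

    println!("{}", one);
    println!("{}", two);
    println!("{}", bytes.len());
    println!("{}", raw_identifier_demo());
    println!("{}", raw_format_demo("world"));
}
```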

@bstrie
Contributor Author

bstrie commented Apr 6, 2021

@programmerjake , note that the RFC as written does indeed reserve this syntax for raw string literals (for all possible quantities of surrounding #s); see the tokenizing rule for RESERVED_STRING_LITERAL in the reference-level explanation.

WRT "reserve everything of the form ident#...", what do you mean by "everything"? Every single production in the language, e.g. attributes ident##[foo], blocks ident#{...}, lifetimes ident#'foo, etc? Just the things referred to by this proposal? I think the former would take a large amount of justification, and for the latter I would say it is at odds with my perception of this RFC; IMO the important thing is the identifier prefix, and the # only exists where necessary to disambiguate the prefix from the following item. E.g., I don't see any scenario where it would be useful to be able to write b"foo" as b#"foo"; not even macros (which historically have motivated a lot of the "useless" flexibility in the grammar) need this sort of generic flexibility, AFAICT.

@nikomatsakis
Contributor

@rfcbot fcp merge

In our 2021-04-06 meeting, we decided that we'd like to move forward with this RFC.

There were also a few other notes about the right time to report an error and so forth (we felt that reporting the error at tokenization time would be better, since it gives us maximal choices later).

@rfcbot
Collaborator

rfcbot commented Apr 9, 2021

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Apr 9, 2021
@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Apr 20, 2021
@rfcbot
Collaborator

rfcbot commented Apr 20, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@joshtriplett
Member

Thanks for the proposal, @nikomatsakis, and thanks for the update, @bstrie!

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised. labels Apr 30, 2021
@rfcbot
Collaborator

rfcbot commented Apr 30, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@m-ou-se
Member

m-ou-se commented May 5, 2021

Since this makes the lexer edition-dependent, what will this mean for proc_macro::TokenStream::from_str? That takes a string without any span/edition info. Does it use the edition of the proc macro crate? Or of the crate the proc macro was invoked from? (And if so, what does that mean if it was invoked by a macro_rules macro from an edition 2018 crate which was in turn invoked from a 2021 crate?) Or does it always use the latest edition?

@programmerjake
Member

> Since this makes the lexer edition-dependent, what will this mean for proc_macro::TokenStream::from_str? That takes a string without any span/edition info. Does it use the edition of the proc macro crate? Or of the crate the proc macro was invoked from? (And if so, what does that mean if it was invoked by a macro_rules macro from an edition 2018 crate which was in turn invoked from a 2021 crate?) Or does it always use the latest edition?

Maybe deprecate it and add a new function that takes an edition parameter?

@nikomatsakis
Contributor

That's a good question, @m-ou-se =)

I'd like to know a bit more about how that function is used in practice. It seems clear that the ideal would be to use the edition of the crate from which that string originated, but without any span, we can't know that.

We could indeed deprecate it in favor of an API that takes span information. I'm not overly familiar with that crate; does such an API already exist?

I was planning on merging this RFC, and I think I will do so, but I'll add this to the list of unresolved questions and open an issue for follow-up.

@nikomatsakis nikomatsakis merged commit 4458001 into rust-lang:master May 6, 2021
@nikomatsakis
Contributor

Huzzah! The @rust-lang/lang team has decided to accept this RFC. Please follow along on the tracking issue rust-lang/rust#84978.

Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
8 participants