
Reserved Prefixes RFC: How to handle TokenStream::from_str? #84979

Open
nikomatsakis opened this issue May 6, 2021 · 11 comments
Labels
F-reserved_prefixes `#![feature(reserved_prefixes)]`

Comments

@nikomatsakis
Contributor

@m-ou-se raised a point on rust-lang/rfcs#3101 that needs to be resolved in some way:

Since this makes the lexer edition-dependent, what will this mean for proc_macro::TokenStream::from_str? That takes a string without any span/edition info. Does it use the edition of the proc macro crate? Or of the crate the proc macro was invoked from? (And if so, what does that mean if it was invoked by a macro_rules macro from an edition 2018 crate which was in turn invoked from a 2021 crate?) Or does it always use the latest edition?

@nikomatsakis
Contributor Author

cc @dtolnay @petrochenkov @rust-lang/lang

@nikomatsakis nikomatsakis added the F-reserved_prefixes `#![feature(reserved_prefixes)]` label May 6, 2021
@m-ou-se
Member

m-ou-se commented May 6, 2021

Note that we're about to add Literal::from_str. I personally don't think deprecating TokenStream::from_str (as suggested here and here) is the right way to go. But if we do want to do that, we should probably not add this new trait implementation, since it has the same problem.

@m-ou-se
Member

m-ou-se commented May 6, 2021

@nikomatsakis said in rust-lang/rfcs#3101 (comment):

I'd like to know a bit more about how that function is used in practice.

I think @dtolnay can answer that question. quote is probably one of the main users of this function.

@nikomatsakis
Contributor Author

So, a few thoughts:

  • The ideal would be to know the span that the literal came from; do we all agree on that? It's just that we don't have it.
  • I don't think using the edition of some crate (e.g., the procedural macro, or its invoker) makes much sense, since what really matters is the crate the tokens came from. Regardless, I don't know how we would even do that: there is no mechanism for the from_str function to query the edition of the crate invoking it.
  • The main alternative I can see is to fix the edition, or to say "always use the latest edition". But either of those seems like it would lead to deprecation.

@petrochenkov
Contributor

Isn't the prefix reservation supposed to be edition-independent based on the crater results?

I hope that's true, because the lexer library (the one shared with rust-analyzer) doesn't currently have any knowledge of editions, and by design it also isn't supposed to have access to any global data (including a "global edition").

@nikomatsakis
Contributor Author

@petrochenkov The generalized form of the RFC was not edition-independent, as far as I know, and included things like prefixed literals (f"foo") etc. I'm not sure how those currently lex with the library.

@Mark-Simulacrum
Member

@matklad @petrochenkov We discussed this a little in the T-lang meeting on 2021-05-18, and we'd like more information about the specific implications of introducing edition-dependent lexing, and the extent to which it is a large problem. There's definitely a tradeoff here between simplicity (just reserve across all editions, or don't do anything at all) and the desire to introduce new lexing rules. We're interested in your opinions on the downsides so that we can better evaluate that tradeoff.

@matklad
Member

matklad commented May 19, 2021

I wouldn't worry about the rustc_lexer interface; I think it can support edition-dependent lexing in a rather clean way:

// rustc_lexer
pub struct LexerConfig {
    allow_string_prefixes: bool,
}

/// Parses the first token from the provided input string.
pub fn first_token(config: &LexerConfig, input: &str) -> Token {
    debug_assert!(!input.is_empty());
    // (the config would be threaded through to the cursor)
    Cursor::new(input).advance_token()
}

// Call-site in rustc/rust-analyzer
let lexer_config = if edition >= 2021 {
    LexerConfig { allow_string_prefixes: true }
} else {
    LexerConfig { allow_string_prefixes: false }
};
first_token(&lexer_config, token_text);

I do worry about overall "erosion" of the nice token tree model. I think the original design was that token trees are a narrow public interface for macros. Tokens are simple, so we can just expose them and never need to redesign, right? Since then, we tweaked the design a number of times:

  • for proc macros, we added jointness.
  • we taught the parser to split t.0.0 into 0, ., 0.
  • there's desire to expose the underlying token text for proc macros that are interested in whitespace
  • we want to make tokenization dependent on editions

All of the above would be totally fine if tokens were a private implementation detail of a particular compiler, but they are part of a public API. I can't say exactly what problem this creates (maybe it doesn't create one?), but it feels inelegant to me.
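To make the second tweak above concrete, here is a minimal sketch (not rustc's actual implementation) of splitting the float-looking token that the lexer produces for t.0.0 back into separate tokens:

```rust
// Minimal sketch (not rustc's real code): the lexer sees `0.0` in `t.0.0`
// as one float literal, so the parser splits it back into `0`, `.`, `0`
// to treat it as two tuple-index operations.
fn split_float_token(tok: &str) -> Vec<String> {
    let mut out = Vec::new();
    for (i, part) in tok.split('.').enumerate() {
        if i > 0 {
            out.push(".".to_string());
        }
        if !part.is_empty() {
            out.push(part.to_string());
        }
    }
    out
}

fn main() {
    assert_eq!(split_float_token("0.0"), vec!["0", ".", "0"]);
    // A trailing dot, as in `0.`, yields just `0` and `.`.
    assert_eq!(split_float_token("0."), vec!["0", "."]);
}
```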

@matklad
Member

matklad commented May 19, 2021

I think the plan for suffixes/prefixes was to expose jointness not only for punctuation, but also for idents, such that f"" is lexed exactly as it is today, but the parser glues f and "" together. Are we considering this alternative?
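A rough sketch of what that gluing could look like, using a toy token model (these types are illustrative only, not the real proc_macro API):

```rust
// Toy token model (illustrative only, not the real proc_macro API).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    // `joint` means there is no whitespace between this ident and the next token.
    Ident { text: String, joint: bool },
    Str(String),
    // The glued result: an `f`-prefixed string literal.
    FStr(String),
}

// Glue an ident `f` that is joint with a following string literal into a
// single prefixed-literal token; leave everything else untouched.
fn glue_prefixes(tokens: Vec<Token>) -> Vec<Token> {
    let mut out = Vec::new();
    let mut iter = tokens.into_iter().peekable();
    while let Some(tok) = iter.next() {
        let is_joint_f = matches!(&tok, Token::Ident { text, joint: true } if text == "f");
        if is_joint_f {
            if let Some(Token::Str(_)) = iter.peek() {
                if let Some(Token::Str(s)) = iter.next() {
                    out.push(Token::FStr(s));
                    continue;
                }
            }
        }
        out.push(tok);
    }
    out
}

fn main() {
    // `f""` with no space: glued into one prefixed literal.
    let glued = glue_prefixes(vec![
        Token::Ident { text: "f".into(), joint: true },
        Token::Str("".into()),
    ]);
    assert_eq!(glued, vec![Token::FStr("".into())]);

    // `f ""` with a space: left as two separate tokens.
    let split = glue_prefixes(vec![
        Token::Ident { text: "f".into(), joint: false },
        Token::Str("".into()),
    ]);
    assert_eq!(split.len(), 2);
}
```

The appeal of this alternative is that the lexer itself stays edition-independent; only the parser-level gluing rule would be gated on the edition.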

@matklad
Member

matklad commented May 19, 2021

Are there similar issues, besides prefixes, which might require changes to the lexer? One thing that comes to mind is that we want to have let s: String = f"ans = {2 + 2}!".to_string() or some such as an end goal. The most natural implementation of that (as found in, e.g., Kotlin) requires the lexer to produce many tokens for the f"ans = {2 + 2}!" fragment: f"ans = {, 2, +, 2, }!". Are there other things which might require changes to lexing?
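A toy version of that fragment-splitting, to show the shape of the output (illustrative only; it ignores nesting, escapes, and string literals inside the expression, which is exactly where the real difficulty lives):

```rust
// Toy interpolation splitter (illustrative; handles no nesting, escapes,
// or nested string literals). Returns (is_expression, text) fragments
// for the body of an interpolated string such as `ans = {2 + 2}!`.
fn split_interpolation(body: &str) -> Vec<(bool, String)> {
    let mut parts = Vec::new();
    let mut cur = String::new();
    let mut in_expr = false;
    for ch in body.chars() {
        match ch {
            '{' if !in_expr => {
                if !cur.is_empty() {
                    parts.push((false, std::mem::take(&mut cur)));
                }
                in_expr = true;
            }
            '}' if in_expr => {
                parts.push((true, std::mem::take(&mut cur)));
                in_expr = false;
            }
            _ => cur.push(ch),
        }
    }
    if !cur.is_empty() {
        parts.push((false, cur));
    }
    parts
}

fn main() {
    let parts = split_interpolation("ans = {2 + 2}!");
    assert_eq!(
        parts,
        vec![
            (false, "ans = ".to_string()),
            (true, "2 + 2".to_string()),
            (false, "!".to_string()),
        ]
    );
}
```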

@m-ou-se
Member

m-ou-se commented May 31, 2021

We specifically wanted to make these things a lexer error, so we don't have to decide on the lexing rules for that literal yet. E.g. is fb"\xFF" valid or not? How about fb"❉"? Or fr"\"? Or f"hello {123 + "hey".len()}"?
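A sketch of what the reservation check might look like, under my reading of RFC 3101 (the exact rule set here is an assumption, not rustc's implementation; the known literal prefixes b, r, and br stay valid):

```rust
// Sketch of the edition-2021 reservation (assumed rules, per my reading of
// RFC 3101, not rustc's implementation): any identifier immediately followed
// by `"`, `'`, or `#` is a lexer error, unless it is one of the known
// literal prefixes `b`, `r`, `br`.
fn has_reserved_prefix(src: &str, edition: u16) -> bool {
    if edition < 2021 {
        return false; // pre-2021 editions lex these as separate tokens
    }
    let prefix: String = src
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect();
    let rest = &src[prefix.len()..];
    !prefix.is_empty()
        && rest.starts_with(&['"', '\'', '#'][..])
        && !matches!(prefix.as_str(), "b" | "r" | "br")
}

fn main() {
    assert!(has_reserved_prefix("fb\"\\xFF\"", 2021)); // fb"\xFF" is reserved
    assert!(has_reserved_prefix("fr\"\\\"", 2021)); // fr"\ is reserved
    assert!(!has_reserved_prefix("b\"bytes\"", 2021)); // b"..." stays valid
    assert!(!has_reserved_prefix("fb\"\\xFF\"", 2018)); // no change pre-2021
}
```

Making these an error, as the comment above says, keeps the door open: none of the listed examples need a defined meaning yet, only a defined rejection.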
