
Reserved Prefixes RFC: How to handle TokenStream::from_str? #84979

Open
nikomatsakis opened this issue May 6, 2021 · 11 comments
Labels
F-reserved_prefixes `#![feature(reserved_prefixes)]`

Comments

@nikomatsakis
Contributor

@m-ou-se raised a point on rust-lang/rfcs#3101 that needs to be resolved in some way:

Since this makes the lexer edition-dependent, what will this mean for proc_macro::TokenStream::from_str? That takes a string without any span/edition info. Does it use the edition of the proc macro crate? Or of the crate the proc macro was invoked from? (And if so, what does that mean if it was invoked by a macro_rules macro from an edition 2018 crate which was in turn invoked from a 2021 crate?) Or does it always use the latest edition?

@nikomatsakis
Contributor Author

cc @dtolnay @petrochenkov @rust-lang/lang

@nikomatsakis nikomatsakis added the F-reserved_prefixes `#![feature(reserved_prefixes)]` label May 6, 2021
@m-ou-se
Member

m-ou-se commented May 6, 2021

Note that we're about to add Literal::from_str. I personally don't think deprecating TokenStream::from_str (as suggested here and here) is the right way to go. But if we do want to do that, we should probably not add this new trait implementation, since it has the same problem.

@m-ou-se
Member

m-ou-se commented May 6, 2021

@nikomatsakis said in rust-lang/rfcs#3101 (comment):

I'd like to know a bit more about how that function is used in practice.

I think @dtolnay can answer that question. quote is probably one of the main users of this function.

@nikomatsakis
Contributor Author

So, a few thoughts:

  • The ideal would be to know the span that the literal came from; do we all agree on that? It's just that we don't have it.
  • I don't think using the edition of some crate (e.g., the procedural macro, or its invoker) makes much sense, since what really matters is the crate the tokens came from. Regardless, I don't know how we would even do that: there is no mechanism for the from_str function to query the edition of the crate invoking it.
  • The main alternative I can see is to fix the edition, or to say "always use the latest edition". But either of those seems like it would lead to deprecation.

@petrochenkov
Contributor

Isn't the prefix reservation supposed to be edition-independent based on the crater results?

I hope that's true, because the lexer library (the one shared with rust-analyzer) doesn't currently have any knowledge of editions, and by design it also isn't supposed to have access to any global data (including a "global edition").

@nikomatsakis
Contributor Author

@petrochenkov The generalized form of the RFC was not edition-independent, as far as I know, and included things like prefixed literals (f"foo") etc. I'm not sure how those currently lex with the library.

@Mark-Simulacrum
Member

@matklad @petrochenkov We discussed this a little in the T-lang meeting on 2021-05-18, and we'd like more information about the specific implications of introducing edition-dependent lexing, and the extent to which it is a large problem. There's definitely a tradeoff here between simplicity (just reserve across all editions, or don't do anything at all) and the desire to introduce new lexing rules. We're interested in your opinions on the downsides so that we can better evaluate that tradeoff.

@matklad
Member

matklad commented May 19, 2021

I wouldn't worry about the rustc_lexer interface; I think it can support edition-dependent lexing in a rather clean way:

// rustc_lexer
pub struct LexerConfig {
    allow_string_prefixes: bool,
}

/// Parses the first token from the provided input string.
pub fn first_token(config: &LexerConfig, input: &str) -> Token {
    debug_assert!(!input.is_empty());
    // (the config would be threaded through to the cursor)
    Cursor::new(input).advance_token()
}

// Call-site in rustc/rust-analyzer
let lexer_config = if edition >= 2021 {
    LexerConfig { allow_string_prefixes: true }
} else {
    LexerConfig { allow_string_prefixes: false }
};
first_token(&lexer_config, token_text);

I do worry about overall "erosion" of the nice token tree model. I think the original design was that token trees are a narrow public interface for macros. Tokens are simple, so we can just expose them and never need to redesign, right? Since then, we tweaked the design a number of times:

  • for proc macros, we added jointness.
  • we taught the parser to split t.0.0 into 0, ., 0.
  • there's desire to expose the underlying token text for proc macros that are interested in whitespace
  • we want to make tokenization dependent on editions

All of the above would be totally fine if tokens were a private implementation detail of a particular compiler, but they are part of a public API. I can't say exactly what problem this creates (maybe it doesn't create one?), but it feels inelegant to me.
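To make the second tweak above concrete, here is a minimal sketch (not rustc's actual implementation) of splitting the float-looking token that the lexer produces for t.0.0 back into separate tokens:

```rust
// Minimal sketch (not rustc's real code): the lexer sees `0.0` in `t.0.0`
// as one float literal, so the parser splits it back into `0`, `.`, `0`
// to treat it as two tuple-index operations.
fn split_float_token(tok: &str) -> Vec<String> {
    let mut out = Vec::new();
    for (i, part) in tok.split('.').enumerate() {
        if i > 0 {
            out.push(".".to_string());
        }
        if !part.is_empty() {
            out.push(part.to_string());
        }
    }
    out
}

fn main() {
    assert_eq!(split_float_token("0.0"), vec!["0", ".", "0"]);
    // A trailing dot, as in `0.`, yields just `0` and `.`.
    assert_eq!(split_float_token("0."), vec!["0", "."]);
}
```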

@matklad
Member

matklad commented May 19, 2021

I think the plan for suffixes/prefixes was to expose jointness not only for punctuation, but also for idents, such that f"" is lexed exactly as it is today, but the parser glues f and "" together. Are we considering this alternative?
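A rough sketch of what that gluing could look like, using a toy token model (these types are illustrative only, not the real proc_macro API):

```rust
// Toy token model (illustrative only, not the real proc_macro API).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    // `joint` means there is no whitespace between this ident and the next token.
    Ident { text: String, joint: bool },
    Str(String),
    // The glued result: an `f`-prefixed string literal.
    FStr(String),
}

// Glue an ident `f` that is joint with a following string literal into a
// single prefixed-literal token; leave everything else untouched.
fn glue_prefixes(tokens: Vec<Token>) -> Vec<Token> {
    let mut out = Vec::new();
    let mut iter = tokens.into_iter().peekable();
    while let Some(tok) = iter.next() {
        let is_joint_f = matches!(&tok, Token::Ident { text, joint: true } if text == "f");
        if is_joint_f {
            if let Some(Token::Str(_)) = iter.peek() {
                if let Some(Token::Str(s)) = iter.next() {
                    out.push(Token::FStr(s));
                    continue;
                }
            }
        }
        out.push(tok);
    }
    out
}

fn main() {
    // `f""` with no space: glued into one prefixed literal.
    let glued = glue_prefixes(vec![
        Token::Ident { text: "f".into(), joint: true },
        Token::Str("".into()),
    ]);
    assert_eq!(glued, vec![Token::FStr("".into())]);

    // `f ""` with a space: left as two separate tokens.
    let split = glue_prefixes(vec![
        Token::Ident { text: "f".into(), joint: false },
        Token::Str("".into()),
    ]);
    assert_eq!(split.len(), 2);
}
```

The appeal of this alternative is that the lexer itself stays edition-independent; only the parser-level gluing rule would be gated on the edition.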

@matklad
Member

matklad commented May 19, 2021

Are there similar issues, besides prefixes, which might require changes to the lexer? One thing that comes to mind is that we want to have let s: String = f"ans = {2 + 2}!".to_string() or some such as an end goal. The most natural implementation of that (as found in, e.g., Kotlin) requires the lexer to produce many tokens for the f"ans = {2 + 2}!" fragment: f"ans = {, 2, +, 2, }!". Are there other things which might require changes to lexing?
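A toy version of that fragment-splitting, to show the shape of the output (illustrative only; it ignores nesting, escapes, and string literals inside the expression, which is exactly where the real difficulty lives):

```rust
// Toy interpolation splitter (illustrative; handles no nesting, escapes,
// or nested string literals). Returns (is_expression, text) fragments
// for the body of an interpolated string such as `ans = {2 + 2}!`.
fn split_interpolation(body: &str) -> Vec<(bool, String)> {
    let mut parts = Vec::new();
    let mut cur = String::new();
    let mut in_expr = false;
    for ch in body.chars() {
        match ch {
            '{' if !in_expr => {
                if !cur.is_empty() {
                    parts.push((false, std::mem::take(&mut cur)));
                }
                in_expr = true;
            }
            '}' if in_expr => {
                parts.push((true, std::mem::take(&mut cur)));
                in_expr = false;
            }
            _ => cur.push(ch),
        }
    }
    if !cur.is_empty() {
        parts.push((false, cur));
    }
    parts
}

fn main() {
    let parts = split_interpolation("ans = {2 + 2}!");
    assert_eq!(
        parts,
        vec![
            (false, "ans = ".to_string()),
            (true, "2 + 2".to_string()),
            (false, "!".to_string()),
        ]
    );
}
```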

@m-ou-se
Member

m-ou-se commented May 31, 2021

We specifically wanted to make these things a lexer error, so we don't have to decide on the lexing rules for that literal yet. E.g. is fb"\xFF" valid or not? How about fb"❉"? Or fr"\"? Or f"hello {123 + "hey".len()}"?
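A sketch of what the reservation check might look like, under my reading of RFC 3101 (the exact rule set here is an assumption, not rustc's implementation; the known literal prefixes b, r, and br stay valid):

```rust
// Sketch of the edition-2021 reservation (assumed rules, per my reading of
// RFC 3101, not rustc's implementation): any identifier immediately followed
// by `"`, `'`, or `#` is a lexer error, unless it is one of the known
// literal prefixes `b`, `r`, `br`.
fn has_reserved_prefix(src: &str, edition: u16) -> bool {
    if edition < 2021 {
        return false; // pre-2021 editions lex these as separate tokens
    }
    let prefix: String = src
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect();
    let rest = &src[prefix.len()..];
    !prefix.is_empty()
        && rest.starts_with(&['"', '\'', '#'][..])
        && !matches!(prefix.as_str(), "b" | "r" | "br")
}

fn main() {
    assert!(has_reserved_prefix("fb\"\\xFF\"", 2021)); // fb"\xFF" is reserved
    assert!(has_reserved_prefix("fr\"\\\"", 2021)); // fr"\ is reserved
    assert!(!has_reserved_prefix("b\"bytes\"", 2021)); // b"..." stays valid
    assert!(!has_reserved_prefix("fb\"\\xFF\"", 2018)); // no change pre-2021
}
```

Making these an error, as the comment above says, keeps the door open: none of the listed examples need a defined meaning yet, only a defined rejection.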
