Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect NulInCStr error earlier. #119172

Merged
merged 1 commit into from
Jan 18, 2024
Merged

Conversation

nnethercote
Copy link
Contributor

By making it an EscapeError instead of a LitError. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in report_lit_error to calculate the span of the nul char from a range. This code used a hardwired +2 to account for the c" at the start of a C string literal, but this should have changed to a +3 for raw C string literals to account for the cr", which meant that the caret in cr" nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.

r? @fee1-dead

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 21, 2023
@nnethercote
Copy link
Contributor Author

@fee1-dead: some additional context in this zulip thread.

In short, the delayed C NUL str check is inconsistent with all other string literal checks. If it ships in its current state, we're stuck with that behaviour permanently. If we move it earlier right now before it ships, we have the option to delay it (and all other string literal checks) later on (as implemented in #118699). So if we do this in the next few days, we avoid a one-way door shutting.

cc @joshtriplett

@nnethercote
Copy link
Contributor Author

Ugh, the refusal of git and GitHub to show "binary" files is really annoying for this one.

@fee1-dead
Copy link
Member

Currently on my phone right now, but do we have a test for c"\0" on an earlier edition passed to a macro which then consumes it?

@rust-log-analyzer

This comment has been minimized.

@nnethercote
Copy link
Contributor Author

Ugh, the CI failure is due to a rust-analyzer sync problem #118861, which is blocked by #119124, which is awaiting review.

@nnethercote
Copy link
Contributor Author

Ugh, the refusal of git and GitHub to show "binary" files is really annoying for this one.

To clarify: the test contains 15 C string literals. Prior to this commit, only the first five have ERROR annotations. This commit adds ERROR annotations to the other ten. And the .stderr file changes from having five errors to fifteen errors.

@nnethercote
Copy link
Contributor Author

nnethercote commented Dec 21, 2023

Currently on my phone right now, but do we have a test for c"\0" on an earlier edition passed to a macro which then consumes it?

I don't think so. Is that interesting? It would produce output like this:

error: expected one of `!`, `.`, `::`, `;`, `?`, `else`, `{`, `}`, or an operator, found `"\0"`
 --> a2.rs:7:10
  |
7 |     _ = c"\0";
  |          ^^^^ expected one of 9 possible tokens

@fee1-dead
Copy link
Member

I meant something like this:

macro_rules! hi {
    (c $t:tt) => { $t };
}

fn main() {
    println!(hi!(c"hello!\0"))
}

do we already have that in our test suite?

@nnethercote
Copy link
Contributor Author

I don't see anything in tests/ui/ like that, no.

@traviscross
Copy link
Contributor

@rustbot labels +I-lang-nominated

Nominating this for us to discuss as there is some time pressure here. If we want to save this space, we should do it before C string literals stabilize in Rust 1.76.

@rustbot rustbot added the I-lang-nominated Nominated for discussion during a lang team meeting. label Dec 21, 2023
@petrochenkov petrochenkov self-assigned this Dec 21, 2023
@joshtriplett
Copy link
Member

Following up on this: the decision made was to defer the stabilization by a release, so that we don't need to rush this change in over the holidays.

@bors

This comment was marked as resolved.

@rustbot
Copy link
Collaborator

rustbot commented Jan 12, 2024

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

@nnethercote
Copy link
Contributor Author

The rust-analyzer problem has been fixed, and I have fixed the conflicts. @fee1-dead, this is ready for review. Unfortunately, git and GitHub both refuse to show the changed test file because it contains NUL chars and so is "binary".

Here's the before:

// edition: 2021

fn main() {
    c"\0";
    //~^ ERROR null characters in C string literals

    c"\u{00}";
    //~^ ERROR null characters in C string literals

    c"^@";
    //~^ ERROR null characters in C string literals

    c"\x00";
    //~^ ERROR null characters in C string literals

    cr"^@";
    //~^ ERROR null characters in C string literals
}

macro_rules! empty {
    ($($tt:tt)*) => {};
}

// The cfg consumes the literals before nul checking occurs.
#[cfg(FALSE)]
fn test() {
    c"\0";
    c"\u{00}";
    c"^@";
    c"\x00";
    cr"^@";
}

// The macro consumes the literals before nul checking occurs.
fn test_empty() {
    empty!(c"\0");
    empty!(c"\u{00}");
    empty!(c"^@");
    empty!(c"\x00");
    empty!(cr"^@");
}

and the after:

// edition: 2021

// The null char check for C string literals was originally implemented after
// expansion, which meant the first five strings in this file triggered errors,
// and the remaining ten did not. But this is different to all the other
// content checks done on string literals, such as checks for invalid escapes
// and bare CR chars. So the check was moved earlier. The check can be moved
// back to after expansion at a later date if necessary, because that would be
// a backward compatible change. (In contrast, moving the check from after
// expansion to lexing time would be a backward incompatible change, because it
// could break code that was previously accepted.)

fn main() {
    c"\0";     //~ ERROR null characters in C string literals
    c"\u{00}"; //~ ERROR null characters in C string literals
    c"^@";     //~ ERROR null characters in C string literals
    c"\x00";   //~ ERROR null characters in C string literals
    cr"^@";    //~ ERROR null characters in C string literals
}

macro_rules! empty {
    ($($tt:tt)*) => {};
}

// The cfg does not consume the literals before nul checking occurs.
#[cfg(FALSE)]
fn test() {
    c"\0";     //~ ERROR null characters in C string literals
    c"\u{00}"; //~ ERROR null characters in C string literals
    c"^@";     //~ ERROR null characters in C string literals
    c"\x00";   //~ ERROR null characters in C string literals
    cr"^@";    //~ ERROR null characters in C string literals
}

// The macro does not consume the literals before nul checking occurs.
fn test_empty() {
    empty!(c"\0");     //~ ERROR null characters in C string literals
    empty!(c"\u{00}"); //~ ERROR null characters in C string literals
    empty!(c"^@");     //~ ERROR null characters in C string literals
    empty!(c"\x00");   //~ ERROR null characters in C string literals
    empty!(cr"^@");    //~ ERROR null characters in C string literals
}

as rendered by vim.

@rust-log-analyzer

This comment has been minimized.

By making it an `EscapeError` instead of a `LitError`. This makes it
like the other errors produced when checking string literals contents,
e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion,
which changes behaviour. It will be possible to move the check back to
the later point if desired. If that happens, it's likely that all the
string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in
`report_lit_error` to calculate the span of the nul char from a range.
This code used a hardwired `+2` to account for the `c"` at the start of
a C string literal, but this should have changed to a `+3` for raw C
string literals to account for the `cr"`, which meant that the caret in
`cr"` nul error messages was one short of where it should have been. The
new approach doesn't need any of this and avoids the off-by-one error.
@petrochenkov
Copy link
Contributor

Blocked on the decisions in #118699, but I'm fine with temporarily landing this if it unblocks stabilizing C-strings sooner.
@rustbot blocked

@rustbot rustbot added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 12, 2024
@nnethercote
Copy link
Contributor Author

Blocked on the decisions in #118699, but I'm fine with temporarily landing this if it unblocks stabilizing C-strings sooner.

That's exactly the point of this PR: to unblock stabilization of C-string literals. Can you unblock?

@rustbot rustbot removed I-lang-nominated Nominated for discussion during a lang team meeting. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jan 17, 2024
@petrochenkov
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Jan 17, 2024

📌 Commit 9018d2c has been approved by petrochenkov

It is now in the queue for this repository.

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 17, 2024
compiler-errors added a commit to compiler-errors/rust that referenced this pull request Jan 17, 2024
…etrochenkov

Detect `NulInCStr` error earlier.

By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.

r? `@fee1-dead`
@ehuss
Copy link
Contributor

ehuss commented Jan 17, 2024

@nnethercote Just want to double check if you can post a reference PR update as mentioned at https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/rfc.203349.3A.20mixed.20utf8.20literals/near/409285306 to update the grammar and remove the sentence about post-lexing validation?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 18, 2024
…etrochenkov

Detect `NulInCStr` error earlier.

By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.

r? `@fee1-dead`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 18, 2024
…etrochenkov

Detect `NulInCStr` error earlier.

By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.

r? ``@fee1-dead``
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 18, 2024
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#119172 (Detect `NulInCStr` error earlier.)
 - rust-lang#119833 (Make tcx optional from StableMIR run macro and extend it to accept closures)
 - rust-lang#119955 (Modify GenericArg and Term structs to use strict provenance rules)
 - rust-lang#120021 (don't store const var origins for known vars)
 - rust-lang#120038 (Don't create a separate "basename" when naming and opening a MIR dump file)
 - rust-lang#120057 (Don't ICE when deducing future output if other errors already occurred)
 - rust-lang#120073 (Remove spastorino from users_on_vacation)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 18, 2024
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#119172 (Detect `NulInCStr` error earlier.)
 - rust-lang#119833 (Make tcx optional from StableMIR run macro and extend it to accept closures)
 - rust-lang#119967 (Add `PatKind::Err` to AST/HIR)
 - rust-lang#119978 (Move async closure parameters into the resultant closure's future eagerly)
 - rust-lang#120021 (don't store const var origins for known vars)
 - rust-lang#120038 (Don't create a separate "basename" when naming and opening a MIR dump file)
 - rust-lang#120057 (Don't ICE when deducing future output if other errors already occurred)
 - rust-lang#120073 (Remove spastorino from users_on_vacation)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 18, 2024
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#119172 (Detect `NulInCStr` error earlier.)
 - rust-lang#119833 (Make tcx optional from StableMIR run macro and extend it to accept closures)
 - rust-lang#119967 (Add `PatKind::Err` to AST/HIR)
 - rust-lang#119978 (Move async closure parameters into the resultant closure's future eagerly)
 - rust-lang#120021 (don't store const var origins for known vars)
 - rust-lang#120038 (Don't create a separate "basename" when naming and opening a MIR dump file)
 - rust-lang#120057 (Don't ICE when deducing future output if other errors already occurred)
 - rust-lang#120073 (Remove spastorino from users_on_vacation)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit ff8c7a7 into rust-lang:master Jan 18, 2024
11 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 18, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 18, 2024
Rollup merge of rust-lang#119172 - nnethercote:earlier-NulInCStr, r=petrochenkov

Detect `NulInCStr` error earlier.

By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.

r? ```@fee1-dead```
nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 22, 2024
Which moved the checking for NUL chars in C string literals earlier.
@nnethercote
Copy link
Contributor Author

@nnethercote Just want to double check if you can post a reference PR update

Thanks for the reminder, done in rust-lang/reference#1450.

@nnethercote nnethercote deleted the earlier-NulInCStr branch January 22, 2024 03:01
nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 22, 2024
Which moved the checking for NUL chars in C string literals earlier.
nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 25, 2024
Which moved the checking for NUL chars in C string literals earlier.
nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 26, 2024
Which moved the checking for NUL chars in C string literals earlier.
lnicola pushed a commit to lnicola/rust that referenced this pull request Jan 28, 2024
…etrochenkov

Detect `NulInCStr` error earlier.

By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literals contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but this should have changed to a `+3` for raw C string literals to account for the `cr"`, which meant that the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.

r? ```@fee1-dead```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants