Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line indented again each time rustfmt is run until it's 100 columns wide #5044

Open
autumnontape opened this issue Oct 27, 2021 · 6 comments
Labels
a-macros bug Panic, non-idempotency, invalid code, etc. p-medium

Comments

@autumnontape
Copy link

autumnontape commented Oct 27, 2021

The line containing ::uninit() in the following code is repeatedly indented if rustfmt is run on the file repeatedly. This also causes cargo fmt -- --check to fail, even when cargo fmt was just run, unless the line is as long as permissible.

macro_rules! space_uninit {
    ($capacity:expr) => {
        unsafe {
            core::mem::MaybeUninit::<[core::mem::MaybeUninit<core::primitive::u8>; $capacity]>
                ::uninit()
        }
    };
}

rustfmt version:

$ rustfmt -V
rustfmt 1.4.37-stable (09c42c4 2021-10-18)
@calebcartwright calebcartwright added bug Panic, non-idempotency, invalid code, etc. a-macros labels Oct 27, 2021
@calebcartwright
Copy link
Member

Thank you for the report and reproducible example! Idempotency issues are absolutely a bug so this will need to be sorted. This is reproducible on the latest version of rustfmt, but it would be useful if you could update the issue description to also note the version of rustfmt you are currently using for historical tracking.

@autumnontape
Copy link
Author

All right, version added to the description!

@ytmimi
Copy link
Contributor

ytmimi commented Nov 5, 2021

So I've done some digging and I think I've been able to narrow down where the problem is, but will continue to search for the exact cause.

TLDR; I think the issue is somewhere in crate::format_snippet.

starting at macros::MacroBranch::rewrite we try to determine the new_body_snippet:

rustfmt/src/macros.rs

Lines 1342 to 1353 in a5f8505

// First try to format as items, then as statements.
let new_body_snippet = match crate::format_snippet(&body_str, &config, true) {
Some(new_body) => new_body,
None => {
let new_width = new_width + config.tab_spaces();
config.set().max_width(new_width);
match crate::format_code_block(&body_str, &config, true) {
Some(new_body) => new_body,
None => return None,
}
}
};

crate::format_snippet returns None so we end up calling crate::format_code_block (code linked for reference)

rustfmt/src/lib.rs

Lines 348 to 433 in a5f8505

fn format_code_block(
code_snippet: &str,
config: &Config,
is_macro_def: bool,
) -> Option<FormattedSnippet> {
const FN_MAIN_PREFIX: &str = "fn main() {\n";
fn enclose_in_main_block(s: &str, config: &Config) -> String {
let indent = Indent::from_width(config, config.tab_spaces());
let mut result = String::with_capacity(s.len() * 2);
result.push_str(FN_MAIN_PREFIX);
let mut need_indent = true;
for (kind, line) in LineClasses::new(s) {
if need_indent {
result.push_str(&indent.to_string(config));
}
result.push_str(&line);
result.push('\n');
need_indent = indent_next_line(kind, &line, config);
}
result.push('}');
result
}
// Wrap the given code block with `fn main()` if it does not have one.
let snippet = enclose_in_main_block(code_snippet, config);
let mut result = String::with_capacity(snippet.len());
let mut is_first = true;
// While formatting the code, ignore the config's newline style setting and always use "\n"
// instead of "\r\n" for the newline characters. This is ok because the output here is
// not directly outputted by rustfmt command, but used by the comment formatter's input.
// We have output-file-wide "\n" ==> "\r\n" conversion process after here if it's necessary.
let mut config_with_unix_newline = config.clone();
config_with_unix_newline
.set()
.newline_style(NewlineStyle::Unix);
let mut formatted = format_snippet(&snippet, &config_with_unix_newline, is_macro_def)?;
// Remove wrapping main block
formatted.unwrap_code_block();
// Trim "fn main() {" on the first line and "}" on the last line,
// then unindent the whole code block.
let block_len = formatted
.snippet
.rfind('}')
.unwrap_or_else(|| formatted.snippet.len());
let mut is_indented = true;
let indent_str = Indent::from_width(config, config.tab_spaces()).to_string(config);
for (kind, ref line) in LineClasses::new(&formatted.snippet[FN_MAIN_PREFIX.len()..block_len]) {
if !is_first {
result.push('\n');
} else {
is_first = false;
}
let trimmed_line = if !is_indented {
line
} else if line.len() > config.max_width() {
// If there are lines that are larger than max width, we cannot tell
// whether we have succeeded but have some comments or strings that
// are too long, or we have failed to format code block. We will be
// conservative and just return `None` in this case.
return None;
} else if line.len() > indent_str.len() {
// Make sure that the line has leading whitespaces.
if line.starts_with(indent_str.as_ref()) {
let offset = if config.hard_tabs() {
1
} else {
config.tab_spaces()
};
&line[offset..]
} else {
line
}
} else {
line
};
result.push_str(trimmed_line);
is_indented = indent_next_line(kind, line, config);
}
Some(FormattedSnippet {
snippet: result,
non_formatted_ranges: formatted.non_formatted_ranges,
})
}

This is where I started to scratch my head, but @calebcartwright if you could provide some more context for why we need to artificially wrap code in an fn main block that would be really helpful. I suspect it's because we call format_snippet again on line 385? Anyway, back to what's happening.

Take a look at the input snippet that we're trying to format. I've included the leading whitespace in comments. (note: leading whitespace seems to be missing from line 1unsafe {, but I don't think that has anything to do with the bug).

unsafe {  // <-- 0
            core::mem::MaybeUninit::<[core::mem::MaybeUninit<core::primitive::u8>; $capacity]>  //<-- 12
                ::uninit()  //<-- 16
        }  // <-- 8

on line 373 we call enclose_in_main_block, and with the original input produces this snippet:

fn main() {
    unsafe {  // <-- 4
                core::mem::MaybeUninit::<[core::mem::MaybeUninit<core::primitive::u8>; zcapacity]>  //<--16
                    ::uninit() // <-- 20
            }  // <-- 12
}

The issue starts when we call enclose_in_main_block because for this sample input we always add 4 spaces of indentation to each line.

I guess the main thing to note is the relevant indentation between lines is still preserved at this point.

However, once we call format_snippet on line 385 we get this output:

fn main() {
    unsafe {  // <-- 4
        core::mem::MaybeUninit::<[core::mem::MaybeUninit<core::primitive::u8>; zcapacity]>  // <-- 8
                    ::uninit()  //<--20
    }  // <-- 4
}

As you can see all lines are formatted except the ::uninit(), which still has 20 leading whitespace characters.

Moving on.

After we format the snippet we work to remove the enclosing fn main block and all indentation that it introduced. For completeness here's the loop that dedents the code, but it's not the source of the issue.

rustfmt/src/lib.rs

Lines 397 to 427 in a5f8505

for (kind, ref line) in LineClasses::new(&formatted.snippet[FN_MAIN_PREFIX.len()..block_len]) {
if !is_first {
result.push('\n');
} else {
is_first = false;
}
let trimmed_line = if !is_indented {
line
} else if line.len() > config.max_width() {
// If there are lines that are larger than max width, we cannot tell
// whether we have succeeded but have some comments or strings that
// are too long, or we have failed to format code block. We will be
// conservative and just return `None` in this case.
return None;
} else if line.len() > indent_str.len() {
// Make sure that the line has leading whitespaces.
if line.starts_with(indent_str.as_ref()) {
let offset = if config.hard_tabs() {
1
} else {
config.tab_spaces()
};
&line[offset..]
} else {
line
}
} else {
line
};
result.push_str(trimmed_line);
is_indented = indent_next_line(kind, line, config);

Now that we've removed the enclosing fn main block crate::format_code_block returns the following snippet:

unsafe {  // <-- 0
    core::mem::MaybeUninit::<[core::mem::MaybeUninit<core::primitive::u8>; zcapacity]>  // <-- 4
                ::uninit()  // <-- 16
}  // <-- 0

The important thing to note here is that the relative difference in whitespace between the first line and the second line used to be 4, but now it's 8.

Taking a look at the output of running cargo run --bin rustfmt -- --check on the input snippet we get this output (Note the diff is off by 8 spaces, which is the relevant misalignment introduced by format_snippet.

     ($capacity:expr) => {
         unsafe {
             core::mem::MaybeUninit::<[core::mem::MaybeUninit<core::primitive::u8>; $capacity]>
-                ::uninit() // <--16
+                        ::uninit() // <--24
         }
     };
 }

From what I can gather the bug seems to be somewhere in format_snippet

@calebcartwright
Copy link
Member

Thanks for looking at this one @ytmimi

There's a couple of issues at play (a similar variant of the scenario I described in #4629 (review)) but the short version is that we're failing to detect a macro formatting error, and that hits a path of non-idempotency with runaway indentation.

rustfmt operates on the compiler's internal AST representation of input programs and needs valid/parseable input in order to get that AST representation. rustfmt knows how to format each type of AST node according to the respective rules codified in the Style Guide. Macros, both definitions and calls, are not (necessarily) valid Rust code, and their contents aren't represented as any specific type of ast node. There's a top level ast node (e.g. MacroDef, MacCall variants of statement and expression kinds, etc.) which include the sequence of tokens contained within, but the actual set of tokens is arbitrary; just the nature of macros.

rustfmt, to this day, does not really handle macros, at least not well. In the case of macro calls, it uses the parser to see if any of the args "look like" (i.e. can be parsed as) standard/valid Rust code, and if so will attempt to format them. In the case of defs, it's even more complicated, and involves temporarily replacing certain tokens in an attempt to get something that looks more like valid Rust.

In order to format invalid/incomplete/unparseable inputs (as is the case with snippets in doc comments, macro content, etc.) there's a process that produces a valid item (the main function you were referring to) so that it can be parsed and formatted, after successful formatting, it's then reshaped back into the original content with indentation levels adjusted accordingly

@doinkythederp
Copy link

I'm having the same issue but with let/else:

macro_rules! get_arg {
    ($args:expr, $index:expr, $ty:ident) => {{
        let Some(arg): Option<$ty> = $args
-->         .get($index)
-->         .and_then(|p| p.$ty()) else {
-->             bail!("Expected a `{}` at argument {}", stringify!($ty), $index);
-->         };
        arg
    }};
}
$ rustfmt -V
rustfmt 1.5.2-stable (84c898d6 2023-04-16)

@ytmimi
Copy link
Contributor

ytmimi commented May 5, 2023

@doinkythederp thanks for reaching out! You're experiencing a distinct case of the original issue, which has been reported before in #5503 and others. Your issue should be resolved once support for let-else is added to rustfmt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-macros bug Panic, non-idempotency, invalid code, etc. p-medium
Projects
None yet
Development

No branches or pull requests

4 participants