Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does the Ruff formatter validate the final AST in a similar manner to Black? #8184

Closed
max-muoto opened this issue Oct 24, 2023 · 20 comments · Fixed by #8624
Closed

Does the Ruff formatter validate the final AST in a similar manner to Black? #8184

max-muoto opened this issue Oct 24, 2023 · 20 comments · Fixed by #8624
Labels
formatter Related to the formatter question Asking for support or clarification

Comments

@max-muoto
Copy link
Contributor

We are looking to replace black with the Ruff formatter, but one question I had was does the Ruff formatter validate for an AST equivalent to the original? I know this can be disabled in black with the --fast flag but was curious if this was built into the Ruff formatter by default.

@charliermarsh
Copy link
Member

We don't have this behavior, no. We could consider adding it based on interest.

(Not strictly relevant to the question, but just so there's no confusion for any future readers: all benchmarks were compared against Black with --fast.)

@max-muoto
Copy link
Contributor Author

Thanks for the info!

@charliermarsh
Copy link
Member

No problem.

For completeness: our test suite consists of a large set of fixtures (including Black's own test fixtures, plus a bunch of our own), and then we also run the formatter over a selection of open source projects on CI too (twine, django, transformers, typeshed, warehouse, zulip, home-assistant, poetry, cpython).

For the fixtures, we validate output against checked-in snapshots. For the ecosystem projects, we're just diffing to assess similarity. But in both cases, we also error if:

  • We output syntactically invalid code.
  • We fail to format any comments.
  • Running the formatter twice results in code changes (i.e., the formatter is unstable).

@MichaReiser MichaReiser added question Asking for support or clarification formatter Related to the formatter labels Oct 24, 2023
@charliermarsh
Copy link
Member

(Will leave this open for now to see if other folks have additional feedback or questions.)

@konstin
Copy link
Member

konstin commented Oct 25, 2023

We also have fuzzing (e.g. #7938) which covers a lot of strange edge cases. In recent runs i've only seen instabilities (the second formatting pass looks different than the first) but no syntax errors.

@tusharsadhwani
Copy link
Contributor

Validating the final AST can still help against accidental formatting changes that would change runtime behaviour.

A good example would be accidentally formatting f"{a+ 2=}" into f"{a + 2 = }", which seems like just a formatting change but actually changes the string produced at runtime. The Python AST for this case contains the entire expression itself in it for the same reason.

@tusharsadhwani
Copy link
Contributor

FWIW I think it should be opt-in, behind a flag to enable it, and all tests (including ones that run ruff against open source projects) should run with that flag enabled, to catch such cases.

@MichaReiser
Copy link
Member

MichaReiser commented Oct 30, 2023

Validating the final AST can still help against accidental formatting changes that would change runtime behaviour.

That's true, but it requires that the used AST only captures runtime semantics, which isn't the case for Ruff. For example, ruff stores (or will soon) the parts of each implicit concatenated string as individual nodes because having access to each part is useful when doing static analysis (but irrelevant for an interpreter):

"a" "b"

# joined
"ab"
  • "a" "b":StringList["a", "b"]
  • "ab": StringList["ab"]

The runtime value of both expressions is the same, but Ruff uses a different internal representation for each.

I believe there are other examples around parenthesizing match cases which change the AST structure, but don't result in a semantical change.

Because of that, I believe using the AST for asserting that the runtime semantics are unchanged isn't a good choice and it prevents Ruff from changing its internal AST structure to capture more information, slowing down the development.

It would be nice to have another automatic means to validate that the formatter doesn't change program semantics (could be very useful during fuzzing). We could explore integrating and comparing Python's AST as part of fuzzing, but I prefer not to compare Ruff's ASTs as part of ruff format for the reasons mentioned above and because re-parsing and comparing the AST causes a significant slowdown.

@tusharsadhwani
Copy link
Contributor

We could explore integrating and comparing Python's AST as part of fuzzing

@MichaReiser That is indeed along the lines of what I'm suggesting. Having that option available to end users behind a CLI flag would be a nice to have, but using it for fuzzing during CI seems much more important to me.

@charliermarsh
Copy link
Member

That's true, but it requires that the used AST only captures runtime semantics, which isn't the case for Ruff.

In fairness, we do have the Comparable AST variant which exists for this purpose and intentionally ignores (e.g.) implicit concatenations. That property should be maintained over time too -- it's meant to capture whether two AST nodes correspond to the same value.

@MichaReiser
Copy link
Member

In fairness, we do have the Comparable AST variant which exists for this purpose and intentionally ignores (e.g.) implicit concatenations. That property should be maintained over time too -- it's meant to capture whether two AST nodes correspond to the same value.

Oh right. Although it will be interesting to see how we would support this (in a cheap way) if we change our AST structure more fundamentally.

We're also considering integrating isort into the formatter, which would ultimately mean that we have to remove the AST equality check.

@tusharsadhwani
Copy link
Contributor

tusharsadhwani commented Nov 1, 2023

Consider having isort stuff as a separate step, and do the AST validation before doing import sorting in your tests?

Since import sorting should be a flag that you can disable (isort is too opinionated for a lot of people), this would make sense overall too.

@juftin
Copy link
Contributor

juftin commented Nov 8, 2023

I've been working on introducing the ruff formatter at work but got this question when trying to introduce it as a formatting tool for my organization.

The black formatter's claim that it is safe to use is based on the fact that it checks the AST before and after and compares the two is a big reason why a given org may push back on switching from black to ruff - it's a powerful piece of marketing.

To put things in perspective, the code equivalence check is a feature of Black which other formatters don’t implement at all. It is of crucial importance to us to ensure code behaves the way it did before it got reformatted. We treat this as a feature and there are no plans to relax this in the future.

Implementing a similar AST validation as an optional flag for the ruff formatter would be a great feature that would make switching over a no-brainer for the team - and speaking personally we would be more than willing to pay for the performance hit that it might cause.

@SonOfLilit
Copy link

If you only want to add it to tests and fuzzing, one easy way to write an AST comparison test would be:

def check_same_ast(path):
    with temp_copy(path) as before:
        with temp_copy(path) as after:
            run_ruff(after)
            run_black(before)
            run_black(after)
            assert run_sha256sum(before) == run_sha256sum(after)

@tusharsadhwani
Copy link
Contributor

@SonOfLilit ruff outputs aren't identical to black though

@SonOfLilit
Copy link

SonOfLilit commented Nov 11, 2023 via email

@tusharsadhwani
Copy link
Contributor

Ah right, i misread the code. Not a bad idea.

charliermarsh added a commit that referenced this issue Nov 13, 2023
## Summary

This PR implements validation in the formatter tests to ensure that we
don't modify the AST during formatting. Black has similar logic.

In implementing this, I learned that Black actually _does_ modify the
AST, and their test infrastructure normalizes the AST to wipe away those
differences. Specifically, Black changes the indentation of docstrings,
which _does_ modify the AST; and it also inserts parentheses in `del`
statements, which changes the AST too.

Ruff also does both these things, so we _also_ implement the same
normalization using a new visitor that allows for modifying the AST.

Closes #8184.

## Test Plan

`cargo test`
@charliermarsh
Copy link
Member

We now perform this validation during tests.

Interestingly, in implementing this validation, I learned that Black actually does modify the AST during formatting, and so Ruff does too :) Namely, Black modifies indentation within docstrings, which is reflected in the AST; and Black will also parenthesize long del statements (e.g., convert del a, b to del (a, b)), which doesn't change the semantics but does change the AST.

@jonasrauber
Copy link

If I understand the above correctly, this check is only performed when running ruffs test suite at the moment.
Is there an option to use this at runtime like in black?

@charliermarsh
Copy link
Member

No, we don’t support running this check at runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter question Asking for support or clarification
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants