Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-62090: Simplify argparse usage formatting #105039

Merged
merged 10 commits into from
May 7, 2024
Merged

Conversation

hamdanal
Copy link
Contributor

@hamdanal hamdanal commented May 28, 2023

Rationale

argparse performs a complex formatting of the usage for argument grouping and for line wrapping to fit the terminal width. This formatting has been a constant source of bugs for at least 10 years (see linked issues below) where defensive assertion errors are triggered or brackets and paranthesis are not properly handeled.

Problem

The current implementation of argparse usage formatting relies on regular expressions to group arguments usage only to separate them again later with another set of regular expressions. This is a complex and error prone approach that caused all the issues linked below. Special casing certain argument formats has not solved the problem. The following are some of the most common issues:

  • empty metavar
  • mutually exclusive groups with SUPPRESSed arguments
  • metavars with whitespace
  • metavars with brackets or paranthesis

Solution

The following two comments summarize the solution:

Mainly, the solution is to rewrite the usage formatting to avoid the group-then-separate approach. Instead, the usage parts are kept separate and only joined together at the end. This allows for a much simpler implementation that is easier to understand and maintain. It avoids the regular expressions approach and fixes the corresponding issues.

This closes the following issues:

These PRs become obsolete:

Rationale
=========

argparse performs a complex formatting of the usage for argument grouping
and for line wrapping to fit the terminal width. This formatting has been
a constant source of bugs for at least 10 years (see linked issues below)
where defensive assertion errors are triggered or brackets and paranthesis
are not properly handeled.

Problem
=======

The current implementation of argparse usage formatting relies on regular
expressions to group arguments usage only to separate them again later
with another set of regular expressions. This is a complex and error prone
approach that caused all the issues linked below. Special casing certain
argument formats has not solved the problem. The following are some of
the most common issues:
- empty `metavar`
- mutually exclusive groups with `SUPPRESS`ed arguments
- metavars with whitespace
- metavars with brackets or paranthesis

Solution
========

The following two comments summarize the solution:
- python#82091 (comment)
- python#77048 (comment)

Mainly, the solution is to rewrite the usage formatting to avoid the
group-then-separate approach. Instead, the usage parts are kept separate
and only joined together at the end. This allows for a much simpler
implementation that is easier to understand and maintain. It avoids the
regular expressions approach and fixes the corresponding issues.

This closes the following issues:
- Closes python#62090
- Closes python#62549
- Closes python#77048
- Closes python#82091
- Closes python#89743
- Closes python#96310
- Closes python#98666

These PRs become obsolete:
- Closes python#15372
- Closes python#96311
@cpython-cla-bot
Copy link

cpython-cla-bot bot commented May 28, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@hamdanal
Copy link
Contributor Author

hamdanal commented Jun 8, 2023

@rhettinger @hpaulj I’d like to get your opinion on this change if you don’t mind.

P.S: the failure on Windows doesn’t seem to be related.

@hpaulj
Copy link

hpaulj commented Jun 9, 2023

I agree that the existing usage formatter is too brittle. 9 years ago I tried to write an alternative as part of a project to generalize the concept of mutually-exclusive-groups, allowing other relations (and/or/any etc), and nesting. https://bugs.python.org/issue11588. I can't offhand find my best usage patch, but agree with the idea of building usage groups, and combining them at the end.

But currently I'm not as involved with github and argparse. I'm commenting on some issues, but have not contributed (or tested) a patch (in the old style) or github push in a long time.

@encukou
Copy link
Member

encukou commented Apr 2, 2024

A simplification sounds good to me. I'll start reading up on argparse so I can review this in detail.

@encukou
Copy link
Member

encukou commented Apr 15, 2024

This does not handle nested exclusive groups correctly. Nested exclusive groups are deprecated, but still supported for now. For example:

import argparse
parser = argparse.ArgumentParser()
g=parser.add_mutually_exclusive_group()
g.add_argument('--spam')
gg=g.add_mutually_exclusive_group()
gg.add_argument('--hax')
gg.add_argument('--hex')
g.add_argument('--eggs')
parser.add_argument('--num')
print(parser.format_usage())

Without this change, the output is not ideal, but technically correct:

/tmp/repro.py:5: DeprecationWarning: Nesting mutually exclusive groups is deprecated.
  gg=g.add_mutually_exclusive_group()
usage: repro.py [-h] [--spam SPAM | [--hax HAX | --hex HEX |] --eggs EGGS] [--num NUM]

But with this change, the argument added at the end (--num) joins the group:

/tmp/repro.py:5: DeprecationWarning: Nesting mutually exclusive groups is deprecated.
  gg=g.add_mutually_exclusive_group()
usage: repro.py [-h] [--spam SPAM | [--hax HAX | --hex HEX] | --eggs EGGS | [--num NUM]]

One fix would be to pad the new sequence with None to keep the original length when replacing parts[start:end] at the end. That way the meaning of indices wouldn't change

@hpaulj
Copy link

hpaulj commented Apr 15, 2024 via email

@encukou
Copy link
Member

encukou commented Apr 15, 2024

I'm aware that nested groups are broken and deprecated. But, their behaviour should not be made worse.

This PR, as it is, breaks unrelated usage formatting in the presence of nested groups: the --num above is not part of any group, yet it's formatted as if it was.
(Otherwise, in all cases I could think of so far, the output is the same as before, or better -- in particular it avoids an extra | at the end of a group.)

@hamdanal
Copy link
Contributor Author

Thank you both for the valuable feedback, I've addressed the case of nested groups and tested several variations with/without suppressed actions locally and they all produce the correct usage text now.

@encukou
Copy link
Member

encukou commented May 7, 2024

Thank you for the fix!

@hamdanal hamdanal deleted the argparse-usage branch May 8, 2024 08:50
SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
Rationale
=========

argparse performs a complex formatting of the usage for argument grouping
and for line wrapping to fit the terminal width. This formatting has been
a constant source of bugs for at least 10 years (see linked issues below)
where defensive assertion errors are triggered or brackets and paranthesis
are not properly handeled.

Problem
=======

The current implementation of argparse usage formatting relies on regular
expressions to group arguments usage only to separate them again later
with another set of regular expressions. This is a complex and error prone
approach that caused all the issues linked below. Special casing certain
argument formats has not solved the problem. The following are some of
the most common issues:
- empty `metavar`
- mutually exclusive groups with `SUPPRESS`ed arguments
- metavars with whitespace
- metavars with brackets or paranthesis

Solution
========

The following two comments summarize the solution:
- python#82091 (comment)
- python#77048 (comment)

Mainly, the solution is to rewrite the usage formatting to avoid the
group-then-separate approach. Instead, the usage parts are kept separate
and only joined together at the end. This allows for a much simpler
implementation that is easier to understand and maintain. It avoids the
regular expressions approach and fixes the corresponding issues.

This closes the following GitHub issues:
-  python#62090
-  python#62549
-  python#77048
-  python#82091
-  python#89743
-  python#96310
-  python#98666

These PRs become obsolete:
-  python#15372
-  python#96311
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants