Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git.version_info can drop intermediate fields unexpectedly #1833

Closed
EliahKagan opened this issue Feb 20, 2024 · 2 comments · Fixed by #1838
Closed

Git.version_info can drop intermediate fields unexpectedly #1833

EliahKagan opened this issue Feb 20, 2024 · 2 comments · Fixed by #1838

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Feb 20, 2024

The version_info attribute of Git instances drops non-numeric fields, even if they appear before a field that is kept:

tuple(int(n) for n in version_numbers.split(".")[:4] if n.isdigit()),

If a non-numeric field is followed by a numeric field, the effect is to make it look like the field that was actually non-numeric has the value of the subsequent field:

(.venv) ek@Glub:~/repos-wsl/GitPython (main $=)$ cat >fake-git <<'EOF'
> #!/bin/sh
> printf 'git version 1.2a.3 (totally fake)\n'
> EOF
(.venv) ek@Glub:~/repos-wsl/GitPython (main $%=)$ chmod +x fake-git
(.venv) ek@Glub:~/repos-wsl/GitPython (main $%=)$ python
Python 3.12.1 (main, Dec 10 2023, 15:07:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import git
>>> git.refresh('fake-git')
>>> git.Git().version_info
(1, 3)

(As shown, partially numeric fields that do not wholly parse to a number are non-numeric for this purpose.)

I think this should either be changed to stop before the first non-numeric field, or kept as it is but with this behavior documented explicitly to reduce confusion. In the latter case, the rationale should be given. The rationale may just be backward compatibility, though I am not sure if that is a sufficient reason to keep this behavior, since it seems unexpected in the cases where it makes a difference. There may be other alternatives that I haven't thought of.

I think this is a minor issue, since such versions may be rare or nonexistent. git version numbers that have a non-numeric field followed by a numeric field are common, such as 2.43.0.windows.1 on Windows. However, GitPython only looks at the first four fields, so the trailing 1 in 2.43.0.windows.1 is not wrongly presented as the field that follows the 0.

From the comments, it looks like this situation was specifically envisioned, suggesting that the exact current behavior may be intended:

# We only use the first 4 numbers, as everything else could be strings in fact (on Windows).

However, it seems more likely to me that it was done this way to preserve earlier logic while fixing a specific bug. The if n.isdigit() filter was added in 6a61110 (#195).

Full current code for context:

GitPython/git/cmd.py

Lines 814 to 826 in afa5754

def _set_cache_(self, attr: str) -> None:
if attr == "_version_info":
# We only use the first 4 numbers, as everything else could be strings in fact (on Windows).
process_version = self._call_process("version") # Should be as default *args and **kwargs used.
version_numbers = process_version.split(" ")[2]
self._version_info = cast(
Tuple[int, int, int, int],
tuple(int(n) for n in version_numbers.split(".")[:4] if n.isdigit()),
)
else:
super()._set_cache_(attr)
# END handle version info

This may make sense to fix at the same time as #1829 and/or #1830, I'm not sure.

@EliahKagan EliahKagan changed the title Git.version_info may drop intermediate fields unexpectedly Git.version_info can drop intermediate fields unexpectedly Feb 20, 2024
@Byron
Copy link
Member

Byron commented Feb 21, 2024

Thanks for investigating this!

I think the behaviour shown here is clearly incorrect, no matter the intent, and thus a fix would be preferred. Even though such versions would be unusual, it's a question on how to correct them. Should they be turned into the closest-possible number? Or should parsing stop there so, in this example, there is only a single number present? I think these details are best left to the one implementing this.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Feb 22, 2024
This modifies two existing test cases to include an assertion about
the length. These test cases are retained as there is value in
testing on output from a real git command rather than only with
test doubles.

More importantly, this also adds a parameterized test method to
check parsing of:

- All numeric, shorter than the limit - all fields used.
- All numeric, at the limit - all fields used.
- All numeric, longer than the limit - extra fields dropped.
- Has unambiguous non-numeric - dropped from there on.
- Has ambiguous non-numeric, negative int - dropped from there on.
- Has ambiguous non-numeric, number+letter - dropped from there on.

The cases for parsing when a field is not numeric (or not fully or
unambiguously numeric) currently all fail, because the existing
logic drops intermediate non-numeric fields (gitpython-developers#1833).

Parsing should instead stop at (or, *perhaps* in cases like "2a",
after) such fields. When the code is changed to stop at them
rather than dropping them and presenting the subsequent field as
though it were a previous field, these test cases should also pass.
@EliahKagan
Copy link
Contributor Author

EliahKagan commented Feb 22, 2024

I think that if the chosen approach is simple enough then it may reasonably be included in the same PR that fixes some other version_info related bugs. So my inclination is to regard fields like 2a as non-numeric for now and stop before them, with the idea that this behavior is reasonable, but may be changed. This is, after all, the closest correct behavior to the current behavior (which regards such fields as non-numeric and drops them yet still considers the next field).

It may be that fields such as 2a should be recognized as numeric in the future. If so, a decision will have to be made about what suffixes should be accepted, and whether parsing should stop at such a field or still go on to the next one. (The possibility of something like 8.9.2-rc.3 suggests that parsing should not go further, since parsing that the same as 8.9.2 may be okay, but going further would parse it the same as 8.9.2.3 which does not seem right, since 8.9.2-rc.3 is definitely a strictly lower version than 8.9.2.1, while 8.9.2.3 is definitely strictly higher than 8.9.2.1.)

This issue with non-numeric parts is also related, at least conceptually, to the way Git.version_info currently differs (and perhaps always will differ) from sys.version_info, which I have noted in #1830 (comment).

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Feb 22, 2024
For gitpython-developers#1833.

This makes version_info parsing stop including fields after the
first non-numeric field, rather than skipping non-numeric fields
and including subsequent numeric fields that then wrongly appear to
have the original position and significance of the dropped field.

This actually stops at (rather than merely after) a non-numeric
field, i.e., potentially parseable fields that are not fully
numeric, such as "2a", are stopped at, rather than parsed as "2".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants