Path.from_uri() doesn't work if the URI contains host component #123599

Open
pekkaklarck opened this issue Sep 2, 2024 · 6 comments
Labels: 3.13 (bugs and security fixes), 3.14 (new features, bugs and security fixes), topic-pathlib, type-bug (An unexpected behavior, bug, or error)

Comments

@pekkaklarck
pekkaklarck commented Sep 2, 2024

Bug report

Bug description:

Path.from_uri(), introduced in Python 3.13, doesn't work properly if the URI contains a host component other than localhost. The following examples were run with Python 3.13 rc 1 on Linux on a machine whose host name is kone:

>>> import socket
>>> from pathlib import Path
>>> print(Path().from_uri('file:///home/peke/test'))
/home/peke/test
>>> print(Path().from_uri('file://localhost/home/peke/test'))
/home/peke/test
>>> print(Path().from_uri(f'file://{socket.getfqdn()}/home/peke/test'))
//kone/home/peke/test

According to RFC 8089, including the host component as a fully qualified name is fine, so this looks like a bug to me.
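To make the difference between the three URIs explicit, they can be split into authority and path with the standard library's urllib.parse.urlsplit. This is only an illustrative sketch: the helper name split_file_uri is made up here, and the host name kone is hard-coded in place of socket.getfqdn().

```python
from urllib.parse import urlsplit


def split_file_uri(uri):
    """Return (authority, path) for a file: URI (illustrative helper)."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


for uri in (
    "file:///home/peke/test",
    "file://localhost/home/peke/test",
    "file://kone/home/peke/test",  # "kone" stands in for socket.getfqdn()
):
    authority, path = split_file_uri(uri)
    print(f"authority={authority!r} path={path!r}")
```

In all three cases the path component is /home/peke/test; only the authority differs.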

CPython versions tested on:

3.13

Operating systems tested on:

Linux

@pekkaklarck pekkaklarck added the type-bug An unexpected behavior, bug, or error label Sep 2, 2024
@pekkaklarck
Author

Accepting a host component other than localhost raises the question of how to validate the host name. Browsers, for example, seem to ignore the host component entirely and accept URIs like file://whatever/home/peke/test in my case. I believe Python should be stricter, though, and raise a ValueError if the host component doesn't match the system where the code is run. Although the RFC mandates that the host component be fully qualified, I believe accepting only the host name should be fine too. If someone wants to parse file URIs with different host names, they can use urllib.parse.urlparse instead.

UNC paths on Windows may complicate this even further. I tested this on Windows, and there this usage makes sense:

>>> p = Path(r'\\host\path')
>>> print(p.as_uri())
file://host/path/
>>> p == Path.from_uri(p.as_uri())
True

Perhaps from_uri behavior should depend on the operating system.

@barneygale
Contributor

All URIs with non-empty, non-localhost authorities parse as Path objects that start with a double slash, so it should be straightforward to reject these paths:

from pathlib import Path

# A non-empty, non-localhost authority yields a path starting with '//'.
path = Path.from_uri('file://server/share/foo.txt')
if path.as_posix().startswith('//'):
    raise ValueError('Non-local file URI')

We could add a local_authorities argument so that users can override the ['', 'localhost'] defaults.
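A rough sketch of how such an argument might behave, written outside pathlib so it can be tried on any platform. Everything here is an assumption for illustration: the function name posix_path_from_uri is hypothetical, it works on PurePosixPath rather than Path, and it ignores details such as ports or user info in the authority.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def posix_path_from_uri(uri, local_authorities=("", "localhost")):
    """Hypothetical helper, not pathlib API: reject non-local file URIs."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"URI does not start with 'file:': {uri!r}")
    # Host names are case-insensitive; entries are expected in lower case.
    if parts.netloc.lower() not in local_authorities:
        raise ValueError(f"Non-local file URI: {uri!r}")
    # The authority is considered local, so only the path is kept.
    return PurePosixPath(unquote(parts.path))
```

With the defaults, file://server/share/foo.txt would raise ValueError, while passing local_authorities=("", "localhost", "server") would return the path /share/foo.txt.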

@pekkaklarck
Author

Explicitly rejecting paths would certainly be better than returning invalid paths. Including socket.getfqdn() and possibly also socket.gethostname() in the list of local authorities could be more convenient than requiring users to pass them explicitly, though. If the list is made configurable, it probably should contain localhost and the empty string by default.
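For illustration, a default set along these lines could be built as follows. The function names default_local_authorities and is_local_authority are hypothetical and not part of pathlib; this is a sketch of the idea, not a proposal for the actual implementation.

```python
import socket


def default_local_authorities():
    """Hypothetical default set of authorities treated as the local machine."""
    names = {"", "localhost"}
    # Host names are case-insensitive, so store them in lower case.
    names.add(socket.gethostname().lower())
    names.add(socket.getfqdn().lower())
    return frozenset(names)


def is_local_authority(authority):
    """Return True if the authority names this machine (sketch only)."""
    return authority.lower() in default_local_authorities()
```

One thing to keep in mind is that socket.getfqdn() may perform a name lookup, so a real implementation would probably want to compute the set lazily or cache it.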

@pekkaklarck
Author

This would probably require special handling on Windows anyway. On POSIX, something like Path.from_uri('file://host/path') should yield Path('/path'), assuming that host is detected to be a local authority, but on Windows, throwing the host part away would break UNC paths.

@barneygale
Contributor

We should try to be consistent across OSs unless we really must diverge IMO.

"Explicitly rejecting paths would certainly be better than returning invalid paths."

Paths starting with two slashes are valid on both Windows and POSIX. On Windows they're UNC paths, whereas on POSIX they're implementation-defined (ref).
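This can be checked with the pure path classes, which behave the same on every platform. A small sketch (the host and share names are just examples):

```python
from pathlib import PurePosixPath, PureWindowsPath

# On POSIX, exactly two leading slashes are preserved by pathlib.
posix = PurePosixPath("//kone/home/peke/test")
print(posix)       # still starts with two slashes
print(posix.root)  # the root is reported as '//'

# Three or more leading slashes collapse to a single one.
collapsed = PurePosixPath("///kone/home/peke/test")
print(collapsed)

# On Windows, the same shape is a UNC path: host and share form the drive.
unc = PureWindowsPath("//server/share/foo.txt")
print(unc.drive)
```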

@pekkaklarck
Author

It can be hard to be totally consistent across OSes. On Windows Path(r'\\host\path').as_uri() yields file://host/path/, and it makes sense that round-trip works and Path.from_uri('file://host/path/') yields Path(r'\\host\path'). In other words, the host component is preserved on Windows. On the other hand, when Path.from_uri('file://host/path/') is used on POSIX, the host component can be validated but there's, AFAIK, no way to preserve it and the return value can only be Path('/path').

There's already other functionality in pathlib that's operating system dependent and I don't see why from_uri couldn't be as well. Someone needing, for example, Windows semantics on POSIX could then explicitly use PureWindowsPath.
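To make the two behaviours concrete, here is a minimal sketch using the pure path classes. The helper pure_path_from_file_uri is hypothetical and not an existing pathlib function. It assumes the host has already been accepted as valid, and it does not handle drive-letter URIs such as file:///C:/....

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlsplit


def pure_path_from_file_uri(uri, flavour):
    """Hypothetical helper: convert a file URI according to the path flavour."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    path = unquote(parts.path)
    host = parts.netloc
    if flavour is PureWindowsPath and host not in ("", "localhost"):
        # Windows semantics: keep the host, giving a UNC path.
        return PureWindowsPath(f"//{host}{path}")
    # POSIX semantics: the host cannot be preserved, only the path survives.
    return flavour(path)


uri = "file://host/share/dir/file.txt"
print(pure_path_from_file_uri(uri, PureWindowsPath))
print(pure_path_from_file_uri(uri, PurePosixPath))
```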

I should have used "incorrect" instead of "invalid" in my earlier comment. Although Path('//hello/world') is valid, it certainly isn't the correct return value of Path.from_uri('file://hello/world') on POSIX.

@barneygale barneygale added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Sep 3, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Sep 3, 2024
…()` on POSIX

Raise `ValueError` in `pathlib.Path.from_uri()` if the given `file:` URI
specifies a non-empty, non-`localhost` authority, and we're running on a
platform without support for UNC paths.