Skip to content
This repository has been archived by the owner on Jul 9, 2023. It is now read-only.

A crate to handle Windows' "exotic" paths #34

Open
Eh2406 opened this issue Jun 27, 2019 · 17 comments
Open

A crate to handle Windows' "exotic" paths #34

Eh2406 opened this issue Jun 27, 2019 · 17 comments

Comments

@Eh2406
Copy link

Eh2406 commented Jun 27, 2019

Std path does some weird things with UNC paths (\\?\). For one std::fs::canonicalize always returns one whether it is needed or not. Then path.join will just do a string concatenation leading to invalid paths. This leads to bugs in many important parts of the ecosystem. (Cargo, wasm-pack, Rustup)

So we need a library that provides a binding to GetFullPathNameW on windows and uses std::fs::canonicalize otherwize.
More ambitiously we need a path "interpreter" that ".. and . while appending. So if you started with \\?\C:\bar and joined ../foo you would iterate the components of ../foo and apply them to the base path, first applying .. to get \\?\C:\ and then applying foo to get \\?\C:\foo."

Ether part of this would be a valuable addition to the ecosystem!

@dtolnay
Copy link
Owner

dtolnay commented Jun 27, 2019

Is this something that could be fixed in fs::canonicalize and Path::join? If not, is that because of backward compatibility restrictions or because the current behavior is useful and worth keeping? If the latter, how would the spec for this new library's canonicalize and join functionality be different from what std canonicalize and join claim to do?

@Eh2406
Copy link
Author

Eh2406 commented Jun 27, 2019

So @retep998 knows this better than I.

My impression is that the fs::canonicalize is locked by backward compatibility and adding an alternative like fs::normalize is blocked on the lib team being sure it is a good design (rust-lang/rust#59117) The impression is that a crate could iterate and suggest uplift when polished.

My impression is that the std::Path::join api is technically correct. Technically \\?\C:\bar\../foo is a valid path on windows, it is "on drive C in folder bar in folder ../foo" and so joining \\?\C:\bar\ with ../foo should give you \\?\C:\bar\../foo. However if my Cargo toml claims that foo = { path = "../foo" }, then Cargo crashes if my working directory is \\?\C:\bar\, we need a library that does the same modifications to the path as joining ../foo would on a normal path.

@dtolnay
Copy link
Owner

dtolnay commented Jun 27, 2019

Thanks, makes sense. Would something like the following API be sufficient for what Cargo needs? :

pub struct SanePath {...}

impl SanePath {
    pub fn normalize<P: AsRef<Path>>(path: P) -> Self;
    pub fn join<P: AsRef<Path>>(&self, path: P) -> Self;
    pub fn as_std_path(&self) -> &Path;
}

impl AsRef<Path> for SanePath {...}

@Eh2406
Copy link
Author

Eh2406 commented Jun 27, 2019

Yes! Or freestanding functions:

pub fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {...}
pub fn join<P: AsRef<Path>>(base: &mut PathBuf, addition: P) {...}

@retep998
Copy link

fs::canonicalize is not broken, but rather it does a fundamentally different operation than what fs::normalize would do. One is asking the OS for the canonical path to a file that actually exists, the other is just turning a relative path into an absolute path.

@basil-cow
Copy link

I'd like to tackle this one, but from this issue I didn't understand whether popping on a .. and doing nothing on a . is enough (there are some edge cases but that the gist of it). This may break symlinks, is this desired? Are there any caveats one should be aware of?

@Eh2406
Copy link
Author

Eh2406 commented Nov 10, 2019

I think it is ok to brake symlinks. \\?\C:\bar/../foo is a valid windows path, but we don't want the folder named bar/../foo in C:\. We want the folder foo in C:\. Dose that makes sense?

@basil-cow
Copy link

basil-cow commented Nov 10, 2019

@Eh2406
Copy link
Author

Eh2406 commented Nov 15, 2019

Thanks for the links. Now that I have had time to read it carefully... I don't know. path-absolutize does not use GetFullPathNameW, witch is grate for cross compat, but makes me nervous that it will differ from the OS behavior. Neither crate has tests for UNC paths or cases of mixtures of / and \. From the examples and skimming the code I don't know how well it handles non-unicode data, but I am not sure how well std does ether.

@ajeetdsouza
Copy link

ajeetdsouza commented Aug 29, 2020

I saw a post related to this on Reddit recently, and I'd like to try implementing this.

@Eh2406 @dtolnay could you have a look at my repository (https://github.com/ajeetdsouza/pathology) and tell me your initial thoughts? I've currently only written an implementation for Windows, but I'm working on one for Linux too.

The only function I've implemented is normalize, which lexically converts the path to its simplest form, without actually querying the filesystem.

On Windows, I'm using GetFullPathNameW for this. Since drive letters are case insensitive, but are written in uppercase by convention, I'm capitalizing it manually (GetFullPathNameW doesn't always do this). Error handling is still a WIP, I'll get that done soon enough.

On Linux, there is no system call to lexically normalize paths, but GNU realpath can do this via realpath -ms. I'm currently rewriting that logic in Rust.

@ajeetdsouza
Copy link

Update: I've added normalize on Unix, too.

@Eh2406
Copy link
Author

Eh2406 commented Sep 11, 2020

@ajeetdsouza sorry for the slow reply. That looks like a great start on a answer to the question "I use your_thing(&std::fs::canonicalize(...)) and it broke, what should I use instead?" I look forward to having that in our arsenal. Thank you!

@retep998 may be able to revue the windows api calls better than I.

@pksunkara
Copy link

Related note for others, dunce is an alternative to fs::canonicalize.

@dylni
Copy link

dylni commented Nov 9, 2020

normpath can now be used to solve this issue. It defines BasePath, which is very similar to SanePath, and PathExt::normalize can be used for normalization.

It would be great if @retep998 could review this crate as well.

@dylni
Copy link

dylni commented Nov 19, 2020

@Eh2406 What would be the next steps for integrating this into Cargo?

@Eh2406
Copy link
Author

Eh2406 commented Nov 20, 2020

Last time I had this paged in rust-lang/cargo#6198 was the best link. Looks like we just got a related PR rust-lang/cargo#8874, so coordinating my make sense. Another good place to start would be to grep for canonicalize in the code base. Sorry I don't remember this stuff better.

@dylni
Copy link

dylni commented Nov 20, 2020

@Eh2406 Thanks. I'll start looking through where the path handling can be improved

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants