Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Purify CanonPath #9881

Merged
merged 1 commit into from
Feb 16, 2024
Merged

Conversation

Ericson2314
Copy link
Member

@Ericson2314 Ericson2314 commented Jan 29, 2024

Motivation

The core CanonPath constructors were using absPath, but absPath in some situations does IO which is not appropriate. It turns out that these constructors avoided those situations, and thus were pure, but it was far from obvious this was the case.

To remedy the situation, abstract the core algorithm from canonPath to use separately in CanonPath without any IO. No we know by-construction that those constructors are pure.

That leaves CanonPath::fromCWD as the only operation which uses IO / is impure. Add docs on it, and CanonPath as a whole, explaining the situation.

Context

This is also necessary to support Windows paths on windows without messing up CanonPath. But, I think it is good even without that.

In particular, #9767 is on top of this

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

*/
typename std::string canonPathInner(
std::string_view path,
auto && hookComponent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be made a std::function<void(std::string &, std::string_view &)> to avoid the need for a template?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'd rather make this a flag to follow or not symlinks (unless there's a reason to generalize it). Higher-order functions are nice but add some non-trivial complexity (a consumer would have a pretty hard time re-implementing the followSymlinks lambda used here) so I don't think it's a great tradeoff here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a flag to follow symlinks before. The point of this PR is that following the flags was non-obvious. It's very important that CanonPath stuff not secretly use the host file system unless one opts-in with a PosixSourceAccesor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To quote the PR description

The core CanonPath constructors were using absPath, but absPath in some situations does IO which is not appropriate. It turns out that these constructors avoided those situations, and thus were pure, but it was far from obvious this was the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fwiw in #9767 this code needs template regardless; I didn't just do it to try to make the function inline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My interpretation of #9767 (comment) and https://discourse.nixos.org/t/2024-02-05-nix-team-meeting-minutes-121/39340 is that we do want to do that PR, and so this the stepping-stone.

src/libutil/canon-path.cc Outdated Show resolved Hide resolved
@edolstra
Copy link
Member

When does absPath() do I/O? AFAIK only when resolveSymlinks is true, but it's false when called from the CanonPath constructor.

@Ericson2314
Copy link
Member Author

Ericson2314 commented Jan 30, 2024

Let's talk more about the motivation of this PR, which I think has been lost in this discussion.

I believe that Nix needs "Nix Paths", which are conventionally Unix-style, but could equally be something abstract like std::vector<std::string>, and "Native paths", which may or may not be Unix-style. I arrived at this conclusion thinking about Windows support, but I think it's true either way --- we don't want to e.g. use the current working directory to resolve a relative path that we wish to use on a NAR.

On a conceptual level, it's rather dubious that these two implementations share code at all. For Windows, that conceptual problem becomes practical, as "Nix paths" should not become Windows-like on Windows. I have done this refactor to factor to try to begin separate these things.

If you all don't like templates, and don't like higher-order functions, then I suppose I better just jump straight to using std::filesystem::weakly_canonical in the "Native Path" case, and keep the custom logic just for CanonPath.

@Ericson2314
Copy link
Member Author

Oh we cannot just drop in std::filesystem::weakly_canonical because it still unconditionally follows symlinks when they exist.

@Ericson2314
Copy link
Member Author

This also seems useful for the things we are discussing in #9985.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly two kinds of things

  • actionable suggestions, but also
  • moaning about the (stringly) state of path types in the code. Ignore those?

src/libutil/canon-path.hh Outdated Show resolved Hide resolved
src/libutil/canon-path.hh Outdated Show resolved Hide resolved
/**
* Core pure path canonicalization algorithm.
*
* @param hookComponent A chance to modify `s` and `path` in an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do s and path refer to? These look undefined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh probably local variables in the implementation. Can't do that in api docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, no they're implicitly parameters in the auto type. Oh C++...

Please describe the supposed signature that auto represents.

I don't know how we're supposed to do this, but maybe

Suggested change
* @param hookComponent A chance to modify `s` and `path` in an
* @param hookComponent(string s, string_view path) A callback to modify `s` and `path` in an

Yet that gives me no clue as to what I my callback is actually supposed to be able to do.

If the answer is "it kinda just works out", that would be rather disappointing, but on brand for template generics.
We might just live with that, but it would be nice to know more about why it works out or how.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done another thing to fix the documentation, I hope, without avoiding what I think is syntax Doxygen won't accept.

src/libutil/file-system.cc Outdated Show resolved Hide resolved
src/libutil/file-system.cc Outdated Show resolved Hide resolved
Comment on lines 68 to 69
if (s.empty())
s = "/";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get this behavior. I know it was already there, so we should probably keep it?
I suppose the story about relative paths doesn't apply here?

This is probably not the right PR yet for such a change. Need better path types.

Copy link
Member Author

@Ericson2314 Ericson2314 Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed (a) I am not sure how we get this either (b) better to modify it later.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Feb 16, 2024

@Ericson2314 #9881 (comment) agreed that for the long-term we need our own abstraction for paths (or more generally the file system) that allows lossless projection onto existing implementations. A classic example is Python's pathlib, so no need to reinvent the wheel from first principles here.

@Ericson2314 Ericson2314 force-pushed the purify-canon-path branch 3 times, most recently from d27d828 to 5a8604b Compare February 16, 2024 14:29
@Ericson2314
Copy link
Member Author

@fricklerhandwerk indeed we've discussed moving to CanonPath + std::filesystem::path as the only two options with clearly separated purposes :).

The core `CanonPath` constructors were using `absPath`, but `absPath` in
some situations does IO which is not appropriate. It turns out that
these constructors avoided those situations, and thus were pure, but it
was far from obvious this was the case.

To remedy the situation, abstract the core algorithm from `canonPath` to
use separately in `CanonPath` without any IO. No we know by-construction
that those constructors are pure.

That leaves `CanonPath::fromCWD` as the only operation which uses IO /
is impure. Add docs on it, and `CanonPath` as a whole, explaining the
situation.

This is also necessary to support Windows paths on windows without
messing up `CanonPath`. But, I think it is good even without that.

Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
Co-authored-by: Robert Hensing <roberth@users.noreply.github.com>
@Ericson2314 Ericson2314 merged commit 60936f2 into NixOS:master Feb 16, 2024
8 checks passed
@Ericson2314 Ericson2314 deleted the purify-canon-path branch February 16, 2024 15:12
@Ericson2314
Copy link
Member Author

@alois31 now that this is merged, making that new version of the pathExists fix should work :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants