Make output paths/hashes indifferent to possible CA/FOD origin of sources #9259

Open
roberth opened this issue Oct 31, 2023 · 6 comments
Labels: derivation design (Issues to consider for new versions of the derivation format, major or incremental); feature (Feature request or proposal); store (Issues and pull requests concerning the Nix store)

@roberth
Member

roberth commented Oct 31, 2023

Is your feature request related to a problem? Please describe.

We currently can't swap an inputSrcs entry of a derivation for an equivalent inputDrvs entry (a derivation that produces the same store path) and get the same output path, despite there being no observable difference to the build (with the possible exception of introspective features such as exportReferencesGraph, which we could make mutually exclusive with this improved hashing feature).

nix repl session illustrating how it currently works, and what this improvement entails

Currently we have some equivalence between FOD output paths and sources:

nix-repl> pkgs.emptyFile.outPath
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"

nix-repl> "${./empty-file}"
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"

However, as expected, the string contexts are different:

nix-repl> :p builtins.getContext pkgs.emptyFile.outPath
{ "/nix/store/jnghgcdkdc79p7qp4vfc27ryvnnm5igi-empty-file.drv" = { outputs = [ "out" ]; }; }

nix-repl> :p builtins.getContext "${./empty-file}"                       
{ "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" = { path = true; }; }

In itself, this is not a problem. The two values have to be realised differently: one is produced by building an FOD, the other was added to the store during instantiation.

That being the case, we can expect dependents to have different derivation hashes, representing the different build graph.

nix-repl> pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]
«derivation /nix/store/nmzk7crah5himm4h3nbh6ry8331c99nq-derived.drv»

nix-repl> pkgs.concatText "derived" [ "${./empty-file}" ]      
«derivation /nix/store/6bs6pkf3z4gfmjk0ssr7qfldrvyr8qfy-derived.drv»

So far so good, but now we observe a difference in output paths, which is not necessary.

nix-repl> "${pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]}"   
"/nix/store/4q9z73ak24fl3qr4j5wwkjbw0l7ch0rh-derived"

nix-repl> "${pkgs.concatText "derived" [ "${./empty-file}" ]}"      
"/nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived"

By implementing this proposal (and enabling it in Nixpkgs), both output paths would have the value /nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived.

Describe the solution you'd like

Instantiation works the same, except for the hashing of certain input-addressed outputs.

Specifically, when a derivation attribute flag such as __derivationAgnosticOutputHashing is enabled, inputs that come from FODs are not hashed as derivation inputs; instead, their output paths are treated as inputSrcs.

Ignore __derivationAgnosticOutputHashing when the build could perceive the difference, e.g. when it has flags that would let it introspect its dependencies. I don't think it could use exportReferencesGraph on itself, so this may not even be a problem. I just haven't checked whether that was everything that needs to be controlled. TBD.
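
To make the intended effect concrete, here is a minimal sketch in Python. It is a toy model, not Nix's actual derivation format or hashing scheme: the `Drv` class, the `agnostic` field (standing in for the proposed __derivationAgnosticOutputHashing flag), and the hash payload layout are all assumptions made for illustration. It only shows that rewriting FOD inputs into inputSrcs before hashing makes the two variants of "derived" agree.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


def _sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


@dataclass(frozen=True)
class Drv:
    """Toy derivation. NOT Nix's real ATerm format or hashing."""
    name: str
    builder: str
    input_srcs: FrozenSet[str] = frozenset()   # store paths added directly
    input_drvs: Tuple["Drv", ...] = ()         # derivations this one depends on
    fixed_output_path: Optional[str] = None    # set for FODs (path known up front)
    agnostic: bool = False                     # hypothetical stand-in for the proposed flag


def hash_modulo(drv: Drv) -> str:
    """Hash used to derive an input-addressed output path (simplified)."""
    if drv.fixed_output_path is not None:
        # An FOD is identified by its fixed output, not by how it is built.
        return _sha("fixed:" + drv.fixed_output_path)
    srcs = set(drv.input_srcs)
    dep_hashes = []
    for dep in drv.input_drvs:
        if drv.agnostic and dep.fixed_output_path is not None:
            # Proposed behaviour: treat the FOD's output path as a plain source.
            srcs.add(dep.fixed_output_path)
        else:
            dep_hashes.append(hash_modulo(dep))
    payload = "|".join(
        [drv.name, drv.builder, ",".join(sorted(srcs)), ",".join(sorted(dep_hashes))]
    )
    return _sha(payload)


def output_path(drv: Drv) -> str:
    if drv.fixed_output_path is not None:
        return drv.fixed_output_path
    return f"/nix/store/{hash_modulo(drv)[:32]}-{drv.name}"


# Hypothetical store path, used only for this demonstration.
src = "/nix/store/aaaa-empty-file"
fod = Drv("empty-file", "fetch", fixed_output_path=src)

legacy_src = Drv("derived", "cat", input_srcs=frozenset({src}))
legacy_fod = Drv("derived", "cat", input_drvs=(fod,))
new_src = Drv("derived", "cat", input_srcs=frozenset({src}), agnostic=True)
new_fod = Drv("derived", "cat", input_drvs=(fod,), agnostic=True)
```

In this model, `legacy_src` and `legacy_fod` get different output paths, while `new_src` and `new_fod` get the same one, and that path equals the source-based path from the current scheme, matching the expectation stated in the repl session above.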

Describe alternatives you've considered

Fail instead of ignoring __derivationAgnosticOutputHashing. In practice this would mean that mkDerivation has to be clever about when to set the flag, which seems rather convoluted, pointless, and ever so slightly slower.

Additional context

Priorities

Add 👍 to issues you find important.

roberth added the feature and store labels on Oct 31, 2023
@Ericson2314
Member

Yeah, I've wanted this for a bit. Note that it can/should apply to all content-addressed outputs too, that is, also the non-fixed ones.

roberth changed the title from "Make input-addressed output paths indifferent to possible FOD origin of sources" to "Make input-addressed output paths indifferent to possible CA/FOD origin of sources" on Nov 28, 2023
@roberth
Member Author

roberth commented Dec 7, 2023

The logic should also apply to CA realisation (.doi) lookups, which are currently based only on the drvHash; input-addressed derivation outputs are not the only thing that suffers from the discrepancy.
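
A rough illustration of the lookup problem, as a Python sketch. The key function, the tuple encoding of inputs, and the `realisations` table are hypothetical simplifications, not Nix's real drvHash computation or realisation storage. The point is only that a lookup keyed by a hash over un-normalised inputs misses when the same source arrives via a different origin.

```python
import hashlib


def drv_key(name, inputs, normalise=False):
    """Toy lookup key for a derivation.

    inputs: list of ("src", path) or ("drv", drv_path, known_output_path_or_None).
    With normalise=True, derivation inputs whose output path is statically
    known are treated like plain sources before hashing.
    """
    parts = []
    for inp in inputs:
        if normalise and inp[0] == "drv" and inp[2] is not None:
            parts.append(("src", inp[2]))
        else:
            parts.append(inp[:2])
    payload = repr((name, sorted(parts)))
    return hashlib.sha256(payload.encode()).hexdigest()


# Hypothetical paths, for illustration only.
via_fod = [("drv", "/nix/store/xxxx-empty-file.drv", "/nix/store/yyyy-empty-file")]
via_src = [("src", "/nix/store/yyyy-empty-file")]

# A realisation registered under the FOD-based key ...
realisations = {(drv_key("derived", via_fod), "out"): "/nix/store/zzzz-derived"}

# ... is not found under the source-based key, although the build is equivalent.
miss = (drv_key("derived", via_src), "out") not in realisations

# With normalised keys both variants map to the same entry.
same = drv_key("derived", via_fod, normalise=True) == drv_key(
    "derived", via_src, normalise=True
)
```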

@Ericson2314
Member

Ericson2314 commented Dec 8, 2023

@roberth I've also been thinking about whether we can put the caching of this stuff in the derivation itself. Right now needing to recompute everything from the roots is annoying/slow. It would be cool if you could resume it from any derivation. (Of course, for verifying the store it would be good to still retain the ability to check it from scratch.)

@Ericson2314
Member

nix/src/libstore/daemon.cc

Lines 583 to 614 in c8458bd

/* Content-addressed derivations are trustless because their output paths
are verified by their content alone, so any derivation is free to
try to produce such a path.
Input-addressed derivation output paths, however, are calculated
from the derivation closure that produced them---even knowing the
root derivation is not enough. That the output data actually came
from those derivations is fundamentally unverifiable, but the daemon
trusts itself on that matter. The question instead is whether the
submitted plan has rights to the output paths it wants to fill, and
at least the derivation closure proves that.
It would have been nice if input-address algorithm merely depended
on the build time closure, rather than depending on the derivation
closure. That would mean input-addressed paths used at build time
would just be trusted and not need their own evidence. This is in
fact fine as the same guarantees would hold *inductively*: either
the remote builder has those paths and already trusts them, or it
needs to build them too and thus their evidence must be provided in
turn. The advantage of this variant algorithm is that the evidence
for input-addressed paths which the remote builder already has
doesn't need to be sent again.
That said, now that we have floating CA derivations, it is better
that people just migrate to those which also solve this problem, and
others. It's the same migration difficulty with strictly more
benefit.
Lastly, do note that when we parse fixed-output content-addressed
derivations, we throw out the precomputed output paths and just
store the hashes, so there aren't two competing sources of truth an
attacker could exploit. */
Oh, here is another angle I forgot.

If we make it so that the inputSrcs vs inputDrvs distinction never influences the output path, then we get:

  1. Even more preservation of input addresses across rewrites which don't matter
  2. Sending input-addressed derivations to another machine no longer requires trust / no longer has the attack vector of writing to arbitrary output paths. (However, sending over input-addressed data is still risky, so this only helps with "shallow" input-addressing where the remote already has/trusts any input that isn't content-addressed.)
  3. Also addresses the caching issue above, because we only need to look up the output paths of our immediate deps.

The algorithm is very simple (works for the dynamic derivations definition too):

  1. Inputs are conceptually Set DerivingPath
  2. Whenever we have a DerivingPath::Output { drvPath, outputName } that we can statically/trustlessly evaluate to a DerivingPath::Constant, do so. (Anywhere it is nested, for the dynamic derivations case.)
  3. Once no more such static/trustless rewrites are possible, calculate the output path on the resulting derivation.
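
The three steps above can be sketched in Python as follows. This is a minimal model under stated assumptions: `Constant` and `Output` mirror the names used in the comment, the `lookup` callback is a hypothetical stand-in for "the output path is statically/trustlessly known" (for example, an FOD's fixed output), and the final output-path calculation of step 3 is left out.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Union


@dataclass(frozen=True)
class Constant:
    path: str


@dataclass(frozen=True)
class Output:
    drv_path: "DerivingPath"   # may itself be an Output (dynamic derivations)
    output_name: str


DerivingPath = Union[Constant, Output]

# Hypothetical lookup: (drv store path, output name) -> statically known
# output path, or None when the output is not known without building.
StaticLookup = Callable[[str, str], Optional[str]]


def resolve(p: DerivingPath, lookup: StaticLookup) -> DerivingPath:
    """Step 2: rewrite Output to Constant wherever that is statically possible."""
    if isinstance(p, Constant):
        return p
    inner = resolve(p.drv_path, lookup)      # handle nested cases first
    if isinstance(inner, Constant):
        known = lookup(inner.path, p.output_name)
        if known is not None:
            return Constant(known)
    return Output(inner, p.output_name)      # nothing more to rewrite here


def normalise_inputs(
    inputs: Iterable[DerivingPath], lookup: StaticLookup
) -> FrozenSet[DerivingPath]:
    """Steps 1 and 2: inputs as a set of deriving paths, fully rewritten."""
    return frozenset(resolve(p, lookup) for p in inputs)


# Hypothetical store paths, for illustration only.
known = {
    ("/nix/store/xxxx-empty-file.drv", "out"): "/nix/store/yyyy-empty-file",
    ("/nix/store/pppp-gen.drv", "out"): "/nix/store/qqqq-inner.drv",
    ("/nix/store/qqqq-inner.drv", "out"): "/nix/store/rrrr-result",
}


def lookup(drv_path: str, output_name: str) -> Optional[str]:
    return known.get((drv_path, output_name))


from_drv = normalise_inputs(
    {Output(Constant("/nix/store/xxxx-empty-file.drv"), "out")}, lookup
)
from_src = normalise_inputs({Constant("/nix/store/yyyy-empty-file")}, lookup)
```

Because nested paths are resolved bottom-up, a single pass already reaches the point where no further rewrites apply. Here `from_drv` and `from_src` end up as the same set, so any output path computed from the normalised inputs would not depend on which form the input originally had.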

@roberth
Member Author

roberth commented Dec 13, 2023

Probably best to start with

That will give us the language to describe the new hashing scheme more effectively.

@Ericson2314
Member

#6877 wanted to get to that :)

roberth changed the title from "Make input-addressed output paths indifferent to possible CA/FOD origin of sources" to "Make output paths/hashes indifferent to possible CA/FOD origin of sources" on Feb 16, 2024
roberth added the derivation design label on Jun 9, 2024