Make output paths/hashes indifferent to possible CA/FOD origin of sources #9259

roberth · 2023-10-31T10:08:58Z

Is your feature request related to a problem? Please describe.

We can't currently swap an inputSrcs input of a derivation for an equivalent inputDrvs input and get the same output path, even though the build can observe no difference (with the possible exception of introspective features such as exportReferencesGraph, which we could force to be mutually exclusive with this improved hashing feature).

nix repl session illustrating how it currently works, and what this improvement entails

Currently we have some equivalence between FOD output paths and sources:

nix-repl> pkgs.emptyFile.outPath
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"

nix-repl> "${./empty-file}"
"/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"

However, as expected, the string contexts are different:

nix-repl> :p builtins.getContext pkgs.emptyFile.outPath
{ "/nix/store/jnghgcdkdc79p7qp4vfc27ryvnnm5igi-empty-file.drv" = { outputs = [ "out" ]; }; }

nix-repl> :p builtins.getContext "${./empty-file}"                       
{ "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file" = { path = true; }; }

In itself, this is not a problem. The fact is that they need to be built differently: one involves an FOD, while the other was added to the store during instantiation.

That being the case, we can expect dependents to have different derivation hashes, representing the different build graph.

nix-repl> pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]
«derivation /nix/store/nmzk7crah5himm4h3nbh6ry8331c99nq-derived.drv»

nix-repl> pkgs.concatText "derived" [ "${./empty-file}" ]      
«derivation /nix/store/6bs6pkf3z4gfmjk0ssr7qfldrvyr8qfy-derived.drv»

So far so good, but now we observe a difference in output paths, which is not necessary.

nix-repl> "${pkgs.concatText "derived" [ pkgs.emptyFile.outPath ]}"   
"/nix/store/4q9z73ak24fl3qr4j5wwkjbw0l7ch0rh-derived"

nix-repl> "${pkgs.concatText "derived" [ "${./empty-file}" ]}"      
"/nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived"

By implementing this proposal (and enabling it in Nixpkgs), both output paths would have the value /nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived.

Describe the solution you'd like

Instantiation works the same, except for the hashing of certain input-addressed outputs.

Specifically, when a derivation attribute flag such as __derivationAgnosticOutputHashing is enabled, inputs from FODs are ignored for the purpose of hashing; instead interpreting their output paths as inputSrcs.
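To make the intended effect concrete, here is a minimal sketch in Python. It is a toy model, not Nix's actual hashing scheme: the ToyDrv class, the toy_output_path function, the hash material layout, and the agnostic field (standing in for the proposed __derivationAgnosticOutputHashing flag) are all assumptions made for illustration. It only shows that folding FOD inputs into the source set makes the two variants of "derived" hash identically.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ToyDrv:
    """Toy derivation; hypothetical, far simpler than Nix's real model."""
    name: str
    builder: str
    input_srcs: FrozenSet[str] = frozenset()   # plain store paths
    input_drvs: Tuple["ToyDrv", ...] = ()      # derivation inputs
    fixed_output_path: Optional[str] = None    # set only for FODs
    agnostic: bool = False                     # stand-in for the proposed flag


def toy_output_path(drv: ToyDrv) -> str:
    """Compute a toy input-addressed output path."""
    if drv.fixed_output_path is not None:
        # An FOD's output path is determined by its declared content hash.
        return drv.fixed_output_path
    srcs = set(drv.input_srcs)
    drv_outputs = set()
    for dep in drv.input_drvs:
        dep_out = toy_output_path(dep)
        if drv.agnostic and dep.fixed_output_path is not None:
            srcs.add(dep_out)         # proposal: treat the FOD output as a source
        else:
            drv_outputs.add(dep_out)  # status quo: it stays a derivation input
    material = "\n".join([
        drv.name,
        drv.builder,
        "srcs:" + ",".join(sorted(srcs)),
        "drvs:" + ",".join(sorted(drv_outputs)),
    ])
    digest = hashlib.sha256(material.encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{drv.name}"


EMPTY = "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"
empty_fod = ToyDrv("empty-file", "builtin", fixed_output_path=EMPTY)

# Same dependent, once fed by a source path and once by the FOD.
new_src = ToyDrv("derived", "sh", input_srcs=frozenset({EMPTY}), agnostic=True)
new_fod = ToyDrv("derived", "sh", input_drvs=(empty_fod,), agnostic=True)
old_src = ToyDrv("derived", "sh", input_srcs=frozenset({EMPTY}))
old_fod = ToyDrv("derived", "sh", input_drvs=(empty_fod,))

print(toy_output_path(new_src) == toy_output_path(new_fod))
print(toy_output_path(old_src) == toy_output_path(old_fod))
```

In this toy the flag itself is deliberately left out of the hash material, so the flagged result equals the path that the source-based variant already has, in line with the expectation stated above for /nix/store/56phdxcpdv7d8zz4cj0a28iwz5akcpcz-derived.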

Ignore __derivationAgnosticOutputHashing when the build could perceive the difference, e.g. when it has flags that would let it introspect its dependencies. I don't think it could use exportReferencesGraph on itself, so this may not even be a problem. I just haven't checked whether that was everything that needs to be controlled. TBD.

Describe alternatives you've considered

Fail instead of ignoring __derivationAgnosticOutputHashing. In practice this would mean that mkDerivation has to be clever about when to set the flag, which seems rather convoluted, pointless, and ever so slightly slower.

Additional context

This would be good to do before "Builtin fetching should be representable by derivations" (#9077).
Another output hashing change is "Have our cake and eat it too derivation metadata" (#10780).

Priorities

Add 👍 to issues you find important.


Ericson2314 · 2023-11-01T19:58:40Z

Yeah, I've wanted this for a bit. Note that it can/should apply to all content-addressed outputs too; that is, to non-fixed ones as well.

roberth · 2023-12-07T19:43:59Z

The logic should also apply to CA realisation (.doi) lookups, which are currently just drvHash based; input-addressed derivation outputs are not the only ones that seem to suffer from the discrepancy.

Ericson2314 · 2023-12-08T04:17:27Z

@roberth I've also been thinking about whether we can put the caching of this stuff in the derivation itself. Right now needing to recompute everything from the roots is annoying/slow. It would be cool if you could resume it from any derivation. (Of course, for verifying the store it would be good to still retain the ability to check it from scratch.)

Ericson2314 · 2023-12-10T16:49:03Z

nix/src/libstore/daemon.cc

Lines 583 to 614 in c8458bd

    /* Content-addressed derivations are trustless because their output paths
       are verified by their content alone, so any derivation is free to
       try to produce such a path.

       Input-addressed derivation output paths, however, are calculated
       from the derivation closure that produced them---even knowing the
       root derivation is not enough. That the output data actually came
       from those derivations is fundamentally unverifiable, but the daemon
       trusts itself on that matter. The question instead is whether the
       submitted plan has rights to the output paths it wants to fill, and
       at least the derivation closure proves that.

       It would have been nice if input-address algorithm merely depended
       on the build time closure, rather than depending on the derivation
       closure. That would mean input-addressed paths used at build time
       would just be trusted and not need their own evidence. This is in
       fact fine as the same guarantees would hold *inductively*: either
       the remote builder has those paths and already trusts them, or it
       needs to build them too and thus their evidence must be provided in
       turn.  The advantage of this variant algorithm is that the evidence
       for input-addressed paths which the remote builder already has
       doesn't need to be sent again.

       That said, now that we have floating CA derivations, it is better
       that people just migrate to those which also solve this problem, and
       others. It's the same migration difficulty with strictly more
       benefit.

       Lastly, do note that when we parse fixed-output content-addressed
       derivations, we throw out the precomputed output paths and just
       store the hashes, so there aren't two competing sources of truth an
       attacker could exploit. */

Oh, here is another angle I forgot.

If we make it so that the inputSrcs vs inputDrvs distinction never influences the output path, then we get:

- Even more preservation of input addresses across rewrites which don't matter.
- Sending input-addressed derivations to another machine no longer requires trust / no longer has the attack vector of writing to arbitrary output paths. (However, sending over input-addressed data is still risky, so this only helps with "shallow" input-addressing, where the remote already has/trusts any input that isn't content-addressed.)
- It also addresses the caching issue above, because we only need to look up the output paths of our immediate deps.

The algorithm is very simple (works for the dynamic derivations definition too):

1. Inputs are conceptually Set DerivingPath.
2. Whenever we have a DerivingPath::Output { drvPath, outputName } that we can statically/trustlessly evaluate to a DerivingPath::Constant, do so. (Anywhere nested, for the dynamic derivations case.)
3. Once no more such static/trustless rewrites are possible, calculate the output path on the resulting derivation.
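The steps above can be sketched roughly as follows, in Python. This is an illustration under assumptions, not Nix code: Constant and Output mirror the DerivingPath variants named in the comment, while rewrite, normalize_inputs, and the resolve callback (which returns a statically or trustlessly known output path, for example for an FOD or an already realised CA output, and None otherwise) are hypothetical names. The store paths in the example are made up.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Union


@dataclass(frozen=True)
class Constant:
    """A plain store path (what inputSrcs holds today)."""
    path: str


@dataclass(frozen=True)
class Output:
    """An output of a derivation; drv may itself be an Output (dynamic derivations)."""
    drv: Union["Constant", "Output"]
    output_name: str


DerivingPath = Union[Constant, Output]
# Hypothetical lookup: (drv path, output name) -> known output path, or None.
Resolver = Callable[[str, str], Optional[str]]


def rewrite(p: DerivingPath, resolve: Resolver) -> DerivingPath:
    """Rewrite bottom-up; one pass reaches the fixed point for a fixed resolver."""
    if isinstance(p, Constant):
        return p
    inner = rewrite(p.drv, resolve)  # handle nested outputs first
    if isinstance(inner, Constant):
        known = resolve(inner.path, p.output_name)
        if known is not None:
            return Constant(known)   # static/trustless rewrite
    return Output(inner, p.output_name)  # must stay a derivation input


def normalize_inputs(inputs: Iterable[DerivingPath],
                     resolve: Resolver) -> FrozenSet[DerivingPath]:
    """The set on which the output path would then be calculated."""
    return frozenset(rewrite(p, resolve) for p in inputs)


# Made-up example data.
EMPTY = "/nix/store/ij3gw72f4n5z4dz6nnzl1731p9kmjbwr-empty-file"
known = {
    ("/nix/store/aaa-empty-file.drv", "out"): EMPTY,                   # an FOD
    ("/nix/store/bbb-gen.drv", "out"): "/nix/store/ccc-inner.drv",     # CA-produced drv
}


def resolve(drv_path: str, output_name: str) -> Optional[str]:
    return known.get((drv_path, output_name))


fod_input = Output(Constant("/nix/store/aaa-empty-file.drv"), "out")
ia_input = Output(Constant("/nix/store/ddd-gcc.drv"), "out")
dyn_input = Output(Output(Constant("/nix/store/bbb-gen.drv"), "out"), "out")

result = normalize_inputs([fod_input, ia_input, dyn_input], resolve)
```

Here the FOD input collapses to a Constant, the input-addressed input is left alone, and the nested input is rewritten only as far as the resolver allows, which also means only the immediate dependencies' output paths need to be looked up.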

roberth · 2023-12-13T18:18:42Z

Probably best to start with

Explanation of how output hashes are derived #9189

That will give us the language to describe the new hashing scheme more effectively.

Ericson2314 · 2023-12-13T18:41:56Z

#6877 wanted to get to that :)

roberth added the feature (Feature request or proposal) and store (Issues and pull requests concerning the Nix store) labels Oct 31, 2023

roberth changed the title ~~Make input-addressed output paths indifferent to possible FOD origin of sources~~ Make input-addressed output paths indifferent to possible CA/FOD origin of sources Nov 28, 2023

roberth mentioned this issue Nov 28, 2023

Function for transforming store path contents NixOS/nixpkgs#264541

Open

This was referenced Jan 21, 2024

Track test derivations and parallelize building and testing #7662

Open

Derivation JSON env does not adhere to the JSON guidelines #9866

Open

This was referenced Feb 6, 2024

Two misc ideas for Nix itself NixOS/GSoC#13

Merged

Include store path exact spec in the docs #9295

Merged

roberth changed the title ~~Make input-addressed output paths indifferent to possible CA/FOD origin of sources~~ Make output paths/hashes indifferent to possible CA/FOD origin of sources Feb 16, 2024

roberth mentioned this issue Feb 16, 2024

Builtin fetching should be representable by derivations #9077

Open

roberth mentioned this issue May 26, 2024

Have our cake and eat it too derivation metadata #10780

Open

roberth added the derivation design Issues to consider for new versions of the derivation format (major or incremental) label Jun 9, 2024

roberth mentioned this issue Aug 8, 2024

Don't copy nix store path to nix store #11229

Open
