IPLD Patch #171

warpfork · 2022-01-18T21:11:13Z

IPLD Patch should describe a declarative document update system.

This issue is an outline of general wishes for a fairly major new specification and suite of features. It will almost certainly need to be broken down and fleshed out further before it is ready for work. Several paragraphs below use words like "may" heavily: these are indeed guesses. I hope they deserve the title "educated guess", but further analysis may well be needed!

IPLD Patch should be reasonably user-friendly. For example, it seems likely that it should be able to use paths -- e.g. perhaps like {"op":"update", "path":"deep/in/3/data", "val": "new value"}.
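
To make the path idea concrete, here is a minimal sketch in Python of applying one such operation to plain nested maps and lists. It is an illustration only: the `apply_update` and `_key` helpers are hypothetical, the op shape is copied from the guess above (not from any spec), and the sketch assumes `/`-separated segments with integer segments indexing into lists.

```python
from copy import deepcopy


def _key(node, segment):
    # Lists are addressed by integer index, maps by string key.
    return int(segment) if isinstance(node, list) else segment


def apply_update(doc, op):
    """Return a copy of doc with the value at op["path"] replaced by op["val"]."""
    if op["op"] != "update":
        raise ValueError(f"unsupported op: {op['op']!r}")
    result = deepcopy(doc)  # leave the caller's document untouched
    segments = op["path"].split("/")
    node = result
    for segment in segments[:-1]:
        node = node[_key(node, segment)]
    last = _key(node, segments[-1])
    # "update" (as assumed here) only replaces existing map entries.
    if isinstance(node, dict) and last not in node:
        raise KeyError(last)
    node[last] = op["val"]
    return result


doc = {"deep": {"in": [0, 1, 2, {"data": "old value"}]}}
patched = apply_update(
    doc, {"op": "update", "path": "deep/in/3/data", "val": "new value"}
)
```

The original `doc` is left unchanged, which mirrors the immutable-node style that content-addressed data pushes toward anyway.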

JSON Patch (RFC 6902) provides some great prior art, and is probably very close to the mark, but has a few limitations I'd like to improve on:

- Mainly, JSON Patch is specified for JSON! We should be able to hoist this to apply on the IPLD Data Model (so that it can apply to JSON documents, or CBOR, or DAG-PB, and beyond), and also be declarable in terms of the IPLD Data Model (so patch specs can be serialized in JSON, or CBOR, or whatever else). (This should be a pretty trivial hoist.)
- JSON Patch is ambiguous about insertion order when adding an entry to a map. I'd suggest we consider adding support for an optional "after" or "before" field on "add"/"replace" operations.
- It may be useful to consider having a "sort" operation, which for some users might be an excellent alternative to the above!
- (Other improvements may be found on further examination, or may be found in the future and added as extensions onto the core.)
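
A rough sketch of what an ordered "add" could look like, in Python (whose dicts preserve insertion order). The `add_entry` helper and the exact semantics of `before`/`after` are assumptions made for illustration; nothing here is specified yet.

```python
def add_entry(mapping, key, value, before=None, after=None):
    """Return a new map with key inserted relative to an existing anchor key."""
    if before is not None and after is not None:
        raise ValueError("give at most one of 'before' and 'after'")
    if key in mapping:
        raise KeyError(f"{key!r} already exists")
    anchor = before if before is not None else after
    if anchor is None:
        # No anchor given: append at the end (one possible default).
        return {**mapping, key: value}
    if anchor not in mapping:
        raise KeyError(f"anchor {anchor!r} not found")
    result = {}
    for k, v in mapping.items():
        if before is not None and k == before:
            result[key] = value
        result[k] = v
        if after is not None and k == after:
            result[key] = value
    return result


base = {"a": 1, "c": 3}
with_after = add_entry(base, "b", 2, after="a")
with_before = add_entry(base, "b", 2, before="c")
appended = add_entry(base, "b", 2)
```

The "sort" alternative would amount to something like `dict(sorted(mapping.items()))` applied after the additions, which sidesteps anchors entirely for users who want a canonical order.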

There should be additional benefits to IPLD Patch over JSON Patch, even if they were textually equivalent:

- Because IPLD Patch should apply over the Data Model, it follows that IPLD Patch should work not just over various codecs, but also over ADLs. Meaning? One should be able to use IPLD Patch to update even complex sharded data structures like HAMTs. (This would be a fairly huge feature!)
- Because IPLD Patch works over linked data, an IPLD Patch implementation will be capable of updating multiple IPLD data blocks at once. (We might expect this to be a very pleasing feature for the IPLD ecosystem at large. Right now, doing this generally requires writing some code in the library of your choice. That certainly works, but it's not highly communicable except between people who program in the same language, and it doesn't lend itself to e.g. CLI tools or other forms of non-code interactables!)
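
To illustrate the multi-block point, here is a toy sketch in Python. Everything in it is a stand-in: the block store is a plain dict, "CIDs" are hex SHA-256 digests of canonical JSON (real CIDs are not computed this way), links use the DAG-JSON-like `{"/": cid}` shape, and `put`, `is_link`, and `set_path` are hypothetical helpers rather than any library's API.

```python
import hashlib
import json


def put(store, node):
    """Store a block and return its stand-in CID (hex SHA-256 of canonical JSON)."""
    data = json.dumps(node, sort_keys=True, separators=(",", ":")).encode()
    cid = hashlib.sha256(data).hexdigest()
    store[cid] = node
    return cid


def is_link(node):
    return isinstance(node, dict) and set(node) == {"/"}


def _set(store, node, segments, value):
    if not segments:
        return value  # reached the target position
    if is_link(node):
        # Crossing a block boundary: rewrite the linked block, then relink.
        child = _set(store, store[node["/"]], segments, value)
        return {"/": put(store, child)}
    head, rest = segments[0], segments[1:]
    if isinstance(node, list):
        copy = list(node)
        copy[int(head)] = _set(store, node[int(head)], rest, value)
        return copy
    copy = dict(node)
    copy[head] = _set(store, node[head], rest, value)
    return copy


def set_path(store, root_cid, path, value):
    """Apply one path-based update across blocks and return the new root CID."""
    return put(store, _set(store, store[root_cid], path.split("/"), value))


store = {}
leaf_cid = put(store, {"data": "old"})
root_cid = put(store, {"deep": {"in": [{"/": leaf_cid}]}})
new_root_cid = set_path(store, root_cid, "deep/in/0/data", "new")
```

One path-based instruction ends up writing two new blocks (the leaf and the root that links to it), while the old blocks stay in the store untouched.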

The design of IPLD Patch should consider commutability, and the implementation should document which operations it may choose to commute in order to make optimizations. If the author of a patch instruction document uses non-commutable operations, they should be able to learn that from skimming the IPLD Patch spec.
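
As one possible starting point for that analysis, here is a deliberately conservative check, sketched in Python. The rule it encodes is my assumption, not anything decided: two in-place "update" ops are treated as commutable only when neither path is a prefix of the other, and every other op kind is treated as non-commutable (an "add" into a list, for example, shifts later indices). The `paths_overlap` and `may_commute` names are hypothetical.

```python
def paths_overlap(a, b):
    """True when one path is a prefix of (or equal to) the other."""
    sa, sb = a.split("/"), b.split("/")
    n = min(len(sa), len(sb))
    return sa[:n] == sb[:n]


def may_commute(op1, op2):
    """Conservative test: only disjoint in-place updates are reordered."""
    if op1["op"] != "update" or op2["op"] != "update":
        return False  # unknown or structure-changing ops: never reorder
    return not paths_overlap(op1["path"], op2["path"])


siblings = may_commute(
    {"op": "update", "path": "a/b", "val": 1},
    {"op": "update", "path": "a/c", "val": 2},
)
nested = may_commute(
    {"op": "update", "path": "a/b", "val": 1},
    {"op": "update", "path": "a/b/c", "val": 2},
)
```

A rule this simple is easy to state in a spec, which is part of the appeal: a patch author can tell at a glance whether an implementation is allowed to reorder their ops.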

It may be useful to consider having additional features like upsert preconditions, e.g. only update this value if the existing value is X. Even the equality predicate here is enough to support construction of CAS (compare-and-swap) systems, which is extremely useful. Conditions based on numerical comparison, etc, could also be considered. Any specification for such preconditions should also be careful, however, to ensure it does not open the door to nonlocal reasoning in ways that would result in unpredictable performance impacts or make systems using IPLD Patch overly footgun-prone with respect to DoS mechanisms. (E.g. a precondition based on the same value is certainly no problem; a precondition based on a distant value elsewhere in the tree may not be easy to implement efficiently. The latter should be explored before adding such a system to the specification.)
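
A small sketch of the equality-precondition idea in Python. The `expect` field name, the `PreconditionFailed` exception, and the `apply_conditional_update` helper are all hypothetical. The precondition here is strictly local: it only inspects the value that is about to be replaced.

```python
from copy import deepcopy


class PreconditionFailed(Exception):
    """Raised when the existing value does not match the expected one."""


def apply_conditional_update(doc, op):
    """Replace the value at op["path"], but only if it currently equals op["expect"]."""
    result = deepcopy(doc)
    segments = op["path"].split("/")
    node = result
    for segment in segments[:-1]:
        node = node[int(segment) if isinstance(node, list) else segment]
    last = segments[-1]
    key = int(last) if isinstance(node, list) else last
    # Local check only: compare against the value being overwritten.
    if "expect" in op and node[key] != op["expect"]:
        raise PreconditionFailed(
            f"{op['path']}: found {node[key]!r}, expected {op['expect']!r}"
        )
    node[key] = op["val"]
    return result


doc = {"counter": {"value": 7}}
swapped = apply_conditional_update(
    doc, {"op": "update", "path": "counter/value", "expect": 7, "val": 8}
)
```

A writer that loses the race (its `expect` no longer matches) gets a failure instead of silently overwriting, which is the whole of compare-and-swap.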

It may be useful to have more than one form of IPLD Patch documents: perhaps a "normal" form, which is highly recursive and takes one path segment step each, and a nonnormal form which allows whole paths to be specified in one line. The latter is likely going to be found friendlier to human authors; the former might end up friendlier to procedural authoring (and also, I suspect, bear more resemblance to an internal form that an implementation might need anyway).
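
To show how the two forms might relate, here is a sketch in Python that folds whole-path ops into a recursive tree taking one path segment per level. The tree layout (the `children` and `action` field names) and the `to_normal_form` helper are invented for illustration; the real normal form, if there is one, remains to be designed.

```python
def to_normal_form(ops):
    """Fold whole-path ops into a tree with one path segment per level."""
    root = {}
    for op in ops:
        node = root
        for segment in op["path"].split("/"):
            node = node.setdefault("children", {}).setdefault(segment, {})
        if "action" in node:
            raise ValueError(f"two ops target the same path: {op['path']}")
        # Everything except the path travels with the leaf of the tree.
        node["action"] = {k: v for k, v in op.items() if k != "path"}
    return root


ops = [
    {"op": "update", "path": "a/b", "val": 1},
    {"op": "update", "path": "a/c", "val": 2},
]
tree = to_normal_form(ops)
```

Note that merging ops by shared prefix discards their relative order across branches, so a conversion like this is only safe for ops that commute with each other.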

A bonus goal of an implementation might be to recall block boundaries as they are traversed, and memoize both the CID and the patch subtree being applied there, so that if it finds that it's applying the same patch subtree to the same CID in the future, it can skip to the result. This could be useful in the case of e.g. some dataset which is "indexed" by multiple HAMTs pointing into the same leaf nodes via different keys. (One can see how a "normal" serial form as mentioned above could be useful here: it would become part of the memoization key.)
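
A sketch of that memoization in Python, under heavy assumptions: the `MemoizingPatcher` class is hypothetical, `apply_fn` stands in for whatever applies a patch subtree to a single block, the store is a plain dict, and hex SHA-256 digests of canonical JSON stand in for real CIDs.

```python
import hashlib
import json


def canonical(node):
    """Deterministic serialization, used for hashing and for the cache key."""
    return json.dumps(node, sort_keys=True, separators=(",", ":"))


def cid_of(node):
    return hashlib.sha256(canonical(node).encode()).hexdigest()


class MemoizingPatcher:
    def __init__(self, apply_fn):
        self.apply_fn = apply_fn  # (block, patch_subtree) -> new block
        self.cache = {}           # (cid, serialized patch subtree) -> result CID
        self.hits = 0

    def apply(self, store, cid, patch_subtree):
        key = (cid, canonical(patch_subtree))
        if key in self.cache:
            self.hits += 1
            return self.cache[key]  # same block, same patch: skip to the result
        new_block = self.apply_fn(store[cid], patch_subtree)
        new_cid = cid_of(new_block)
        store[new_cid] = new_block
        self.cache[key] = new_cid
        return new_cid


# Toy apply_fn: shallow-merge the patch subtree into a map block.
patcher = MemoizingPatcher(lambda block, patch: {**block, **patch})
leaf = {"data": "old"}
store = {cid_of(leaf): leaf}
first = patcher.apply(store, cid_of(leaf), {"data": "new"})
second = patcher.apply(store, cid_of(leaf), {"data": "new"})
```

The second call corresponds to reaching the same leaf through a different index: the work is skipped and the previously computed CID is returned.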

It may be desirable to support the application of an IPLD Patch document to positions in a large data graph which are determined by use of an IPLD Selector, as an alternative to starting at a root and applying by plain path. It is likely that this can be done from outside the scope of the IPLD Patch spec, by simple composition, but is noted here for consideration.

warpfork · 2022-01-31T11:30:40Z

Thinking about how textually close this is to JSON Patch: the answer is "very". It seems that very often we're able to hoist the existing concepts and apply them to IPLD without difficulty.

For example, even taking a valid JSON Patch today, such as:

{ "op": "replace", "path": "/Links/0/Hash", "value": { "/": "bafyfoo" } }

... works correctly as JSON Patch, applied on a DAG-JSON object, without a fuss.
... will also work correctly if seen as IPLD Patch that's serialized in DAG-JSON, and applied on any IPLD node.

It's just that in the former story, the patch logic thinks it's putting a map at that position, whereas in the latter story the patch logic will have parsed the value as a link and knows it's placing a link at that position. The former is arguably semantically "wrong", but as long as you parse the result as DAG-JSON again, it's still doing a sufficiently textually correct thing that it all works out equivalently by the end.
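
The two readings can be shown side by side with a small Python sketch. The `Link` class and `dagjson_hook` function are hypothetical stand-ins for a real DAG-JSON decoder. The hook only recognizes the single-key `{"/": "<string>"}` shape and does not validate the CID (the "bafyfoo" placeholder would not pass real validation).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    cid: str


def dagjson_hook(obj):
    # Treat a map whose only key is "/" with a string value as a link.
    if set(obj) == {"/"} and isinstance(obj["/"], str):
        return Link(obj["/"])
    return obj


value_text = '{ "/": "bafyfoo" }'

as_json = json.loads(value_text)                            # plain JSON: a map
as_ipld = json.loads(value_text, object_hook=dagjson_hook)  # link-aware: a Link
```

Serializing either reading back out gives the same text, which is why the plain JSON Patch story still works out by the end.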

rvagg · 2022-05-03T22:55:15Z

closing in favour of #187 which @RangerMauve is going to try and get over the line

warpfork mentioned this issue Jan 18, 2022

APIs for creating incrementally modified nodes ipld/go-ipld-prime#320

Open

warpfork added backlog difficulty:hard labels Jan 18, 2022

warpfork mentioned this issue Jan 31, 2022

ipfs dag diff ipfs/kubo#4801

Open

rvagg closed this as completed May 3, 2022
