This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

[EPIC] Towards better file system APIs #71

Closed
djdv opened this issue Jul 9, 2019 · 4 comments

@djdv

djdv commented Jul 9, 2019

If we wish to support file system based package managers, we'll need to accommodate a range of existing expectations, and hopefully exceed them.
In general, there should be an easy (and ideally transparent/drop-in) way both to add package repositories to IPFS and to get packages out, from the perspective of both maintainers and users.

Getting to that point will require work in multiple areas to cover multiple use cases.
Sections of the work done in #67 will likely become relevant as we find out what specific issues are present today and how we may alleviate them.

For example, using data on IPFS with existing tools will certainly involve mounting IPFS and interfacing with APIs like FUSE.
Creating new tools, such as a repo syncing tool, IPFS package manager PoC, etc. on top of existing APIs, may prove challenging for certain workloads, or at the very least, contain a lot of overlap. So we should find ways to improve, or supersede these APIs.
In the cases where existing things are fit for certain tasks, it may still be hard to make them work together. So we should find better ways to interoperate.

Short term, we can collect and discuss known problem points that are relevant to package management. What exists today, and why is it (or is it not) viable?
e.g.

  • MFS; for importing and working with new or existing datasets
    • {conceptually} deals with an isolated filesystem, using a unix-like (subset) specification
      • core implementations should try to unify namespace handling so that things like ipfs files ls are obviated by ipfs ls (at the API level as well)
    • {go-mfs} suboptimal performance
    • {go-mfs} implementation is not well understood
  • ipfs mount; for interacting with IPFS using existing utilities
    • {go-ipfs} suboptimal performance
    • {go-ipfs} lacks write support
    • {go-ipfs} currently deals with only 2 namespaces/APIs (IPFS and IPNS), in a rigid way

...

Long term, we can build towards a virtual file system interface (VFS), that supersedes MFS and better integrates with other APIs that may be relevant for filesystem based operations (such as UnixfsV2).

In between, I plan to work towards creating and maintaining an experimental VFS API that provides a means of interacting with IPFS (using new methods, and our newer APIs), and to use this API to provide an experimental version of ipfs mount that should aid us in testing and development.
In particular, this should help nail down a common set of file system expectations that IPFS implementations provide to developers and users.
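To make the namespace idea a bit more concrete, here is a minimal Go sketch of what "one interface, many namespaces" could look like. Everything in it is an assumption made for illustration: the `Namespace` interface, the `VFS` type, `Attach`, `List`, and the toy `staticNS` backend are hypothetical names, not existing go-ipfs APIs, and a real design would need far more than listing.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Namespace is a hypothetical handler for one root, such as "ipfs" or "ipns".
type Namespace interface {
	// List returns the entry names directly under the given sub-path.
	List(path string) ([]string, error)
}

// ErrNoNamespace is returned when no namespace is attached for a path's root.
var ErrNoNamespace = errors.New("vfs: no namespace attached for path")

// VFS routes absolute paths to whichever namespace is attached
// under their first component.
type VFS struct {
	roots map[string]Namespace
}

// NewVFS returns an empty VFS with no namespaces attached.
func NewVFS() *VFS { return &VFS{roots: make(map[string]Namespace)} }

// Attach registers a namespace under a root name (for example "ipfs").
func (v *VFS) Attach(name string, ns Namespace) { v.roots[name] = ns }

// List resolves "/<namespace>/<rest>" and delegates to that namespace.
// Listing "/" returns the attached namespace names in sorted order.
func (v *VFS) List(path string) ([]string, error) {
	trimmed := strings.TrimPrefix(path, "/")
	if trimmed == "" {
		names := make([]string, 0, len(v.roots))
		for name := range v.roots {
			names = append(names, name)
		}
		sort.Strings(names)
		return names, nil
	}
	parts := strings.SplitN(trimmed, "/", 2)
	ns, ok := v.roots[parts[0]]
	if !ok {
		return nil, fmt.Errorf("%w: %q", ErrNoNamespace, path)
	}
	rest := ""
	if len(parts) == 2 {
		rest = parts[1]
	}
	return ns.List(rest)
}

// staticNS is a toy in-memory namespace used only for this demonstration.
type staticNS map[string][]string

func (s staticNS) List(path string) ([]string, error) {
	entries, ok := s[path]
	if !ok {
		return nil, fmt.Errorf("not found: %q", path)
	}
	return entries, nil
}

func main() {
	v := NewVFS()
	v.Attach("ipfs", staticNS{"QmWhatever": {"README.md"}})
	v.Attach("files", staticNS{"": {"packages"}})
	for _, p := range []string{"/", "/ipfs/QmWhatever", "/files"} {
		entries, err := v.List(p)
		fmt.Println(p, entries, err)
	}
}
```

The point of the sketch is only that a single `List` call serves every attached root, so adding a third or fourth namespace would not require a new command or a new mount point.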

This draft is a good starting reference, recapping work that's already been done on this effort: ipfs/kubo#6036
This was also discussed recently at IPFS Camp; notes are here: https://github.com/ipfs/camp/blob/master/DEEP_DIVES/31-mounting-an-ipfs-filesystem.md

As we proceed, these goals will have to be better divided up and defined. Expect smaller issues to pop up around this, in this repo, and linked back here as work is done on them.
(in a similar fashion to: ipfs/kubo#5003)

@meiqimichelle meiqimichelle changed the title Towards better file system APIs [Epic] Towards better file system APIs Jul 10, 2019
@andrew
Collaborator

andrew commented Jul 18, 2019

@djdv something that would be useful to think about: are there any parts of unixfsv2 that would be required for this work, or that would be super helpful to have?

Based on the conversation we had in standup yesterday, I'm thinking not, but there was discussion in the IPLD team about a unixfsv1.5 that could potentially ship much sooner if there was a real short-term need for it, and it'd be good to get those needs discovered sooner rather than later.

@djdv
Author

djdv commented Jul 18, 2019

@andrew
One scenario that comes to mind is metadata.

The UFSv1 metadata facilities are limiting (type and size only?), so the approach I have in mind is to sidestep them entirely and instead store metadata for paths in node-local storage, in a way that is particular to one of the APIs that will come out of this.
This effectively separates the local metadata from the network-global data, using the two in tandem during operations that rely on them (most likely in some overlay-like fashion).

i.e. imagine having an atime update for path /ipfs/QmWhatever. The metadata associated with that path would be constructed or modified, and flushed to the node's datastore (or elsewhere). This state could be restored later, giving you effective metadata persistence, even on paths that reference immutable data objects.
(contrived example)
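
A rough Go sketch of that overlay idea, under stated assumptions: `Meta`, `Overlay`, `Touch`, and `Stat` are hypothetical names, and a plain in-memory map stands in for the node's datastore, so nothing here actually persists across restarts.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Meta holds node-local attributes kept outside the immutable data.
type Meta struct {
	Atime time.Time
	Mtime time.Time
}

// Overlay keeps metadata keyed by path, separate from the content itself.
type Overlay struct {
	mu    sync.Mutex
	store map[string]Meta // stand-in for the node's datastore (assumption)
}

// NewOverlay returns an overlay with no recorded metadata.
func NewOverlay() *Overlay { return &Overlay{store: make(map[string]Meta)} }

// Touch constructs or modifies the record for path and sets its access time.
func (o *Overlay) Touch(path string, at time.Time) {
	o.mu.Lock()
	defer o.mu.Unlock()
	m := o.store[path] // zero Meta if the path has no record yet
	m.Atime = at
	o.store[path] = m
}

// Stat returns the recorded metadata and whether any exists for path.
func (o *Overlay) Stat(path string) (Meta, bool) {
	o.mu.Lock()
	defer o.mu.Unlock()
	m, ok := o.store[path]
	return m, ok
}

func main() {
	o := NewOverlay()
	const p = "/ipfs/QmWhatever"
	if _, ok := o.Stat(p); !ok {
		fmt.Println("no local metadata yet for", p)
	}
	o.Touch(p, time.Date(2019, 7, 18, 12, 0, 0, 0, time.UTC))
	m, _ := o.Stat(p)
	fmt.Println(p, "atime:", m.Atime.Format(time.RFC3339))
}
```

The object behind /ipfs/QmWhatever is never touched; only the local record keyed by its path changes, which is what lets immutable data appear to carry mutable attributes.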

Since this separation exists, the metadata would have to be synchronized out of band in some manner. One option we talk about often is rsync, but you could imagine handling this internally as well, through one of our dynamic channels (pubsub, IPNS, just another DAG, etc.).


I've also heard people talking about the topic of storing dag creation arguments within objects somewhere.
Storing things like what chunking arguments were used, tree style, etc. would be useful to know. With that information, decision making during writes, etc. could be more dynamic: not using whatever is deemed standard, but using whatever makes sense given the extra data.
i.e. don't take a trickle dag and turn it into a balanced/hybrid one when appending.
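
As a sketch of that decision, assuming the creation arguments were recorded somewhere retrievable: the `BuildParams` type, `DefaultParams`, and `ParamsForAppend` below are hypothetical, and the default values are only illustrative stand-ins for "whatever is deemed standard".

```go
package main

import "fmt"

// Layout names the DAG tree style used when a file was imported.
type Layout string

const (
	Balanced Layout = "balanced"
	Trickle  Layout = "trickle"
)

// BuildParams records hypothetical creation arguments for a DAG.
type BuildParams struct {
	Chunker string // chunking argument, e.g. "size-262144"
	Layout  Layout
}

// DefaultParams is used when nothing is known about how the DAG was
// built (illustrative values, not an authoritative default).
var DefaultParams = BuildParams{Chunker: "size-262144", Layout: Balanced}

// ParamsForAppend picks the parameters an append should use: the
// recorded ones when present, otherwise the defaults.
func ParamsForAppend(recorded *BuildParams) BuildParams {
	if recorded == nil {
		return DefaultParams
	}
	return *recorded
}

func main() {
	// No recorded arguments: fall back to the defaults.
	fmt.Println(ParamsForAppend(nil).Layout)

	// Recorded arguments: a trickle DAG stays a trickle DAG on append.
	recorded := &BuildParams{Chunker: "rabin", Layout: Trickle}
	fmt.Println(ParamsForAppend(recorded).Layout)
}
```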


I'm not sure what the state of metadata is in UFSv2.
But having standards around metadata would be useful to know, even if I plan to separate them in the short term. It would be nice if mount-specific metadata and UFSv2 metadata structures had some kind of compatibility.

Reading through various repos, I found some of these remarks interesting.

Series of quotes

ipld/legacy-unixfs-v2#1 (comment)

we can get the best of both worlds by storing this metadata in the directories but not the files. If we really need to attach the metadata to the files themselves, we could add a special metadata node that adds metadata to a file (although this would only be used when linking to files directly which probably won't be that common).

ipld/legacy-unixfs-v2#1 (comment)

I doubt the overhead of string names will be all that bad, even for tiny files. First, we can omit key/value pairs with default values. Second, CIDs themselves are generally ~36 bytes (in CBOR, including tags etc., links cost 41 bytes IIRC) so links will likely dwarf the cost of string keys. There have been grumblings (i.e., I've been grumbling) about adding compression support but that's still in the "it would be nice" stage.

ipld/legacy-unixfs-v2#1 (comment)

The power here is not the ability to generate CARs on the fly (although that's really convenient), it's the ability to map structured linked data into unixfs without losing its structure. For now, most tools will just see it as a byte stream (a CAR). However, we can give it some extra metadata marking it as an IPLD DAG so tools that understand IPLD can operate on it that way.

ipld/legacy-unixfs-v2#15 (comment)

we decided that it was critical to come up with a "format string" (fmtstr) that specifies in exact detail how to reproducibly import a file

We should probably coordinate around it. Figure out what our hard requirements are, and if they are deeply tied to UFSv2. It may be that the v2 design influences the v1 workarounds, or vice versa.

Randomly cc'ing @warpfork
^These words might make some sense out of context. If so, any input?

@warpfork
Collaborator

I don't have a ton of brain cycles for Unixfsv2 at present, but in case it's useful, here are some links to a particular exploratory romp on the subject:

So, if anyone would like to run with that train of thought by using some schema DSLs to make concrete proposals about metadata -- and several, viably cohabitant schemas that we could pattern-match on in the wild -- that'd be interesting, and I'd try to make a point to be around to chat about it.

@meiqimichelle meiqimichelle changed the title [Epic] Towards better file system APIs [EPIC] Towards better file system APIs Aug 15, 2019
@djdv
Author

djdv commented Jan 21, 2020

See: #74 (comment)
