This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

IPFS API via Unix sockets #129

Open
2 of 6 tasks
Kubuxu opened this issue May 17, 2016 · 17 comments

Comments

@Kubuxu
Member

Kubuxu commented May 17, 2016

Unix sockets have many benefits over TCP sockets: lower CPU time and latency overhead, and file-system-based locations, which allow for system-based permissions and access control.

Also, Unix sockets would be able to use a different encoding scheme from TCP (HTTP), one suitable for different types of applications (not browsers). This encoding should focus on being fast to encode and decode and simple to implement; "size doesn't matter" as it is local communication.

Things that need specifying:

  • location of socket:
    • for user daemon
    • for system daemon: `/var/run/ipfs/api.socket`
  • base encoding - bencode
  • protocol
    • multipart encoding - needed for streaming files in and out of the API

My proposal for the encoding is bencode. It is very simple to implement (under 300 C LOC) and quite fast to encode. At first glance it isn't human readable, but as it isn't a binary encoding, someone who knows the rules is able to read it. It isn't space efficient, but that isn't much of a problem here.

With Unix sockets in STREAM mode it should be quite easy to write a protocol that is both simple and efficient. I would go with something similar to cjdns's admin API RPC model. Calls and responses are maps.
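To give a feel for how small the encoding is, here is a minimal sketch of a bencode encoder and decoder in Python. It is an illustration of the format only, not code from go-ipfs or cjdns; the function names `bencode` and `bdecode` are made up for this example, and the sample `query` value is hypothetical.

```python
def bencode(value):
    """Encode ints, strings/bytes, lists and dicts as bencode bytes."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data, pos=0):
    """Decode one value starting at pos; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            val, pos = bdecode(data, pos)
            result[key] = val
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at offset %d" % pos)


if __name__ == "__main__":
    wire = bencode({"txid": 1, "query": "version"})
    print(wire)           # the text form is readable once you know the rules
    print(bdecode(wire))  # strings come back as bytes
```

Strings decode to `bytes`, so a real client would still have to decide where to apply UTF-8 decoding.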

  • each call or response is a bencode map
  • each call includes a txid field that is a unique integer (usually just an increasing 64-bit integer); this integer is included in the response and is used for asynchronous call <-> response tracking.
  • each call includes a query field that specifies what query this call uses
  • calls can include an args list of strings, which are the consecutive (positional) arguments
  • calls can include an opts map of string->string, which holds options for the command
  • if a call fails, the response includes an error map with two fields: type and msg.
  • the result of a call is the response packet itself
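The call/response rules above can be sketched as plain Python maps plus a small tracker that pairs responses with calls by txid. This is only an illustration of the proposal: the class names `CallTracker` and `CallError`, the query names, and the `/ipfs/QmExample` path are hypothetical, and the maps would be bencoded before going on the wire.

```python
import itertools


class CallError(Exception):
    """Raised when a response carries an error map (type and msg)."""

    def __init__(self, type_, msg):
        super().__init__("%s: %s" % (type_, msg))
        self.type = type_
        self.msg = msg


class CallTracker:
    """Builds call maps and matches responses to them by txid."""

    def __init__(self):
        self._txids = itertools.count(1)  # increasing integer per call
        self._pending = {}                # txid -> call awaiting a response

    def make_call(self, query, args=None, opts=None):
        call = {"txid": next(self._txids), "query": query}
        if args:
            call["args"] = list(args)     # consecutive string arguments
        if opts:
            call["opts"] = dict(opts)     # string -> string options
        self._pending[call["txid"]] = call
        return call

    def match_response(self, response):
        # Responses may arrive in any order; the txid says which call
        # they belong to.
        call = self._pending.pop(response["txid"])
        if "error" in response:
            err = response["error"]
            raise CallError(err["type"], err["msg"])
        return call, response


if __name__ == "__main__":
    tracker = CallTracker()
    first = tracker.make_call("version")
    second = tracker.make_call("cat", args=["/ipfs/QmExample"])
    # The second call happens to be answered first.
    call, resp = tracker.match_response({"txid": second["txid"], "data": "..."})
    print(call["query"], resp)
```

Because the txid travels with every response, a client can keep many calls in flight on one stream socket without relying on ordering.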

Why:
We need a higher-performance API, for example to be able to extract the ipfs FUSE mount into a separate process. That extraction solves many problems, but the HTTP API would add major overhead.

If anyone has some ideas or comments, please voice them. If you like the idea in its current state, please show it by voting 👍.

I would also love to hear opinions from the JS part of the project.

@Kubuxu Kubuxu changed the title IPFS API via Unix socketsz IPFS API via Unix sockets May 17, 2016
@Kubuxu
Member Author

Kubuxu commented May 20, 2016

This will be more complex, as we need multipart-type channels, but it can be solved. I will work some more on the spec.

@hackergrrl

Heya! This is a neat idea, and I'm interested in hearing more: adding another API is a lot of work, so it'd be great to really flesh out the "Why" section to help us understand your thinking as clearly as possible. Here are some Qs from your current write-up:

Why:
We need higher performance API for example to be able to extract ipfs FUSE mount into separate process. It solves many problems but the HTTP api would be major overhead.

  1. What does higher performance mean, quantified?
  2. Why do unix sockets make extracting IPFS FUSE into a separate process easier? Why is this desirable?
  3. What are the "many problems" that it solves?
  4. Can you quantify how much overhead "major overhead" is?

@Kubuxu
Member Author

Kubuxu commented May 20, 2016

  1. I can't find any quantitative data, and simple benchmarks are meaningless, as a big part of the cost of TCP lies at the start of the connection. For established connections I get about 900MiB/s vs 750MiB/s on my machine, but that might depend on many factors. In any case, here is an explanation from the FreeBSD list: http://lists.freebsd.org/pipermail/freebsd-performance/2005-February/001143.html
    Also, TCP uses ramp-up throttling (slow start), which means that it starts slow and then increases speed as it sees that packets are not being dropped.
    What should also be considered is that a new protocol would allow asynchronous calls and wouldn't depend on keep-alive for reduced command latency.
  2. We are currently afraid that with HTTP the performance of FUSE will be even lower than it is now. I don't know how well (if at all) Go is able to perform, for example, HTTP API calls with keep-alive.
  3. FUSE is currently part of go-ipfs, which means that it: doesn't work on all systems and so requires conditional builds, is unstable (as are most things using FUSE) and can crash the whole daemon, and can create zombie processes that lock up resources.
  4. Major overhead is the few milliseconds TCP and HTTP need for handshaking, negotiation and so on. In the case of a 2MiB file (already cached in RAM) and a full local transfer speed of 500MiB/s, those few milliseconds (let's say 3) would reduce the transfer speed to about 280MiB/s, and that is not including TCP ramp-up.
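The estimate in point 4 can be checked with a short calculation. The sketch below assumes a simple model in which the fixed per-request overhead is added to the raw transfer time; the function name `effective_throughput` is made up for this example. With 500MiB/s and 3 ms of overhead, a 2 MiB file lands near the quoted 280MiB/s, while for a 2 KiB file the overhead dominates almost entirely.

```python
MIB = 1024 * 1024
KIB = 1024


def effective_throughput(size_bytes, raw_bytes_per_s, overhead_s):
    """Bytes per second once a fixed per-request overhead is included."""
    transfer_s = size_bytes / raw_bytes_per_s
    return size_bytes / (transfer_s + overhead_s)


if __name__ == "__main__":
    raw = 500 * MIB   # full local transfer speed from the comment
    overhead = 0.003  # "let's say 3" milliseconds
    for label, size in (("2 MiB", 2 * MIB), ("2 KiB", 2 * KIB)):
        rate = effective_throughput(size, raw, overhead) / MIB
        print("%s file: %.2f MiB/s" % (label, rate))
```

The smaller the object, the more the fixed cost matters, which is the case FUSE hits often when reading many small blocks.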

Another neat feature of a UNIX-socket-based API would be the lack of port binding conflicts among different users, and also the possibility of file-system-level access control.

I understand that an API redesign isn't a small task, but there is a need for it.
The current API was written with the CLI in mind and specced bottom-up (API first, specs later). It fits neither the Remote Procedure Call model nor the resource-based (RESTful) model, yet those two models are the most commonly used and the most easily understood.

This created an API that works but isn't great to use from perspectives other than CLI applications. If we were to do a full API redesign, I would go with two levels of the API:

  • low level - UNIX Socket (pipe in case of Windows) API - useful for low level languages that prefer to use pipes and sockets over HTTP or require top notch performance
  • higher level - RESTful HTTP API for ease of prototyping and use in browsers and other high level applications

The HTTP API could even be built entirely on the lower-level API, which corresponds to something I talked about with @lgierth - extracting and restructuring the HTTP Gateway.

The important part would be that we can now apply a top-down approach: first design the APIs with use cases in mind, and then implement them as we see best. The outcome would be a much better structured, more uniform and easier to implement (as in other language bindings) interface for accessing the IPFS world.

sorry for the wall of text

@hackergrrl

Thanks for the additional info. Is there a specific motivating problem that this aims to solve (like, an issue or someone with a case where unix sockets are a very explicitly clear win)?

@Kubuxu
Member Author

Kubuxu commented May 20, 2016

Yes, the current API is aimed at high-level applications (not really, but it doesn't make any difference in this case), which makes interfacing lower-level applications with IPFS really hard.

To communicate with the IPFS daemon, a C application would have to use libcurl (or similar), which is already quite complex, and it also has to parse JSON or XML, which requires a separate library of its own.

Also, due to its CLI-based API, IPFS doesn't provide the constructs that are known in the low-level world. There is no socket you can just read data off, and no simple way to seek in a binary file stored in IPFS. Of course a c-ipfs-api on top of the API in its current state could happen, but every library is quite a big responsibility in the world of C, and those bindings wouldn't suit that world.

Unix sockets are a clear win in the case of multi-user systems, but this isn't just about the transport; it is mostly about the encoding, the protocol and possibly the API itself.

@hackergrrl

Points all taken and understood. I'm still not sure we're on the same page, so let me try to rephrase: "is there a specific person or project or effort that is blocked or hindered by the lack of this?"

If so, maybe it makes more sense to start the discussion from a place of "how do we solve problem X" rather than "how do we implement Y"? (Maybe this discussion/context already happened on IRC or elsewhere on GH and I missed it?)

@Kubuxu
Member Author

Kubuxu commented May 20, 2016

I would really like to extract FUSE out of the core go-ipfs (and maybe start a trend).
This task requires a very specific API to keep performance up (and possibly increase it). Issues: ipfs/kubo#2712 ipfs/kubo#2166 and more. There is no separate issue for extracting FUSE, as I think most of the talk about it happened over IRC.

@hackergrrl

Awesome! Yes: getting FUSE out of core sounds really nice. :)

What do you think about getting something working first (a proof of concept) using e.g. the existing HTTP API? Or heck, maybe even HTTP over unix sockets? (you can ignore the cost of TCP connection management in this case, but still reuse all of the API that exists today) You've made it clear that unix sockets would be faster, but the easiest win here sounds like just the separation step.

@whyrusleeping
Member

relevant go-ipfs issue: ipfs/kubo#2148

@kevina

kevina commented May 20, 2016

I agree with @noffle in that we should first try getting something working using the HTTP API. With proper caching I don't think the performance will suck as badly as some fear. Once we have something basic working we can consider optimizing it with a better API. This will also allow us to perform benchmarking and really see how much of an impact the API has.

I have some experience writing a fuse filesystem in C++ and should be able to figure out how to write one in Go. This is something I might be willing to take on if no one else does.

@whyrusleeping
Member

the fuse code is already written (and works, for the most part under normal circumstances) in the go-ipfs codebase, the only thing we would have to do is move the way it accesses data from being directly connected to a core.IpfsNode towards using the http api. This would be really awesome to have.

@whyrusleeping
Member

whyrusleeping commented May 20, 2016

Actually, a really awesome way to do this easily would be to tweak the mfs code to use either the http api or the core node. Improving this interface: https://github.com/whyrusleeping/fallback-ipfs-shell/blob/master/shell.go (and surrounding codebase, that repo is really sad) would be the right way to go.

The advantage of making that change in mfs is that we don't have to make many changes to the fuse code (it primarily uses mfs) to get things working, and any improvements to mfs affect the rest of the system too (ipfs add uses mfs under the hood)

@kevincox

Why do we want a separate protocol for the socket API? I think that it will just make it much harder to implement clients and will mean that most tools won't support it. I think it would be better to use the HTTP protocol over a unix socket. IIUC this will give basically identical performance.

I think a HTTP/2 (unencrypted) API bound to a unix socket would be very performant. This also means that we don't have to duplicate efforts. For example changing the response encoding from JSON to something more efficient could be used by both UNIX and TCP+HTTP clients.

That being said the reason I would like to see this is for the access control. For example I have a multi-tenant system and don't want to expose IPFS to everyone. If it could be bound to a unix socket I can adjust the socket permissions so that only certain users can connect (for example a reverse proxy which can do arbitrarily complex authentication).

Furthermore, in a multi-tenant system you risk another process binding the port, since unless I am running IPFS as root I need to pick an unprivileged port. This means that I can't trust that I am actually talking to the IPFS API.

Another related benefit is avoiding port collisions as previously mentioned.
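As a rough illustration of the idea, the sketch below speaks plain HTTP over a Unix domain socket and restricts access with file permissions, using only the Python standard library. The server is a hypothetical stand-in, not the IPFS daemon: `VersionHandler`, `UnixHTTPConnection`, the `/version` path, the response body and the socket file name are all assumptions made for this example.

```python
import http.client
import os
import socket
import socketserver
import stat
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class VersionHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a daemon's HTTP API."""

    def do_GET(self):
        body = b'{"Version": "demo"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # The default logger expects a (host, port) peer address, which a
        # Unix socket peer does not have, so logging is silenced here.
        pass


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client connection that dials a Unix socket path instead of TCP."""

    def __init__(self, path):
        super().__init__("localhost")  # only used for the Host header
        self.unix_path = path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.unix_path)
        self.sock = sock


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "api.socket")
        server = socketserver.UnixStreamServer(path, VersionHandler)
        # Owner and group only. A real service would also want the parent
        # directory locked down, since this chmod happens after bind.
        os.chmod(path, 0o660)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            conn = UnixHTTPConnection(path)
            conn.request("GET", "/version")
            resp = conn.getresponse()
            status, body = resp.status, resp.read()
            conn.close()
            mode = stat.S_IMODE(os.stat(path).st_mode)
        finally:
            server.shutdown()
            server.server_close()
        return status, body, mode


if __name__ == "__main__":
    print(demo())
```

The client side is ordinary HTTP code apart from the `connect` override, which is the point being made here: existing HTTP tooling can be reused while the socket file's permissions decide who may connect.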

@hsanjuan
Contributor

I think that it will just make it much harder to implement clients and will mean that most tools won't support it. I think it would be better to use the HTTP protocol over a unix socket

AFAIK you can already configure the normal HTTP API with a unix-socket listener and things work as you expect.

@kevincox

kevincox commented Mar 19, 2020 via email

@hsanjuan
Contributor

It's documented here: https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#addressesapi

@Stebalien
Member

*in go-ipfs master.

Note 1: this issue is probably poorly titled. The main goal is to have a more efficient RPC protocol.
Note 2: We can do things with unix sockets that we can't do with, e.g., HTTP2. For example, we can share memory.
