
copy one large file from ipfs using ipfs-fuse api has severe performance issue #4228

Closed
wpfnihao opened this issue Sep 14, 2017 · 5 comments · Fixed by #8388
Labels: topic/fuse, topic/perf

Comments

@wpfnihao

Version information:

go-ipfs version: 0.4.11-dev-c14a995
Repo version: 6
System version: amd64/linux
Golang version: go1.8.3

and

go-ipfs version: 0.4.10-
Repo version: 5
System version: amd64/linux
Golang version: go1.8.3

Type:

Medium

Severity:

High

Description:

I have done the following experiment using both the latest stable version (0.4.10) and the latest master branch:

  1. add one 5GB file on node a
  2. mount ipfs on node b
  3. run cp <ipfs-mount-point>/<multi-hash> <path> on node b

I was expecting this scenario to behave similarly to ipfs get. Unfortunately, IPFS+FUSE ran extremely slowly; I had to abort the task after 10 minutes of waiting. Please see the figures below:

[Figure: performance of the sender (node a)]

[Figure: performance of the receiver (node b)]

@whyrusleeping
Member

Thanks for reporting! This is likely due to the way the FUSE calls are handled: I believe we currently create a new DAG reader per read call, so there is a lot of room for optimization here.

@dcposch

dcposch commented Jan 20, 2018

Just to add to this:

I'm seeing not just slow progress, but zero progress copying a large directory from IPFS via FUSE.

go-ipfs version:

$ ipfs --version
ipfs version 0.4.13

loading a single file over IPFS/FUSE works

ipfs daemon --mount

in another terminal,

$ cp /ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/index.html .
$ head index.html
<html><head>
    <meta charset="UTF-8">
    <title>Main Page</title>
...

loading lots of files doesn't work

Here I'm trying to copy out what should be about 50 GB of Wikipedia data from IPFS

( see https://ipfs.io/blog/24-uncensorable-wikipedia/ )

$ cp -r /ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki .

This hangs forever. I left it running for a day, and the output directory remained essentially empty:

$ find /mnt/disk/wiki-ipfs/
/mnt/disk/wiki-ipfs/
/mnt/disk/wiki-ipfs/wiki
/mnt/disk/wiki-ipfs/index.html

The ipfs daemon prints occasional errors:

[Screenshot of daemon error output]

The ipfs daemon uses one full core continuously and ~1 GB of RAM, but makes no progress.

@ZerxXxes

I also have a lot of trouble with this; all reads from FUSE mounts are extremely slow.
It works if the file is a few kilobytes, but as soon as you try to read a file of 1 MB you start to notice the slowdown, and the larger the file, the slower it reads from FUSE. Files of 100 MB+ take several minutes to cat, while ipfs cat can read the same data in less than a second, so the problem is somewhere within the FUSE layer.
I did an strace while trying to cat a file in my /ipfs mount, and for every read there seems to be a write operation. Is Linux trying to write to IPFS even though it can't?

Looking at how it's mounted shows these options:

/dev/fuse on /ipfs type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /ipns type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

I'm no expert on FUSE, but this does not look right to me.
rw says the filesystem is mounted with write permissions; wouldn't Linux handle this better if we mounted it read-only?
relatime says that Linux will try to update file access times; we really should mount ipfs with noatime.

I did some additional digging on FUSE flags in the manpage and found a few that might be interesting to test:

kernel_cache should be nice for IPFS; we should be able to cache files in the kernel for a long time, as files can never change.
direct_io might be useful, as Linux won't know the file size before reading the data.
max_readahead might also need tweaking, but I'm unsure what's suitable here.
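For reference, these flags combine into a standard libfuse option string. This is only a sketch of the suggested combination: go-ipfs mounts through its own Go FUSE bindings via `ipfs daemon --mount`, so it may not accept these options directly, and `some-fuse-fs` below is a hypothetical libfuse-based filesystem binary.

```shell
# Hypothetical libfuse invocation with the flags discussed above:
#   ro            - mount read-only; /ipfs content is immutable
#   noatime       - never update access times
#   kernel_cache  - keep file data in the kernel page cache
#   max_readahead - readahead window in bytes
some-fuse-fs /ipfs -o ro,noatime,kernel_cache,max_readahead=131072
```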

@Stebalien
Member

We are in the process of completely rewriting our fuse integration. See #6036.

@ZerxXxes

Thank you for the info! Great to see work is being done to improve this :)
