Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use the windows CopyFile API for copying files #2960

Closed
wants to merge 3 commits into from

Conversation

vbfox
Copy link
Contributor

@vbfox vbfox commented Mar 20, 2017

Summary

I create this PR more to open a discussion on what to do than to suggest this exact code. ffi is an annoying dependency but on win32 using the OS CopyFile provide a nice speedup in file copy operations versus doing it by hand.

On the same hardware as the measurements done in #990 (comment) running yarn install 10 times the results are :

Current code CopyFileW
Min 30s 19s
Avg 31s 20s
Max 33s 20s

Removed one third of the execution time (In this very specific sample).

Test plan

WIP ;-)

@Daniel15
Copy link
Member

This is an interesting idea! Does node.js not have a built-in way to invoke CopyFileW though?

@arcanis
Copy link
Member

arcanis commented Apr 10, 2017

I don't think so - there's no fs.copy in node :(

@vbfox
Copy link
Contributor Author

vbfox commented Apr 11, 2017

I searched in node source code but they don't reference CopyFile & I didn't find any high level API corresponding to it.

I'll cleanup this branch, add test & get some clean benchmarks to measure the difference exactly.

PS: My original hope was finding a way to get NTFS to do CopyOnWrite, but that's only possible with ReFS variant on 2016 server and as it's a block operation it has some constraints ( https://msdn.microsoft.com/en-us/library/windows/desktop/mt590820(v=vs.85).aspx )

@vbfox vbfox force-pushed the copyfile_windows_api branch 2 times, most recently from 0bde4ef to 80bb164 Compare April 11, 2017 22:08
@vbfox
Copy link
Contributor Author

vbfox commented Apr 11, 2017

I rebased, cleaned up a little & fixed long paths.


Did a few measurements (& Added to PR text)

On the same hardware and project as the measurements done in #990 (comment) running yarn install 10 times the results are :

Current code CopyFileW
Min 30s 19s
Avg 31s 20s
Max 33s 20s

Removed one third of the execution time (In this very specific sample).


Lot of tests are currently failing on appveyor due to stack overflows because it's not picking up the binary binding on the ref module. Don't know why i'll need to investigate.

@Daniel15
Copy link
Member

For what it's worth, I think it would be pretty cool to release the file copying code as a separate npm package so that other projects could take advantage of it too. 😃

For the test failures, try running the tests on your computer and see whether they pass.

Removed one third of the execution time (In this very specific sample).

Nice!

My original hope was finding a way to get NTFS to do CopyOnWrite,

I'm looking forward to ReFS being available on consumer versions of Windows (hopefully one day!). If you want copy-on-write today, it's available on Linux through zfs and btrfs. I suspect the code would need to either do a syscall to whatever Linux uses for copying files or shell out to /bin/cp to take advantage of copy-on-write on Linux though.

@vbfox
Copy link
Contributor Author

vbfox commented Apr 27, 2017

I rebased the PR & see 2 problems with the tests here :

  1. nan doesn't build on travis build machines because there is no c++ 11 compiler (but it's there on circleci so it works)
  2. Tests like integration.js that call jest.resetModules(); end up requiring the ref package twice but due to the way this package is written it's not possible to do that. I didn't find any way to instruct jest to just use the real require for a module instead of it's internal infrastructure so I sent a PR to the module author to mitigate the problem ( Avoid problems when required twice TooTallNate/ref#76 )

@arcanis
Copy link
Member

arcanis commented Apr 28, 2017

Now that I think about it, wouldn't merging this PR mean that we wouldn't be able to ship Yarn as a single js file anymore? Since ffi is a native module :/

@vbfox
Copy link
Contributor Author

vbfox commented Apr 28, 2017

@arcanis I didn't know that it was distributed as such but I think it can be detected with a fallback to the nodejs code if the native module isn't there. As there is no feature change it would just be a little slower on windows in single-js form

How can I generate a single-js version to test it ?

* Exclude 'ffi' and 'ref-wchar' from webpack builds
* Handle their possible absence in the code
@vbfox
Copy link
Contributor Author

vbfox commented Apr 28, 2017

  • I fixed the single file build by not including the native module & handing the case in code
  • Also fixed the linux build that were failing due to a too old c++ compiler version by upgrading travis to Ubuntu trusty
  • No maintainer seem to have looked at Avoid problems when required twice TooTallNate/ref#76 so I don't know what to do there

@vbfox
Copy link
Contributor Author

vbfox commented Apr 28, 2017

The current gains on my PC :

  • material2 on SDD 29.46s -> 17.93s (39% faster)
  • material2 on an old HDD 70.97s -> 63.72s (10% faster)
  • yarn/master on SSD 7.15s -> 4.65s (35% faster)
  • yarn/master on an old HDD 16.18s -> 12.15s (24% faster)

The basline tests were done from master so they include @sciolist #3234 fix

@ntkoopman
Copy link

@Daniel15 I just tried COW on btrfs and got a pretty large speedup:
material2 went from 74s to 34s (54% faster) on an SSD. Implementation is pretty simple

const ioctl = require('ioctl');
const FICLONE = 0x40049409; // _IOW (0x94, 9, int)
const source_fd = fs.openSync(data.src, 'r'); // sync for readability only
const dest_fs = fs.openSync(data.dest, 'w');
ioctl(dest_fd, FICLONE, source_fd);
fs.closeSync(dest_fd);
fs.closeSync(source_fd);

Problem is this magic constant requires the kernel headers to be determined reliably, so for Linux support splitting this functionality into a separate (native) package would really help.

@sciolist
Copy link
Contributor

sciolist commented Apr 30, 2017

You could also try using libuv's sendfile, which copies files about 70% faster than the WriteStreams in yarn on platforms where sendfile works for file copy (linux and osx, some other unixes).. Otherwise it falls back to a more naive read+write loop that runs about 20% faster than yarns current copying.

Unfortunately sendfile was removed from node's fs a few years ago, because it was a bit broken at the time. on my reasonably large test project sendfile improves initial install times by about 50% on linux/osx, 10-15% on windows.

Sounds like making a general 'copyfile' library that could optimize better for each platform wouldn't be a bad idea.

@bestander
Copy link
Member

So, what is the next step?
I love the change, thanks for doing it.

@Daniel15
Copy link
Member

Sounds like making a general 'copyfile' library that could optimize better for each platform wouldn't be a bad idea.

This is a great idea. @vbfox what do you think about putting the ffi stuff into your own separate package (maybe called "fast-file-copy" or something like that) and then just including that as a dependency in Yarn? I imagine it's very useful outside of Yarn and could be used separately by other projects 😃

@vbfox
Copy link
Contributor Author

vbfox commented May 12, 2017

@Daniel15 Well seeing that there was not much evolution on my PR to ref I started doing a native module 😄

It's working quite well, i'll check if having native direct win32 code has any advantage vs @sciolist implementation and if yes i'll PR a Win32 variant to his repo

@Daniel15
Copy link
Member

Nice! Do you think we should close this PR then, as it likely won't be merged in its current state? It'd be good to have one single "fast copy ALL THE THINGS" module that we can use.

@vbfox
Copy link
Contributor Author

vbfox commented May 12, 2017

Yup closing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants