
async file I/O #20

Closed
njsmith opened this issue Jan 22, 2017 · 7 comments

Comments

@njsmith (Member) commented Jan 22, 2017

Maybe a wrapper around pathlib.Path that wraps everything in run_in_worker_thread, and where open returns a file-like object whose methods have again been wrapped in run_in_worker_thread? (#10 and #6 are relevant.)

On Windows it'd be nice to go the extra step and use the IOCP methods for actual file I/O.
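A minimal sketch of this thread-delegation idea. The `AsyncPath` name is hypothetical, and `asyncio.to_thread` is used purely as a stand-in for trio's `run_in_worker_thread` so the sketch runs without trio:

```python
import asyncio
import pathlib


class AsyncPath:
    """Wrap pathlib.Path: run callables in a worker thread, pass
    plain attributes through unchanged.

    asyncio.to_thread stands in for trio's run_in_worker_thread here.
    """

    def __init__(self, *args):
        self._wrapped = pathlib.Path(*args)

    def __getattr__(self, name):
        attr = getattr(self._wrapped, name)
        if callable(attr):
            async def method(*args, **kwargs):
                # Push the potentially blocking call off to a thread.
                return await asyncio.to_thread(attr, *args, **kwargs)
            return method
        # Plain attributes (name, suffix, parts, ...) are cheap: return directly.
        return attr
```

Methods like `exists()` or `stat()` then become awaitable, while cheap attributes like `.suffix` stay synchronous.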

@buhman (Member) commented May 22, 2017

Because I've used and am very familiar with aiofiles, I thought I'd work on this. Instead of duplicating aiofiles though, I thought I'd try to be fancy and dynamically recreate the entire IOBase class hierarchy. I came up with this monstrosity, but it doesn't work because:

  • I include wrappers for non-heap classes, and the __class__ hackery on line 117 won't work for those.
  • some functions, like close(), will call flush() internally, which causes problems if flush() is a coroutine function.

I think I learned enough from this experiment that I can come up with something much better and actually-working.

I also agree with aiofiles that async wrapper methods should probably be generated at class creation time. Attributes, however, should be handled in __getattr__ instead of making property wrappers for each.
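A rough illustration of that split: async methods generated once when the wrapper class is built, everything else falling through `__getattr__`. The names here are made up for the sketch, and `asyncio.to_thread` stands in for trio's `run_in_worker_thread`:

```python
import asyncio
import io


def _make_async(name):
    # One async method per sync method, created once at class build time.
    async def method(self, *args, **kwargs):
        sync_meth = getattr(self._wrapped, name)
        # asyncio.to_thread stands in for trio's run_in_worker_thread.
        return await asyncio.to_thread(sync_meth, *args, **kwargs)
    method.__name__ = name
    return method


def async_wrapper_class(cls, method_names):
    """Build an AsyncFoo wrapper for cls: async methods are generated
    up front; attributes (like .closed) are forwarded via __getattr__."""
    ns = {name: _make_async(name) for name in method_names}
    ns["__init__"] = lambda self, wrapped: setattr(self, "_wrapped", wrapped)
    ns["__getattr__"] = lambda self, name: getattr(self._wrapped, name)
    return type("Async" + cls.__name__, (), ns)
```

This avoids building a per-attribute property wrapper: only the blocking methods get the class-creation-time treatment.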

Other ideas:

  • after playing with that "build superclasses in order, then dictionary lookup" nonsense, I think singledispatch is really the way to go: Where there is no registered implementation for a specific type, its method resolution order is used to find a more generic implementation. 👍
  • I think it makes sense to expose the singledispatch wrapper factory as part of the public API, so that things like an AsyncStringIO can easily be created, along with other non-open uses.

I welcome feedback on any of this.
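For concreteness, here's roughly how that singledispatch factory could look; the `AsyncIOWrapper`/`AsyncTextIOWrapper` names are invented for this sketch. Registering implementations against the `io` ABCs gets the MRO fallback for free: `io.StringIO` dispatches to the `TextIOBase` implementation, while `io.BytesIO` (which is only a `BufferedIOBase`/`IOBase`) falls back to the generic one:

```python
import functools
import io


class AsyncIOWrapper:
    """Generic fallback wrapper (async methods elided in this sketch)."""
    def __init__(self, wrapped):
        self.wrapped = wrapped


class AsyncTextIOWrapper(AsyncIOWrapper):
    """More specific wrapper for text-mode files."""


@functools.singledispatch
def wrap_file(file):
    raise TypeError(f"no async wrapper registered for {type(file)!r}")


@wrap_file.register(io.IOBase)
def _(file):
    # Generic fallback, reached via MRO when nothing more specific matches.
    return AsyncIOWrapper(file)


@wrap_file.register(io.TextIOBase)
def _(file):
    # More specific registration wins for text files.
    return AsyncTextIOWrapper(file)
```

singledispatch supports abstract base classes, so virtual subclasses registered with the `io` ABCs (like `StringIO`) dispatch correctly too.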

@buhman (Member) commented May 22, 2017

Should this be a new io package within trio, or completely separated code inside trio_io? (maybe the latter would convince me to help with #158).

@njsmith (Member, Author) commented May 27, 2017

Hey, sorry for the slow response! I'm still getting caught up again after PyCon.

In general, my preference is to avoid magic introspection, dispatch, subclassing, and similar things... I like delegation and explicit lists :-). In theory, it's extra typing and they can get out-of-sync with Python, but in practice I think it's worth it because even if it does go wrong sometimes, it's much easier to fix than complicated magic. Example: in trio's socket wrapper, here are the attributes that get delegated directly and here's one that gets wrapped.

I guess in this case the problem is that the io module is a messy tangle of subclasses? But even so, it seems like it should be pretty straightforward to keep a list of which classes we want to wrap (basically Path, TextIO, BufferedReader, BufferedWriter, BufferedRandom?), and which methods are sync and which need to be wrapped in run_in_worker_thread. What do you think?
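To make the explicit-list idea concrete, here's a sketch; the `AsyncFile` name and the particular method sets are illustrative rather than trio's actual ones, and `asyncio.to_thread` stands in for `run_in_worker_thread`:

```python
import asyncio
import io

# Explicit, hand-maintained lists, in the spirit of trio's socket wrapper.
_FORWARD = {"closed", "name", "mode"}                 # cheap: delegate directly
_WRAP = {"read", "write", "flush", "seek", "close"}   # blocking: run in a thread


class AsyncFile:
    def __init__(self, wrapped, run_sync=asyncio.to_thread):
        # run_sync stands in for trio's run_in_worker_thread.
        self._wrapped = wrapped
        self._run_sync = run_sync

    def __getattr__(self, name):
        if name in _FORWARD:
            return getattr(self._wrapped, name)
        if name in _WRAP:
            meth = getattr(self._wrapped, name)
            async def wrapper(*args, **kwargs):
                return await self._run_sync(meth, *args, **kwargs)
            return wrapper
        raise AttributeError(name)
```

The lists can drift out of sync with Python, but when they do, the fix is a one-line addition rather than debugging dispatch magic.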

I think this is probably a fundamental and only-one-way-to-do-it enough feature that it makes sense to put into trio itself. As public API, maybe all we need is trio.Path, trio.open_file, and trio.wrap_file?

@njsmith (Member, Author) commented Jun 5, 2017

Note for posterity:

One of the basic design questions here is whether we want to support native async I/O primitives when they're available, or whether we want to commit to using threads for everything all the time. This makes a big difference, because using native async I/O primitives requires pretty much throwing out the stdlib io module entirely, and then if we have to write something from scratch there's a question of whether to try and implement the same interfaces or else do something different.

As for the options: Windows has some async file I/O support (basically just pread/pwrite, nothing like stat or open), and Linux has some extremely limited async file I/O options that are almost never useful in practice (in particular, they require disabling the buffer cache). Who knows, maybe someday it will get better, but I'm not holding my breath.

Microsoft says that "async" file I/O still sometimes blocks (a 2014 comment reporting the same thing). I've heard that the same is true for Linux's native AIO (I think what happens is the actual block writing can be async, but stuff like walking the extent tree to figure out where the blocks will go is still synchronous – basically Linux's AIO is really only intended for use on raw block devices by the kind of mini-operating system that masquerades as an RDBMS).

In addition, various reports suggest that doing synchronous I/O from a thread pool is actually faster than doing native async I/O. There's nothing magic about kernel async I/O implementations; a kernel thread and a kernel state machine are essentially the same thing, plus the synchronous paths get way more attention from maintainers. (You may have heard that internally, the Windows kernel is all async, so async operations will always be just as fast as synchronous ones. It turns out that this isn't true – they have a special fast path for synchronous I/O on ordinary files!)

User-space threads are a bit more expensive than kernel threads, but with an efficient thread pool it doesn't make much difference. Probably the main benefit that native kernel support could potentially provide is that the kernel can detect when the data being requested is already in RAM and return it immediately without any thread overhead at all. But in practice there aren't any actually-usable APIs that reliably work like this.

Presumably for these reasons, libuv uses a thread pool for disk I/O in all configurations, and this seems to be universally agreed to be the right solution when writing C programs.

Does trio being written in Python make a difference? Well, our thread synchronization overhead is much (much) worse than a well-tuned C thread pool. Right now it's especially silly because we actually spawn a new thread for every operation instead of caching them like a thread pool does, but then, the reason we do that is that finding a thread in the cache and scheduling a job to it is expensive enough from Python that spawning a new thread each time is nearly as good. In any case, we can certainly improve run_in_worker_thread if we have to, but between the GIL and general Python overhead (including e.g. trio's rescheduling logic) it's never going to do a single syscall dispatch as fast as a tight C implementation. Also, things like buffered read/writes are especially silly here, because we may end up going into a new thread just to read some bytes out of a userspace buffer that's right in front of us. Doh.

OTOH, the stdlib io module is all written in C (for CPython); if we reimplemented the stack ourselves to reduce the thread overhead, then our reimplementation would itself add some overhead.

In conclusion, I think going all-in on run_in_worker_thread is a reasonable idea. It's not at all clear that anything better is possible, and while it's not trivial (see all the discussion in #180), it's much simpler than defining a whole new file I/O stack from scratch. And worst case, we can eventually implement our own duck-compatible objects if we have to and the benefit is there (i.e., have wrap_file magically detect and return some special optimized object if it's given an object representing a vanilla on-disk file).

Interesting reference: http://blog.libtorrent.org/2012/10/asynchronous-disk-io/

@buhman (Member) commented Jun 5, 2017

> whether we want to support native async I/O primitives when they're available

I'd actually love tinkering with this, once we get the catch-all thread implementation done.

> stdlib io module is all written in C
> then our reimplementation would itself add some overhead

Are you against adding C (or any other non-pure-Python) code to trio?

> 02:41:54 arigato but do it with cffi

> because we may end up going into a new thread just to read some bytes out of a userspace buffer that's right in front of us

I haven't read too deeply into the implementation, but isn't the actual buffer hidden from normal Python code? How would this work?

> libuv uses a thread pool for disk I/O in all configurations

Would it be especially crazy to integrate with libuv, if they already do this well?

> we can certainly improve run_in_worker_thread if we have to

> 02:45:41 arigato or, alternatively, use a C thread pool, and keep multithreading out of Python completely

This sounds like it could be fun.

@njsmith (Member, Author) commented Jun 8, 2017

Here's another update on async I/O in Linux: https://lwn.net/Articles/724198/

Basically it confirms that the existing native async I/O routines are currently useless for our purposes.

@buhman (Member) commented Jun 13, 2017

Done in #180.
