Skip to content

Commit

Permalink
fs: introduce write_begin, write_end, and perform_write aops
Browse files Browse the repository at this point in the history
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).

[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  • Loading branch information
Nick Piggin authored and Linus Torvalds committed Oct 16, 2007
1 parent 637aff4 commit afddba4
Show file tree
Hide file tree
Showing 11 changed files with 575 additions and 206 deletions.
9 changes: 6 additions & 3 deletions Documentation/filesystems/Locking
Original file line number Diff line number Diff line change
Expand Up @@ -178,15 +178,18 @@ prototypes:
locking rules:
All except set_page_dirty may block

BKL PageLocked(page)
BKL PageLocked(page) i_sem
writepage: no yes, unlocks (see below)
readpage: no yes, unlocks
sync_page: no maybe
writepages: no
set_page_dirty no no
readpages: no
prepare_write: no yes
commit_write: no yes
prepare_write: no yes yes
commit_write: no yes yes
write_begin: no locks the page yes
write_end: no yes, unlocks yes
perform_write: no n/a yes
bmap: yes
invalidatepage: no yes
releasepage: no yes
Expand Down
45 changes: 45 additions & 0 deletions Documentation/filesystems/vfs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -537,6 +537,12 @@ struct address_space_operations {
struct list_head *pages, unsigned nr_pages);
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
int (*write_begin)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata);
int (*write_end)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
int (*invalidatepage) (struct page *, unsigned long);
int (*releasepage) (struct page *, int);
Expand Down Expand Up @@ -633,6 +639,45 @@ struct address_space_operations {
operations. It should avoid returning an error if possible -
errors should have been handled by prepare_write.

write_begin: This is intended as a replacement for prepare_write. The
key differences being that:
- it returns a locked page (in *pagep) rather than being
given a pre locked page;
- it must be able to cope with short writes (where the
length passed to write_begin is greater than the number
of bytes copied into the page).

Called by the generic buffered write code to ask the filesystem to
prepare to write len bytes at the given offset in the file. The
address_space should check that the write will be able to complete,
by allocating space if necessary and doing any other internal
housekeeping. If the write will update parts of any basic-blocks on
storage, then those blocks should be pre-read (if they haven't been
read already) so that the updated blocks can be written out properly.

The filesystem must return the locked pagecache page for the specified
offset, in *pagep, for the caller to write into.

flags is a field for AOP_FLAG_xxx flags, described in
include/linux/fs.h.

A void * may be returned in fsdata, which then gets passed into
write_end.

Returns 0 on success; < 0 on failure (which is the error code), in
which case write_end is not called.

write_end: After a successful write_begin, and data copy, write_end must
be called. len is the original len passed to write_begin, and copied
is the amount that was able to be copied (copied == len is always true
if write_begin was called with the AOP_FLAG_UNINTERRUPTIBLE flag).

The filesystem must take care of unlocking the page and releasing it
refcount, and updating i_size.

Returns < 0 on failure, otherwise the number of bytes (<= 'copied')
that were able to be copied into pagecache.

bmap: called by the VFS to map a logical block offset within object to
physical block number. This method is used by the FIBMAP
ioctl and for working with swap-files. To be able to swap to
Expand Down
75 changes: 29 additions & 46 deletions drivers/block/loop.c
Original file line number Diff line number Diff line change
Expand Up @@ -204,14 +204,13 @@ lo_do_transfer(struct loop_device *lo, int cmd,
* do_lo_send_aops - helper for writing data to a loop device
*
* This is the fast version for backing filesystems which implement the address
* space operations prepare_write and commit_write.
* space operations write_begin and write_end.
*/
static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
int bsize, loff_t pos, struct page *page)
int bsize, loff_t pos, struct page *unused)
{
struct file *file = lo->lo_backing_file; /* kudos to NFsckingS */
struct address_space *mapping = file->f_mapping;
const struct address_space_operations *aops = mapping->a_ops;
pgoff_t index;
unsigned offset, bv_offs;
int len, ret;
Expand All @@ -223,63 +222,47 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
len = bvec->bv_len;
while (len > 0) {
sector_t IV;
unsigned size;
unsigned size, copied;
int transfer_result;
struct page *page;
void *fsdata;

IV = ((sector_t)index << (PAGE_CACHE_SHIFT - 9))+(offset >> 9);
size = PAGE_CACHE_SIZE - offset;
if (size > len)
size = len;
page = grab_cache_page(mapping, index);
if (unlikely(!page))

ret = pagecache_write_begin(file, mapping, pos, size, 0,
&page, &fsdata);
if (ret)
goto fail;
ret = aops->prepare_write(file, page, offset,
offset + size);
if (unlikely(ret)) {
if (ret == AOP_TRUNCATED_PAGE) {
page_cache_release(page);
continue;
}
goto unlock;
}

transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec->bv_page, bv_offs, size, IV);
if (unlikely(transfer_result)) {
/*
* The transfer failed, but we still write the data to
* keep prepare/commit calls balanced.
*/
printk(KERN_ERR "loop: transfer error block %llu\n",
(unsigned long long)index);
zero_user_page(page, offset, size, KM_USER0);
}
flush_dcache_page(page);
ret = aops->commit_write(file, page, offset,
offset + size);
if (unlikely(ret)) {
if (ret == AOP_TRUNCATED_PAGE) {
page_cache_release(page);
continue;
}
goto unlock;
}
copied = size;
if (unlikely(transfer_result))
goto unlock;
bv_offs += size;
len -= size;
copied = 0;

ret = pagecache_write_end(file, mapping, pos, size, copied,
page, fsdata);
if (ret < 0)
goto fail;
if (ret < copied)
copied = ret;

if (unlikely(transfer_result))
goto fail;

bv_offs += copied;
len -= copied;
offset = 0;
index++;
pos += size;
unlock_page(page);
page_cache_release(page);
pos += copied;
}
ret = 0;
out:
mutex_unlock(&mapping->host->i_mutex);
return ret;
unlock:
unlock_page(page);
page_cache_release(page);
fail:
ret = -1;
goto out;
Expand Down Expand Up @@ -313,7 +296,7 @@ static int __do_lo_send_write(struct file *file,
* do_lo_send_direct_write - helper for writing data to a loop device
*
* This is the fast, non-transforming version for backing filesystems which do
* not implement the address space operations prepare_write and commit_write.
* not implement the address space operations write_begin and write_end.
* It uses the write file operation which should be present on all writeable
* filesystems.
*/
Expand All @@ -332,7 +315,7 @@ static int do_lo_send_direct_write(struct loop_device *lo,
* do_lo_send_write - helper for writing data to a loop device
*
* This is the slow, transforming version for filesystems which do not
* implement the address space operations prepare_write and commit_write. It
* implement the address space operations write_begin and write_end. It
* uses the write file operation which should be present on all writeable
* filesystems.
*
Expand Down Expand Up @@ -780,7 +763,7 @@ static int loop_set_fd(struct loop_device *lo, struct file *lo_file,
*/
if (!file->f_op->splice_read)
goto out_putf;
if (aops->prepare_write && aops->commit_write)
if (aops->prepare_write || aops->write_begin)
lo_flags |= LO_FLAGS_USE_AOPS;
if (!(lo_flags & LO_FLAGS_USE_AOPS) && !file->f_op->write)
lo_flags |= LO_FLAGS_READ_ONLY;
Expand Down
Loading

0 comments on commit afddba4

Please sign in to comment.