Skip to content

Commit

Permalink
Enable Linux read-ahead for a single page on ZVOLs
Browse files Browse the repository at this point in the history
Linux has read-ahead logic designed to accelerate sequential workloads.
ZFS has its own read-ahead logic called zprefetch that operates on both
ZVOLs and datasets. Having two prefetchers active at the same time can
cause overprefetching, which unnecessarily reduces IOPS performance on
CoW filesystems like ZFS.

Testing shows that entirely disabling the Linux prefetch results in
a significant performance penalty for reads while commensurate benefits
are seen in random writes. It appears that read-ahead benefits are
inversely proportional to random write benefits, and so a single page
of Linux-layer read-ahead appears to offer the middle ground for both
workloads.

Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #5902
  • Loading branch information
ryao authored and behlendorf committed May 4, 2017
1 parent 5731140 commit bc17f10
Show file tree
Hide file tree
Showing 4 changed files with 35 additions and 0 deletions.
20 changes: 20 additions & 0 deletions config/kernel-blk-queue-bdi.m4
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
dnl #
dnl # 2.6.32 - 4.11, statically allocated bdi in request_queue
dnl # 4.12 - x.y, dynamically allocated bdi in request_queue
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BLK_QUEUE_BDI], [
AC_MSG_CHECKING([whether blk_queue bdi is dynamic])
ZFS_LINUX_TRY_COMPILE([
#include <linux/blkdev.h>
],[
struct request_queue q;
struct backing_dev_info bdi;
q.backing_dev_info = &bdi;
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_QUEUE_BDI_DYNAMIC, 1,
[blk queue backing_dev_info is dynamic])
],[
AC_MSG_RESULT(no)
])
])
1 change: 1 addition & 0 deletions config/kernel.m4
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_BIO_END_IO_T_ARGS
ZFS_AC_KERNEL_BIO_RW_BARRIER
ZFS_AC_KERNEL_BIO_RW_DISCARD
ZFS_AC_KERNEL_BLK_QUEUE_BDI
ZFS_AC_KERNEL_BLK_QUEUE_FLUSH
ZFS_AC_KERNEL_BLK_QUEUE_MAX_HW_SECTORS
ZFS_AC_KERNEL_BLK_QUEUE_MAX_SEGMENTS
Expand Down
11 changes: 11 additions & 0 deletions include/linux/blkdev_compat.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@

#include <linux/blkdev.h>
#include <linux/elevator.h>
#include <linux/backing-dev.h>

#ifndef HAVE_FMODE_T
typedef unsigned __bitwise__ fmode_t;
Expand Down Expand Up @@ -128,6 +129,16 @@ __blk_queue_max_segments(struct request_queue *q, unsigned short max_segments)
}
#endif

static inline void
blk_queue_set_read_ahead(struct request_queue *q, unsigned long ra_pages)
{
#ifdef HAVE_BLK_QUEUE_BDI_DYNAMIC
q->backing_dev_info->ra_pages = ra_pages;
#else
q->backing_dev_info.ra_pages = ra_pages;
#endif
}

#ifndef HAVE_GET_DISK_RO
static inline int
get_disk_ro(struct gendisk *disk)
Expand Down
3 changes: 3 additions & 0 deletions module/zfs/zvol.c
Original file line number Diff line number Diff line change
Expand Up @@ -1468,6 +1468,9 @@ zvol_alloc(dev_t dev, const char *name)
blk_queue_make_request(zv->zv_queue, zvol_request);
blk_queue_set_write_cache(zv->zv_queue, B_TRUE, B_TRUE);

/* Limit read-ahead to a single page to prevent over-prefetching. */
blk_queue_set_read_ahead(zv->zv_queue, 1);

/* Disable write merging in favor of the ZIO pipeline. */
queue_flag_set(QUEUE_FLAG_NOMERGES, zv->zv_queue);

Expand Down

0 comments on commit bc17f10

Please sign in to comment.