FreeBSD r256956: Improve ZFS N-way mirror read performance by using load and locality information.

The existing algorithm selects a preferred leaf vdev based on the offset of the
zio request modulo the number of members in the mirror. It assumes the devices
are of equal performance and that spreading the requests randomly over the
drives will be sufficient to saturate them. In practice this results in the
leaf vdevs being underutilized.
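
As a rough illustration of the offset-based selection described above, a minimal sketch follows. The function name `stripe_preferred_child` and its signature are hypothetical and not taken from the ZFS sources. The `shift` parameter, which groups neighbouring offsets into a region before the modulo is taken, is an assumption added for illustration; pass 0 to reduce the raw offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the preferred mirror member purely from the
 * request offset. Queue depth and device type play no part in the choice.
 */
static unsigned
stripe_preferred_child(uint64_t io_offset, unsigned shift, unsigned children)
{
	assert(children > 0);
	assert(shift < 64);

	/* Region index of the request, reduced modulo the member count. */
	return ((unsigned)((io_offset >> shift) % children));
}
```

Because the result depends only on the offset, a slow member is handed the same share of requests as a fast one, which is the weakness the commit message points out.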

The new algorithm takes the following additional factors into account:
* Load of the vdevs (number of outstanding I/O requests)
* The locality of the last queued I/O vs the new I/O request.

Within the locality calculation, additional knowledge about the underlying vdev
is considered, such as whether the device backing the vdev is a rotating media
device.
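
The load and locality factors can be sketched as a single per-member score. This is an illustration under stated assumptions, not the committed implementation: `struct member` and `member_load` are hypothetical names, the initial values copy the defaults given in the man page portion of this commit, and `last_offset` is assumed to record where the last queued I/O ended, so that an I/O "immediately follows" when its offset equals that value. The non-rotating branch, which has no half-increment tier, is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative tunables; values copy the defaults listed in the man page. */
static uint64_t rotating_inc = 0;
static uint64_t rotating_seek_inc = 5;
static uint64_t rotating_seek_offset = 1048576;
static uint64_t non_rotating_inc = 0;
static uint64_t non_rotating_seek_inc = 1;

/* Hypothetical per-member state used only by this sketch. */
struct member {
	int queue_len;		/* outstanding I/O requests */
	uint64_t last_offset;	/* assumed: end of the last queued I/O */
	int nonrot;		/* non-zero for non-rotating media */
};

/* Score a member for a new read at io_offset; lower is better. */
static uint64_t
member_load(const struct member *m, uint64_t io_offset)
{
	uint64_t load, dist;

	assert(m != NULL && m->queue_len >= 0);
	load = (uint64_t)m->queue_len;

	if (m->nonrot) {
		/* Assumed: sequential vs. non-sequential only. */
		if (io_offset == m->last_offset)
			return (load + non_rotating_inc);
		return (load + non_rotating_seek_inc);
	}

	/* Rotating media: I/O directly follows the last queued one. */
	if (io_offset == m->last_offset)
		return (load + rotating_inc);

	/* Nearby but not sequential: half the seek increment. */
	dist = io_offset > m->last_offset ?
	    io_offset - m->last_offset : m->last_offset - io_offset;
	if (dist < rotating_seek_offset)
		return (load + rotating_seek_inc / 2);

	/* No locality: full seek increment. */
	return (load + rotating_seek_inc);
}
```

With these defaults, a hard disk with two queued requests scores 2 for a sequential read, 4 for a nearby one and 7 for a distant one, while an SSD with the same queue depth scores 2 or 3.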

This results in performance increases across the board as well as significant
increases for predominantly streaming loads and for configurations which don't
have evenly performing devices.

The following are results from a setup with a 3-way mirror with 2 x HDs and
1 x SSD, from a basic test running multiple parallel dd's.

With pre-fetch disabled (vfs.zfs.prefetch_disable=1):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s

With pre-fetch enabled (vfs.zfs.prefetch_disable=0):

== Stripe Balanced (default) ==
Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s
== Load Balanced (zfslinux) ==
Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s
== Load Balanced (locality freebsd) ==
Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s
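
The reported rates appear to be the total data read divided by the elapsed time, truncated to whole MB/s. That reading is inferred from the figures and is not stated by the test itself; the helper name `throughput_mbps` is hypothetical.

```c
#include <assert.h>

/* Whole MB/s for a run: total megabytes over elapsed seconds, truncated. */
static unsigned
throughput_mbps(unsigned total_mb, unsigned seconds)
{
	assert(seconds > 0);
	return (total_mb / seconds);
}
```

For example, `throughput_mbps(15360, 54)` gives 284 and `throughput_mbps(15360, 161)` gives 95, so the locality-aware algorithm is roughly 3x the default with pre-fetch disabled, and roughly 1.9x (320 vs 168) with pre-fetch enabled.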

In addition to the performance changes, the code was also restructured, with
the help of Justin Gibbs, to provide a more logical flow which also ensures
vdev loads are only calculated from the set of valid candidates.
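
Restricting the comparison to valid candidates can be sketched as follows. The `struct candidate` type and `pick_lowest_load` function are hypothetical, and the tie-breaking rule used here (the first member with the lowest load wins) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical view of one mirror member at selection time. */
struct candidate {
	int readable;	/* non-zero if the member can service this read */
	int load;	/* precomputed load score; lower is better */
};

/*
 * Return the index of the readable member with the lowest load,
 * or -1 if no member is a valid candidate.
 */
static int
pick_lowest_load(const struct candidate *c, size_t n)
{
	int best = -1;
	int best_load = INT_MAX;
	size_t i;

	assert(c != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Members that cannot serve the read never enter the comparison. */
		if (!c[i].readable)
			continue;
		if (best == -1 || c[i].load < best_load) {
			best = (int)i;
			best_load = c[i].load;
		}
	}
	return (best);
}
```

Skipping unreadable members before comparing loads means an idle but unusable device can never be chosen over a busy usable one.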

The following additional sysctls were added to allow the administrator
to tune the behaviour of the load algorithm:
* vfs.zfs.vdev.mirror.rotating_inc
* vfs.zfs.vdev.mirror.rotating_seek_inc
* vfs.zfs.vdev.mirror.rotating_seek_offset
* vfs.zfs.vdev.mirror.non_rotating_inc
* vfs.zfs.vdev.mirror.non_rotating_seek_inc

These changes were based on work started by the zfsonlinux developers:
openzfs/zfs#1487

Reviewed by:	gibbs, mav, will
MFC after:	2 weeks
Sponsored by:	Multiplay

References:
  https://github.com/freebsd/freebsd@5c7a6f5d
  https://github.com/freebsd/freebsd@31b7f68d
  https://github.com/freebsd/freebsd@e186f564

Performance Testing:
  openzfs/zfs#4334 (comment)

Porting notes:
- The tunables were adjusted to have ZoL-style names.
- The code was modified to use ZoL's vd_nonrot.
- Fixes were done to make cstyle.pl happy
- Merge conflicts were handled manually
- freebsd/freebsd-src@e186f56 by my
  colleague Andriy Gapon has been included. It applied perfectly, but
  added a cstyle regression.
- This replaces 556011d entirely.
- A typo "IO'a" has been corrected to say "IO's"
- Descriptions of new tunables were added to man/man5/zfs-module-parameters.5.

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Changed kstat types, and added kstat defines for OSX.

Ported-by: Jorgen Lundman <lundman@lundman.net>
smh authored and rottegift committed Mar 1, 2016
1 parent 80d2880 commit dec7699
Showing 7 changed files with 346 additions and 114 deletions.
12 changes: 12 additions & 0 deletions include/sys/kstat_osx.h
@@ -105,6 +105,12 @@ typedef struct osx_kstat {
kstat_named_t zfs_send_queue_length;
kstat_named_t zfs_recv_queue_length;

kstat_named_t zfs_vdev_mirror_rotating_inc;
kstat_named_t zfs_vdev_mirror_rotating_seek_inc;
kstat_named_t zfs_vdev_mirror_rotating_seek_offset;
kstat_named_t zfs_vdev_mirror_non_rotating_inc;
kstat_named_t zfs_vdev_mirror_non_rotating_seek_inc;

} osx_kstat_t;


@@ -190,6 +196,12 @@ extern int zfs_recv_queue_length;
extern uint64_t zfs_l2arc_lowmem_algorithm;
extern uint64_t zfs_l2arc_lowmem_force_permil;

extern uint64_t zfs_vdev_mirror_rotating_inc;
extern uint64_t zfs_vdev_mirror_rotating_seek_inc;
extern uint64_t zfs_vdev_mirror_rotating_seek_offset;
extern uint64_t zfs_vdev_mirror_non_rotating_inc;
extern uint64_t zfs_vdev_mirror_non_rotating_seek_inc;

int kstat_osx_init(void);
void kstat_osx_fini(void);

4 changes: 4 additions & 0 deletions include/sys/vdev.h
@@ -126,6 +126,10 @@ extern void vdev_queue_fini(vdev_t *vd);
extern zio_t *vdev_queue_io(zio_t *zio);
extern void vdev_queue_io_done(zio_t *zio);

extern int vdev_queue_length(vdev_t *vd);
extern uint64_t vdev_queue_lastoffset(vdev_t *vd);
extern void vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio);

extern void vdev_config_dirty(vdev_t *vd);
extern void vdev_config_clean(vdev_t *vd);
extern int vdev_config_sync(vdev_t **svd, int svdcount, uint64_t txg);
1 change: 1 addition & 0 deletions include/sys/vdev_impl.h
@@ -120,6 +120,7 @@ struct vdev_queue {
hrtime_t vq_io_delta_ts;
zio_t vq_io_search; /* used as local for stack reduction */
kmutex_t vq_lock;
uint64_t vq_lastoffset;
};

/*
65 changes: 62 additions & 3 deletions man/man5/zfs-module-parameters.5
@@ -1491,12 +1491,71 @@ Default value: \fB0\fR.
.sp
.ne 2
.na
-\fBzfs_vdev_mirror_switch_us\fR (int)
\fBzfs_vdev_mirror_rotating_inc\fR (int)
.ad
.RS 12n
-Switch mirrors every N usecs
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O immediately
follows its predecessor on rotational vdevs for the purpose of making decisions
based on load.
.sp
-Default value: \fB10,000\fR.
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_rotating_seek_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within
this that are not immediately following the previous I/O are incremented by
half.
.sp
Default value: \fB5\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_rotating_seek_offset\fR (int)
.ad
.RS 12n
The maximum distance for the last queued I/O in which the balancing algorithm
considers an I/O to have locality.
See the section "ZFS I/O SCHEDULER".
.sp
Default value: \fB1048576\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_non_rotating_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member on non-rotational vdevs
when I/Os do not immediately follow one another.
.sp
Default value: \fB0\fR.
.RE

.sp
.ne 2
.na
\fBzfs_vdev_mirror_non_rotating_seek_inc\fR (int)
.ad
.RS 12n
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O lacks
locality as defined by the zfs_vdev_mirror_rotating_seek_offset. I/Os within
this that are not immediately following the previous I/O are incremented by
half.
.sp
Default value: \fB1\fR.
.RE

.sp