zpool autoexpand doesn't work #120

Closed

fajarnugraha opened this issue Feb 24, 2011 · 9 comments
Labels
Component: ZED (ZFS Event Daemon), Type: Feature (Feature request or new feature)

Comments

@fajarnugraha
Contributor

zpool autoexpand doesn't work. This feature is desirable if you use LVM or a SAN that can resize the LUNs used as zpool vdevs.

# dd if=/dev/zero of=/data/vbd/vdev1.img bs=1M count=0 seek=1024
0+0 records in
0+0 records out
0 bytes (0 B) copied, 3.2411e-05 seconds, 0.0 kB/s

# zpool create testpool /data/vbd/vdev1.img

# zpool get autoexpand testpool
NAME      PROPERTY    VALUE   SOURCE
testpool  autoexpand  off     default

# zpool set autoexpand=on testpool

# zpool get autoexpand testpool
NAME      PROPERTY    VALUE   SOURCE
testpool  autoexpand  on      local

# zpool list testpool
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
testpool  1016M  82.5K  1016M     0%  1.00x  ONLINE  -

# ls -lash /data/vbd
total 1.2M
4.0K drwxr-xr-x 2 root  root  4.0K Feb 24 14:01 .
4.0K drwxr-xr-x 6 build build 4.0K Feb 24 14:00 ..
1.2M -rw-r--r-- 1 root  root  1.0G Feb 24 14:02 vdev1.img

# dd if=/dev/zero of=/data/vbd/vdev1.img bs=1M count=0 seek=2048
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.9672e-05 seconds, 0.0 kB/s

# ls -lash /data/vbd
total 1.2M
4.0K drwxr-xr-x 2 root  root  4.0K Feb 24 14:01 .
4.0K drwxr-xr-x 6 build build 4.0K Feb 24 14:00 ..
1.2M -rw-r--r-- 1 root  root  2.0G Feb 24 14:05 vdev1.img

# zpool list testpool
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
testpool  1016M  82.5K  1016M     0%  1.00x  ONLINE  -

# zpool export testpool

# zpool import -d /data/vbd/ testpool

# zpool list testpool
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
testpool  1016M   132K  1016M     0%  1.00x  ONLINE  -

For comparison, autoexpand works in zfs-fuse as long as you export and re-import the pool.

@fajarnugraha
Contributor Author

Relevant discussion about autoexpand on zfs-fuse list: http://groups.google.com/group/zfs-fuse/browse_thread/thread/c64d2ab7088f51eb

@fajarnugraha
Contributor Author

Temporary fix: fajarnugraha@db15c42, based on http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=commitdiff;h=3b2e400c94eb488cff53cf701554c26d5ebe52e4

Note that this works for non-partitioned devices only (e.g. LVM). If the vdev has a partition on it, the partition must be resized manually first, since the partition that zpool uses is still at the old size.
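
For a partitioned vdev, the manual sequence would look something like this (device names hypothetical; assumes a parted new enough to support resizepart, plus partprobe):

# parted /dev/sdb resizepart 1 100%
# partprobe /dev/sdb
# zpool online -e testpool /dev/sdb1

Here resizepart grows the ZFS data partition to fill the enlarged disk, partprobe makes the kernel re-read the partition table, and zpool online -e then expands the vdev into the new space.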

@behlendorf
Contributor

Thanks for pointing me to the previous zfs-fuse thread for this. Looking at the issue, I think we can do better in the native port. I've already put most of the infrastructure we need in place.

So the way this is supposed to work in the import case is: spa_import() calls spa_async_request(spa, SPA_ASYNC_AUTOEXPAND) after opening all the vdevs. This flag is then checked asynchronously in spa_async_thread(), and if it's set, spa_async_autoexpand() gets called, which generates an event for user space to consume (assuming spa->spa_autoexpand is enabled). I presume some daemon in user space then runs zpool online -e pool dev to expand the device.

Most of this should happen in the Linux port today. I suspect the first thing that will go wrong is in spa_async_autoexpand(), because I don't believe vd->vdev_physpath is being set. We'll need to check; if it isn't, that should be fixed. Once that is fixed, it will generate an event, and I have already added event-handling infrastructure to this port: a list of events which have occurred can be obtained by running zpool events.

This, however, isn't the only way to consume these events in user space. Part of my long-term plan has always been to write a simple user space daemon which consumes these events and takes some action. I've opened issue #2 for exactly this work. It will be the same infrastructure needed to do things like kick in hot spares or notify you when a drive fails. If a simple daemon were written, it could consume the autoexpand event and run the correct zpool command to expand the device.

The last bit of autoexpand will be to place a hook on the system which similarly calls zpool online -e pool dev when a device size changes. I suspect that in the case of a device resize, udev will be notified via a sysevent, so we may just need to add a hook there to run the correct command. Alternately, I'm sure we could add a hook on the kernel side to generate another autoexpand ZFS event when we notice the device size change.
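
A sketch of what such a udev hook could look like, assuming a rules file and helper script we'd have to write (the RESIZE property is set by device-mapper on resize; plain disks may need a different match):

# /etc/udev/rules.d/90-zfs-autoexpand.rules (hypothetical)
ACTION=="change", SUBSYSTEM=="block", ENV{RESIZE}=="1", \
    RUN+="/usr/local/sbin/zfs-autoexpand.sh $env{DEVNAME}"

The helper script would then map the device back to its pool and vdev and run zpool online -e pool dev.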

@Rudd-O
Contributor

Rudd-O commented Mar 10, 2011

Goddamn, that patch would have saved me about three hours today.

Sigh.

@shrirang

Hi @fajarnugraha,
I tried the same steps you mention above on a Solaris machine, and it shows the same behavior there too.

@bill-mcgonigle

Note to future travelers: zpool online -e poolname vdev works as expected. I previously thought zpool import -o expand=on poolname was the only way to get the job done.

My use case is an md array with a LUKS/device-mapper layer under a zvol. I replaced and expanded the whole stack successfully!
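
For a stack like that, the expansion sequence is roughly the following (device and mapping names hypothetical):

# mdadm --grow /dev/md0 --size=max
# cryptsetup resize crypt-md0
# zpool online -e tank /dev/mapper/crypt-md0

mdadm --grow extends the array once all members have been replaced with larger ones, cryptsetup resize grows the open LUKS mapping to match, and zpool online -e expands the vdev into the new space.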

@FransUrbo
Contributor

@fajarnugraha @behlendorf @Rudd-O @shrirang This is quite old; is it still a problem with the latest Git? If not, could you please close it?

@behlendorf
Contributor

This is still an open issue. We must add a script for the ZED to consume 'autoexpand' events.
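
A minimal sketch of such a zedlet, assuming the autoexpand event exports the pool and device in the ZEVENT_POOL and ZEVENT_VDEV_PATH environment variables as other event classes do:

#!/bin/sh
# autoexpand.sh (hypothetical zedlet): expand a vdev when its device grows.
# The ZED would invoke this with the event details in ZEVENT_* variables.
[ -n "${ZEVENT_POOL}" ] && [ -n "${ZEVENT_VDEV_PATH}" ] || exit 1
exec zpool online -e "${ZEVENT_POOL}" "${ZEVENT_VDEV_PATH}"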

@hvisage

hvisage commented Aug 1, 2017

Okay: Debian 9, ZFS 0.6.5.9, and I can't get this autoexpand going.

ii  libzpool2linux                0.6.5.9-5                      amd64        OpenZFS pool library for Linux
ii  zfs-dkms                      0.6.5.9-5                      all          OpenZFS filesystem kernel modules for Linux
ii  zfs-zed                       0.6.5.9-5                      amd64        OpenZFS Event Daemon
ii  zfsutils-linux                0.6.5.9-5                      amd64        command-line tools to manage OpenZFS filesystems

root@tracsftpqa01:~# zpool export test
root@tracsftpqa01:~# zpool import -o expand=on test
root@tracsftpqa01:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
test  15.9G   234K  15.9G         -     0%     0%  1.00x  ONLINE  -
root@tracsftpqa01:~# fdisk -l /dev/disk/*/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1
Disk /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1: 145 GiB, 155692564480 bytes, 304087040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 896E82DA-56D8-FD4C-8B30-858036CE594F

Device                                                         Start       End   Sectors  Size Type
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part1      2048 304068607 304066560  145G Solaris
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part9 304068608 304084991     16384    8M Solaris

root@tracsftpqa01:~# zpool get autoexpand test
NAME  PROPERTY    VALUE   SOURCE
test  autoexpand  on      local
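
When the partition already spans the disk, as in the fdisk output above, it should still be possible to force the expansion manually with zpool online -e against the vdev as the pool knows it, for example:

# zpool online -e test /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1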

eturkes referenced this issue in hn/debian-stretch-zfs-root Aug 28, 2017
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 13, 2018
While the autoexpand property may seem like a small feature, it
depends on a significant amount of system infrastructure.  Enough
of that infrastructure is now in place that, with a few
customizations for Linux, the autoexpand property can be supported
for whole-disk configurations.

Autoexpand works as follows: when a block device is resized, a
change event is generated by udev with the DISK_MEDIA_CHANGE key.
The ZED, which is monitoring udev events, detects the event for
disks (but not partitions) and hands it off to zfs_deliver_dle().
The zfs_deliver_dle() function appends the expected whole-disk
partition suffix, and if the partition can be matched against
a known pool vdev, it re-opens it.

Re-opening the vdev will trigger a re-read of the partition
table so the maximum possible expansion size can be reported.
Next, if the autoexpand property is set to "on", a vdev expansion
will be attempted.  After performing some sanity checks on the
disk to verify it's safe to expand the ZFS partition (-part1), it
will be expanded and the partition table updated.  The partition
is then re-opened again to detect the updated size, which allows
the new capacity to be used.

Added PHYS_PATH="/dev/zvol/dataset" to the vdev configuration for
ZFS volumes.  This was required for the test cases which test
expansion by layering a new pool on top of ZFS volumes.

Enabled the zpool_expand_001_pos and zpool_expand_003_pos
test cases which exercise the autoexpand property.

Fixed zfs_zevent_wait() signal handling which could result
in the ZED spinning when a signal was not handled.

Removed the vdev_disk_rrpart() functionality, which can be abandoned
in favour of re-opening the device, which triggers a re-read of
the partition table as long as no other partitions are in use.
This will be true as long as we're working with whole disks.
As a bonus, this allows us to remove two Linux kernel API checks.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#120
Issue openzfs#2437
Issue openzfs#5771
Issue openzfs#7582
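
With this change in place, the behaviour can be exercised end to end on a loopback device; a rough sketch (paths hypothetical; losetup --set-capacity tells the kernel the device grew, which generates the udev change event the ZED listens for):

# truncate -s 1G /var/tmp/vdev.img
# losetup /dev/loop0 /var/tmp/vdev.img
# zpool create -o autoexpand=on testpool /dev/loop0
# truncate -s 2G /var/tmp/vdev.img
# losetup --set-capacity /dev/loop0
# zpool list testpool

If everything works, the last command should report the pool at roughly 2G shortly after the capacity change.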
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 28, 2018
While the autoexpand property may seem like a small feature, it
depends on a significant amount of system infrastructure.  Enough
of that infrastructure is now in place that, with a few
modifications for Linux, it can be supported.

Auto-expand works as follows: when a block device is modified
(re-sized, closed after being open r/w, etc.) a change uevent is
generated for udev.  The ZED, which is monitoring udev events,
passes the change event along to zfs_deliver_dle() if the disk
or partition contains a zfs_member as identified by blkid.

From here the device is matched against all imported pool vdevs
using the vdev_guid which was read from the label by blkid.  If
a match is found, the ZED reopens the pool vdev.  This re-opening
is important because it allows the vdev to be briefly closed so
the disk partition table can be re-read.  Otherwise, it wouldn't
be possible to report the maximum possible expansion size.

Finally, if the autoexpand property is set to "on", a vdev
expansion will be attempted.  After performing some sanity checks
on the disk to verify that it is safe to expand, the primary
partition (-part1) will be expanded and the partition table
updated.  The partition is then re-opened (again) to detect the
updated size, which allows the new capacity to be used.

In order to make all of the above possible the following changes
were required:

* Updated the zpool_expand_001_pos and zpool_expand_003_pos tests.
  These tests now create a pool which is layered on a loopback,
  scsi_debug, and file vdev.  This allows for testing of a non-
  partitioned block device (loopback), a partitioned block device
  (scsi_debug), and a file which does not receive udev change
  events.  This provides better test coverage, and by removing
  the layering on ZFS volumes, the issues surrounding layering
  one pool on another are avoided.

* Updated zpool_find_vdev_by_physpath() to accept a vdev guid.
  This allows for matching by guid rather than path, which is a
  more reliable way for the ZED to reference a vdev.

* Fixed zfs_zevent_wait() signal handling which could result
  in the ZED spinning when a signal was not handled.

* Removed vdev_disk_rrpart() functionality, which can be abandoned
  in favor of the kernel-provided blkdev_reread_part() function.

* Added a rwlock which is held as a writer while a disk is being
  reopened.  This is important to prevent errors from occurring
  for any configuration-related IOs which bypass the SCL_ZIO lock.
  The zpool_reopen_007_pos.ksh test case was added to verify IO
  errors are never observed when reopening.  This is not expected
  to impact IO performance.

Additional fixes which aren't critical but were discovered and
resolved in the course of developing this functionality:

* Added PHYS_PATH="/dev/zvol/dataset" to the vdev configuration for
  ZFS volumes.  This serves as a unique physical path; while the
  volumes are no longer used in the test cases, this improvement
  was included anyway.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#120
Issue openzfs#2437
Issue openzfs#5771
Issue openzfs#7366
Issue openzfs#7582
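
The blkid matching described above can be checked by hand: on a pool member device, blkid reports TYPE="zfs_member" along with the pool name in LABEL and, if the probe exposes it, the vdev GUID in UUID_SUB (device name hypothetical):

# blkid /dev/sdb1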
richardelling pushed a commit to richardelling/zfs that referenced this issue Oct 15, 2018
Bump up in replication version number

Signed-off-by: Vishnu Itta <vitta@mayadata.io>