merge commit gap #4

Merged
merged 91 commits into from
Aug 1, 2016

Commits on May 23, 2016

  1. Consistently use parsable instead of parseable

    This is a purely cosmetic change, to consistently prefer one of
    two (both acceptable) spellings of the word parsable in documentation and
    code. I don't really care which to use, but according to Wiktionary
    https://en.wiktionary.org/wiki/parsable#English parsable is preferred.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4682
    chrekh authored and behlendorf committed May 23, 2016
    Commit 3491d6e
  2. Add missing RPM BuildRequires

    Both libudev and libattr are recommended build requirements.  As
    such their development headers should be listed in the rpm spec file
    so those dependencies are pulled in when building rpm packages.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4676
    behlendorf committed May 23, 2016
    Commit de0ef91
  3. Skip ctldir znode in zfs_rezget to fix snapdir issues

    Skip ctldir znodes in zfs_rezget, otherwise they will always be invalidated.
    This causes odd behaviour for the mounted snapdirs. In particular, on
    Linux >= 3.18, d_invalidate will detach the mountpoint and prevent anyone
    from automounting it again as long as someone is still using the detached mount.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4514
    Closes #4661
    Closes #4672
    Chunwei Chen authored and behlendorf committed May 23, 2016
    Commit cbecb4f
  4. Improve zfs-module-parameters(5)

    Various rewrites to the descriptions of module parameters. Corrects
    spelling mistakes, makes the descriptions more user-friendly and
    describes some ZFS quirks which should be understood before changing
    parameter values.
    
    Signed-off-by: DHE <git@dehacked.net>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4671
    DeHackEd authored and behlendorf committed May 23, 2016
    Commit 8342673

Commits on May 25, 2016

  1. Fix arc_prune_task use-after-free

    arc_prune_task uses a refcount to protect arc_prune_t, but it doesn't prevent
    the underlying zsb from disappearing if there's a concurrent umount. We fix
    this by forcing the caller of arc_remove_prune_callback to wait for
    arc_prune_taskq to finish.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4687
    Closes #4690
    Chunwei Chen authored and behlendorf committed May 25, 2016
    Commit 4442f60
  2. Add request size histograms (-r) to zpool iostat, minor man page fix

    Add -r option to "zpool iostat" to print request size histograms for the leaf
    ZIOs. This includes histograms of individual ZIOs ("ind") and aggregate ZIOs
    ("agg"). These stats can be useful for seeing how well the ZFS IO aggregator
    is working.
    
    $ zpool iostat -r
    mypool        sync_read    sync_write    async_read    async_write      scrub
    req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
    ----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
    512             0      0      0      0      0      0    530      0      0      0
    1K              0      0    260      0      0      0    116    246      0      0
    2K              0      0      0      0      0      0      0    431      0      0
    4K              0      0      0      0      0      0      3    107      0      0
    8K             15      0     35      0      0      0      0      6      0      0
    16K             0      0      0      0      0      0      0     39      0      0
    32K             0      0      0      0      0      0      0      0      0      0
    64K            20      0     40      0      0      0      0      0      0      0
    128K            0      0     20      0      0      0      0      0      0      0
    256K            0      0      0      0      0      0      0      0      0      0
    512K            0      0      0      0      0      0      0      0      0      0
    1M              0      0      0      0      0      0      0      0      0      0
    2M              0      0      0      0      0      0      0      0      0      0
    4M              0      0      0      0      0      0    155     19      0      0
    8M              0      0      0      0      0      0      0    811      0      0
    16M             0      0      0      0      0      0      0     68      0      0
    --------------------------------------------------------------------------------
    
    Also rename the stray "-G" in the man page to be "-w" for latency histograms.
    
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Closes #4659
    tonyhutter authored and behlendorf committed May 25, 2016
    Commit 7e94507
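
To judge how well the aggregator is working, the histogram from the "zpool iostat -r" commit can be totalled per I/O class. The following is a minimal Python sketch that is not part of the commit. It assumes the column layout matches the sample output exactly (one size label followed by ind/agg pairs for five classes), and `parse_histogram` is a hypothetical helper name.

```python
# Data rows copied from the sample "zpool iostat -r" output in the commit message.
SAMPLE = """\
512             0      0      0      0      0      0    530      0      0      0
1K              0      0    260      0      0      0    116    246      0      0
2K              0      0      0      0      0      0      0    431      0      0
4K              0      0      0      0      0      0      3    107      0      0
8K             15      0     35      0      0      0      0      6      0      0
16K             0      0      0      0      0      0      0     39      0      0
32K             0      0      0      0      0      0      0      0      0      0
64K            20      0     40      0      0      0      0      0      0      0
128K            0      0     20      0      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0    155     19      0      0
8M              0      0      0      0      0      0      0    811      0      0
16M             0      0      0      0      0      0      0     68      0      0
"""

# Column order as printed in the sample header.
CLASSES = ["sync_read", "sync_write", "async_read", "async_write", "scrub"]


def parse_histogram(text):
    """Sum the individual ("ind") and aggregate ("agg") counts per I/O class."""
    totals = {name: {"ind": 0, "agg": 0} for name in CLASSES}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and separators: keep only rows with ten numeric columns.
        if len(fields) != 1 + 2 * len(CLASSES):
            continue
        if not all(field.isdigit() for field in fields[1:]):
            continue
        counts = [int(field) for field in fields[1:]]
        for i, name in enumerate(CLASSES):
            totals[name]["ind"] += counts[2 * i]
            totals[name]["agg"] += counts[2 * i + 1]
    return totals


totals = parse_histogram(SAMPLE)
```

For the sample data, the async_write class is the only one that shows any aggregated requests, which is the kind of observation the histogram is meant to make easy.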

Commits on May 26, 2016

  1. OpenZFS 6531 - Provide mechanism to artificially limit disk performance

    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Ported by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6531
    OpenZFS-commit: openzfs/openzfs@97e8130
    
    Porting notes:
    - Added new IO delay tracepoints, and moved common ZIO tracepoint macros
      to a new trace_common.h file.
    - Used zio_delay_taskq() in place of OpenZFS's timeout_generic() function.
    - Updated zinject man page
    - Updated zpool_scrub test files
    tonyhutter authored and behlendorf committed May 26, 2016
    Commit 26ef0cc

Commits on May 27, 2016

  1. Systemd configuration fixes

    * Disable zfs-import-scan.service by default.  This ensures that
    pools will not be automatically imported unless they appear in
    the cache file.  When this service is explicitly enabled pools
    will be imported with the "cachefile=none" property set.  This
    prevents the creation of, or update to, an existing cache file.
    
        $ systemctl list-unit-files | grep zfs
        zfs-import-cache.service                  enabled
        zfs-import-scan.service                   disabled
        zfs-mount.service                         enabled
        zfs-share.service                         enabled
        zfs-zed.service                           enabled
        zfs.target                                enabled
    
    * Change services to dynamic from static by adding an [Install]
    section and adding 'WantedBy' tags in favor of 'Requires' tags.
    This allows for easier customization of the boot behavior.
    
    * Start the zfs-import-cache.service after the root pivot so
    the cache file is available in the standard location.
    
    * Start the zfs-mount.service after the systemd-remount-fs.service
    to ensure the root fs is writeable and the ZFS filesystems can
    create their mount points.
    
    * Change the default behavior to only load the ZFS kernel modules
    in zfs-import-*.service or when blkid(8) detects a pool.  Users
    who wish to unconditionally load the kernel modules must uncomment
    the list of modules in /lib/modules-load.d/zfs.conf.
    
    Reviewed-by: Richard Laager <rlaager@wiktel.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4325
    Closes #4496
    Closes #4658
    Closes #4699
    behlendorf committed May 27, 2016
    Commit 92547bc
  2. Fix self-healing IO prior to dsl_pool_init() completion

    Async writes triggered by a self-healing IO may be issued before the
    pool finishes the process of initialization.  This results in a NULL
    dereference of `spa->spa_dsl_pool` in vdev_queue_max_async_writes().
    
    George Wilson recommended addressing this issue by initializing the
    passed `dsl_pool_t **` prior to dmu_objset_open_impl().  Since the
    caller is passing the `spa->spa_dsl_pool` this has the effect of
    ensuring it's initialized.
    
    However, since this depends on the caller knowing they must pass
    the `spa->spa_dsl_pool` an additional NULL check was added to
    vdev_queue_max_async_writes().  This guards against any future
    restructuring of the code which might result in dsl_pool_init()
    being called differently.
    
    Signed-off-by: GeLiXin <47034221@qq.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4652
    GeLiXin authored and behlendorf committed May 27, 2016
    Commit b7faa7a

Commits on May 31, 2016

  1. Add isa_defs for MIPS

    GCC for MIPS defines _LP64 on 64-bit targets, but does not
    define _ILP32 on 32-bit targets.
    
    Signed-off-by: YunQiang Su <syq@debian.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4712
    wzssyqa authored and behlendorf committed May 31, 2016
    Commit 2493dca
  2. Fix out-of-bound access in zfs_fillpage

    The original code performs an out-of-bounds access on pl[] during the
    last iteration.
    
     ==================================================================
     BUG: KASAN: stack-out-of-bounds in zfs_getpage+0x14c/0x2d0 [zfs]
     Read of size 8 by task tmpfile/7850
     page:ffffea00017c6dc0 count:0 mapcount:0 mapping:          (null) index:0x0
     flags: 0xffff8000000000()
     page dumped because: kasan: bad access detected
     CPU: 3 PID: 7850 Comm: tmpfile Tainted: G           OE   4.6.0+ #3
      ffff88005f1b7678 0000000006dbe035 ffff88005f1b7508 ffffffff81635618
      ffff88005f1b7678 ffff88005f1b75a0 ffff88005f1b7590 ffffffff81313ee8
      ffffea0001ae8dd0 ffff88005f1b7670 0000000000000246 0000000041b58ab3
     Call Trace:
      [<ffffffff81635618>] dump_stack+0x63/0x8b
      [<ffffffff81313ee8>] kasan_report_error+0x528/0x560
      [<ffffffff81278f20>] ? filemap_map_pages+0x5f0/0x5f0
      [<ffffffff813144b8>] kasan_report+0x58/0x60
      [<ffffffffc12250dc>] ? zfs_getpage+0x14c/0x2d0 [zfs]
      [<ffffffff81312e4e>] __asan_load8+0x5e/0x70
      [<ffffffffc12250dc>] zfs_getpage+0x14c/0x2d0 [zfs]
      [<ffffffffc1252131>] zpl_readpage+0xd1/0x180 [zfs]
    
      [<ffffffff81353c3a>] SyS_execve+0x3a/0x50
      [<ffffffff810058ef>] do_syscall_64+0xef/0x180
      [<ffffffff81d0ee25>] entry_SYSCALL64_slow_path+0x25/0x25
     Memory state around the buggy address:
      ffff88005f1b7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ffff88005f1b7580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     >ffff88005f1b7600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4
                                                                     ^
      ffff88005f1b7680: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
      ffff88005f1b7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ==================================================================
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4705
    Issue #4708
    Chunwei Chen authored and behlendorf committed May 31, 2016
    Commit 540c392
  3. Fix memleak in zpl_parse_options

    strsep() will advance tmp_mntopts, and will change it to NULL on last
    iteration.  This will cause strfree(tmp_mntopts) to not free anything.
    
    unreferenced object 0xffff8800883976c0 (size 64):
      comm "mount.zfs", pid 3361, jiffies 4294931877 (age 1482.408s)
      hex dump (first 32 bytes):
        72 77 00 73 74 72 69 63 74 61 74 69 6d 65 00 7a  rw.strictatime.z
        66 73 75 74 69 6c 00 6d 6e 74 70 6f 69 6e 74 3d  fsutil.mntpoint=
      backtrace:
        [<ffffffff81810c4e>] kmemleak_alloc+0x4e/0xb0
        [<ffffffff811f9cac>] __kmalloc+0x16c/0x250
        [<ffffffffc065ce9b>] strdup+0x3b/0x60 [spl]
        [<ffffffffc080fad6>] zpl_parse_options+0x56/0x300 [zfs]
        [<ffffffffc080fe46>] zpl_mount+0x36/0x80 [zfs]
        [<ffffffff81222dc8>] mount_fs+0x38/0x160
        [<ffffffff81240097>] vfs_kern_mount+0x67/0x110
        [<ffffffff812428e0>] do_mount+0x250/0xe20
        [<ffffffff812437d5>] SyS_mount+0x95/0xe0
        [<ffffffff8181aff6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
        [<ffffffffffffffff>] 0xffffffffffffffff
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4706
    Issue #4708
    Chunwei Chen authored and behlendorf committed May 31, 2016
    Commit 06ee003
  4. Fix memleak in vdev_config_generate_stats

    fnvlist_add_nvlist will copy the contents of nvx, so we need to
    free it here.
    
    unreferenced object 0xffff8800a6934e80 (size 64):
      comm "zpool", pid 3398, jiffies 4295007406 (age 214.180s)
      hex dump (first 32 bytes):
        60 06 c2 73 00 88 ff ff 00 7c 8c 73 00 88 ff ff  `..s.....|.s....
        00 00 00 00 00 00 00 00 40 b0 70 c0 ff ff ff ff  ........@.p.....
      backtrace:
        [<ffffffff81810c4e>] kmemleak_alloc+0x4e/0xb0
        [<ffffffff811fac7d>] __kmalloc_node+0x17d/0x310
        [<ffffffffc065528c>] spl_kmem_alloc_impl+0xac/0x180 [spl]
        [<ffffffffc0657379>] spl_vmem_alloc+0x19/0x20 [spl]
        [<ffffffffc07056cf>] nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
        [<ffffffffc07006b7>] nvlist_xalloc.part.13+0x27/0xc0 [znvpair]
        [<ffffffffc07007ad>] nvlist_alloc+0x3d/0x40 [znvpair]
        [<ffffffffc0703abc>] fnvlist_alloc+0x2c/0x80 [znvpair]
        [<ffffffffc07b1783>] vdev_config_generate_stats+0x83/0x370 [zfs]
        [<ffffffffc07b1f53>] vdev_config_generate+0x4e3/0x650 [zfs]
        [<ffffffffc07996db>] spa_config_generate+0x20b/0x4b0 [zfs]
        [<ffffffffc0794f64>] spa_tryimport+0xc4/0x430 [zfs]
        [<ffffffffc07d11d8>] zfs_ioc_pool_tryimport+0x68/0x110 [zfs]
        [<ffffffffc07d4fc6>] zfsdev_ioctl+0x646/0x7a0 [zfs]
        [<ffffffff81232e31>] do_vfs_ioctl+0xa1/0x5b0
        [<ffffffff812333b9>] SyS_ioctl+0x79/0x90
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4707
    Issue #4708
    Chunwei Chen authored and behlendorf committed May 31, 2016
    Commit 6a79672

Commits on Jun 2, 2016

  1. Linux 4.7 compat: handler->set() takes both dentry and inode

    Counterpart to fd4c7b7, the same approach was taken to resolve
    the compatibility issue.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Closes #4717 
    Issue #4665
    behlendorf committed Jun 2, 2016
    Commit 8fbbc6b
  2. Implementation of AVX2 optimized Fletcher-4

    New functionality:
    - Preserves existing scalar implementation.
    - Adds AVX2 optimized Fletcher-4 computation.
    - Fastest routines selected on module load (benchmark).
    - Test case for Fletcher-4 added to ztest.
    
    New zcommon module parameters:
    -  zfs_fletcher_4_impl (str): selects the implementation to use.
        "fastest" - use the fastest version available
        "cycle"   - cycle through all available impl for ztest
        "scalar"  - use the original version
        "avx2"    - new AVX2 implementation if available
    
    Performance comparison (Intel i7 CPU, 1MB data buffers):
    - Scalar:  4216 MB/s
    - AVX2:   14499 MB/s
    
    See contents of `/sys/module/zcommon/parameters/zfs_fletcher_4_impl`
    to get list of supported values. If an implementation is not supported
    on the system, it will not be shown. Currently selected option is
    enclosed in `[]`.
    
    Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
    Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4330
    Jinshan Xiong authored and behlendorf committed Jun 2, 2016
    Commit 1eeb456
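
For readers unfamiliar with the checksum being optimized, here is a minimal Python sketch of a scalar Fletcher-4 computation as it is commonly described: four running sums over 32-bit words, each wrapping at 64 bits. It is an illustration, not the code from this commit. Little-endian word order is an assumption made for the example, and `fletcher4` is a hypothetical function name.

```python
import struct

MASK64 = (1 << 64) - 1  # the four accumulators wrap at 64 bits


def fletcher4(data: bytes):
    """Return the four Fletcher-4 accumulators (a, b, c, d) for data.

    Assumes the input is a whole number of 32-bit little-endian words.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


checksum = fletcher4(struct.pack("<4I", 1, 2, 3, 4))
```

Each accumulator depends on the previous one, so the loop carries a dependency chain from word to word. That is presumably why a vectorized variant needs a different arrangement of the sums rather than a direct translation of this loop.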

Commits on Jun 3, 2016

  1. Fix cstyle.pl warnings

    As of perl v5.22.1 the following warnings are generated:
    
    * Redundant argument in printf at scripts/cstyle.pl line 194
    
    * Unescaped left brace in regex is deprecated, passed through
      in regex; marked by <-- HERE in m/\S{ <-- HERE / at
      scripts/cstyle.pl line 608.
    
    They have been addressed by escaping the left braces and by
    providing the correct number of arguments to printf based on
    the fmt specifier set by the verbose option.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4723
    behlendorf committed Jun 3, 2016
    Commit f866a4e

Commits on Jun 6, 2016

  1. Fix minor spelling mistakes

    Trivial spelling mistake fix in error message text.
    
    * Fix spelling mistake "adminstrator" -> "administrator"
    * Fix spelling mistake "specificed" -> "specified"
    * Fix spelling mistake "interperted" -> "interpreted"
    
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4728
    Colin Ian King authored and behlendorf committed Jun 6, 2016
    Commit 2627e75

Commits on Jun 7, 2016

  1. Add zfs allow and zfs unallow support

    ZFS allows for specific permissions to be delegated to normal users
    with the `zfs allow` and `zfs unallow` commands.  In addition, non-
    privileged users should be able to run all of the following commands:
    
      * zpool [list | iostat | status | get]
      * zfs [list | get]
    
    Historically this functionality was not available on Linux.  In order
    to add it the secpolicy_* functions needed to be implemented and mapped
    to the equivalent Linux capability.  Only then could the permissions on
    the `/dev/zfs` be relaxed and the internal ZFS permission checks used.
    
    Even with this change some limitations remain.  Under Linux only the
    root user is allowed to modify the namespace (unless it's a private
    namespace).  This means the mount, mountpoint, canmount, unmount,
    and remount delegations cannot be supported with the existing code.  It
    may be possible to add this functionality in the future.
    
    This functionality was validated with the cli_user and delegation test
    cases from the ZFS Test Suite.  These tests exhaustively verify each
    of the supported permissions which can be delegated and ensures only
    an authorized user can perform it.
    
    Two minor bug fixes were required for test-running.py.  First, the
    Timer() object cannot be safely created in a `try:` block when there
    is an unconditional `finally` block which references it.  Second,
    when running as a normal user also check for scripts using
    both the .ksh and .sh suffixes.
    
    Finally, existing users who are simulating delegations by setting
    group permissions on the /dev/zfs device should revert that
    customization when updating to a version with this change.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Closes #362 
    Closes #434 
    Closes #4100
    Closes #4394 
    Closes #4410 
    Closes #4487
    behlendorf committed Jun 7, 2016
    Commit f74b821
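
The first of the two test runner fixes mentioned in the "zfs allow" commit is an instance of a general Python pitfall. The sketch below illustrates that pitfall under stated assumptions. It is not the test runner's actual code, and `run_unsafe`, `run_safe`, `failing_factory`, and `error_name` are hypothetical names introduced for the example.

```python
from threading import Timer


def run_unsafe(make_timer):
    """Pitfall: the timer is created inside the try block."""
    try:
        timer = make_timer()  # if this raises, 'timer' is never bound
        timer.start()
    finally:
        timer.cancel()  # raises UnboundLocalError, hiding the real failure


def run_safe(make_timer):
    """Fix: create the timer first, then guard only its use."""
    timer = make_timer()  # a failure here propagates unchanged
    try:
        timer.start()
    finally:
        timer.cancel()


def failing_factory():
    """Hypothetical factory that simulates a failure while building the timer."""
    raise RuntimeError("could not create timer")


def error_name(runner, factory):
    """Return the name of the exception type raised by runner, or None."""
    try:
        runner(factory)
    except Exception as exc:
        return type(exc).__name__
    return None
```

With the failing factory, `run_unsafe` surfaces an `UnboundLocalError` from the `finally` block instead of the original `RuntimeError`, while `run_safe` reports the original error.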

Commits on Jun 16, 2016

  1. Remove libzfs_graph.c

    The libzfs_graph.c source file should have been removed in 330d06f,
    it is entirely unused.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4766
    behlendorf committed Jun 16, 2016
    Commit 46ab359

Commits on Jun 17, 2016

  1. Linux 4.6 compat: Fall back to d_prune_aliases() if necessary

    As of 4.6, the icache and dcache LRUs are memcg aware insofar as the
    kernel's per-superblock shrinker is concerned.  The effect is that dcache
    or icache entries added by a task in a non-root memcg won't be scanned
    by the shrinker in the context of the root (or NULL) memcg.  This defeats
    the attempts by zfs_sb_prune() to unpin buffers and can allow metadata to
    grow uncontrollably.  This patch reverts to the d_prune_aliases() method
    in case the kernel's per-superblock shrinker is not able to free anything.
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
    Closes: #4726
    dweeezil authored and behlendorf committed Jun 17, 2016
    Commit 09fb30e

Commits on Jun 21, 2016

  1. SIMD implementation of vdev_raidz generate and reconstruct routines

    This is a new implementation of RAIDZ1/2/3 routines using x86_64
    scalar, SSE, and AVX2 instruction sets. Included are 3 parity
    generation routines (P, PQ, and PQR) and 7 reconstruction routines,
    for all RAIDZ levels. On module load, a quick benchmark of supported
    routines will select the fastest for each operation and they will
    be used at runtime. The original implementation is still present and
    can be selected via module parameter.
    
    Patch contains:
    - specialized gen/rec routines for all RAIDZ levels,
    - new scalar raidz implementation (unrolled),
    - two x86_64 SIMD implementations (SSE and AVX2 instruction sets),
    - fastest routines selected on module load (benchmark).
    - cmd/raidz_test - verify and benchmark all implementations
    - added raidz_test to the ZFS Test Suite
    
    New zfs module parameters:
    - zfs_vdev_raidz_impl (str): selects the implementation to use. On
      module load, the parameter will only accept first 3 options, and
      the other implementations can be set once module is finished
      loading. Possible values for this option are:
        "fastest" - use the fastest math available
        "original" - use the original raidz code
        "scalar" - new scalar impl
        "sse" - new SSE impl if available
        "avx2" - new AVX2 impl if available
    
    See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to
    get the list of supported values. If an implementation is not supported
    on the system, it will not be shown. Currently selected option is
    enclosed in `[]`.
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4328
    ironMann authored and behlendorf committed Jun 21, 2016
    Commit ab9f4b0
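
Both the Fletcher-4 and RAIDZ commits describe a module parameter file that lists the supported implementations and encloses the selected one in `[]`. A minimal Python sketch for reading such a listing follows. It assumes the names are whitespace-separated on one line, which the commit messages do not state, and `parse_impl_list` is a hypothetical helper.

```python
def parse_impl_list(text):
    """Split a parameter listing into (selected, available).

    Assumes whitespace-separated names with the active one written as [name].
    """
    selected = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        available.append(token)
    return selected, available


# Example input in the assumed format; real contents depend on the system.
selected, available = parse_impl_list("[fastest] original scalar sse avx2\n")
```

On a system with the module loaded, the same function could be applied to the contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl`, provided the file follows the assumed layout.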
  2. Fix NFS credential

    Commit f74b821 caused a regression where creating a file through NFS
    always creates a file owned by root. This is because the patch enabled the KSID
    code in zfs_acl_ids_create, which uses the euid and egid of the current
    process. However, on Linux, we should use fsuid and fsgid for file operations,
    which was the original behaviour. So we revert this part of the code.
    
    The patch also enabled secpolicy_vnode_*. Since these are also used in file
    operations, we change them to use fsuid and fsgid.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4772
    Closes #4758
    Chunwei Chen authored and behlendorf committed Jun 21, 2016
    Commit 100a91a
  3. OpenZFS 6513 - partially filled holes lose birth time

    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
    Approved by: Richard Lowe <richlowe@richlowe.net>
    Ported by: Boris Protopopov <bprotopopov@actifio.com>
    Signed-off-by: Boris Protopopov <bprotopopov@actifio.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6513
    OpenZFS-commit: openzfs/openzfs@8df0bcf0
    
    If a ZFS object contains a hole at level one, and then a data block is
    created at level 0 underneath that l1 block, l0 holes will be created.
    However, these l0 holes do not have the birth time property set; as a
    result, incremental sends will not send those holes.
    
    Fix is to modify the dbuf_read code to fill in birth time data.
    pcd1193182 authored and behlendorf committed Jun 21, 2016
    Commit bc77ba7
  4. Add a test case for dmu_free_long_range() to ztest

    Signed-off-by: Boris Protopopov <bprotopopov@actifio.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4754
    bprotopopov authored and behlendorf committed Jun 21, 2016
    Commit d0de2e8

Commits on Jun 24, 2016

  1. Revert "Add a test case for dmu_free_long_range() to ztest"

    This reverts commit d0de2e8 which
    introduced a new test case to ztest which is failing occasionally
    during automated testing.  The change is being reverted until
    the issue can be fully investigated.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4754
    behlendorf committed Jun 24, 2016
    Commit 391bba1
  2. OpenZFS 6878 - Add scrub completion info to "zpool history"

    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Authored by: Nav Ravindranath <nav@delphix.com>
    Ported-by: Chris Dunlop <chris@onthe.net.au>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6878
    OpenZFS-commit: openzfs/openzfs@1825bc5
    Closes #4787
    Nav Ravindranath authored and behlendorf committed Jun 24, 2016
    Commit 784d15c
  3. FreeBSD rS271776 - Persist vdev_resilver_txg changes

    Persist vdev_resilver_txg changes to avoid panic caused by validation
    vs a vdev_resilver_txg value from a previous resilver.
    
    Authored-by: smh <smh@FreeBSD.org>
    Ported-by: Chris Dunlop <chris@onthe.net.au>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/5154
    FreeBSD-issue: https://reviews.freebsd.org/rS271776
    FreeBSD-commit: freebsd/freebsd-src@c3c60bf
    Closes #4790
    smh authored and behlendorf committed Jun 24, 2016
    Commit d14fa5d
  4. xattrtest: allow verify with -R and other improvements

    - Use a fixed buffer of random bytes when random xattr values are in
      effect.  This eliminates the potential performance bottleneck of
      reading from /dev/urandom for each file. This also allows us to
      verify xattrs in random value mode.
    
    - Show the rate of operations per second in addition to elapsed time
      for each phase of the test. This may be useful for benchmarking.
    
    - Set default xattr size to 6 so that verify doesn't fail if user
      doesn't specify a size. We need at least six bytes to store the
      leading "size=X" string that is used for verification.
    
    - Allow user to execute just one phase of the test. Acceptable
      values for -o and their meanings are:
    
       1 - run the create phase
       2 - run the setxattr phase
       3 - run the getxattr phase
       4 - run the unlink phase
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    nedbass authored and behlendorf committed Jun 24, 2016
    Commit 8128558
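
The xattrtest commit relies on two ideas: a fixed random buffer makes random values reproducible, and a leading "size=X" string lets the verifier check each value. The Python sketch below models those ideas only. The real tool's value layout beyond the "size=X" prefix is not described in the commit message, so the layout here, the 4096-byte buffer size, and the names `make_value` and `verify` are all assumptions.

```python
import os

# Generated once and reused, instead of reading /dev/urandom per file.
RANDOM_BUFFER = os.urandom(4096)  # buffer size is an assumption


def make_value(size):
    """Build an xattr value of exactly size bytes: 'size=<size>' + random fill."""
    prefix = b"size=%d" % size
    fill = size - len(prefix)
    if fill < 0:
        raise ValueError("size too small to hold the 'size=X' prefix")
    if fill > len(RANDOM_BUFFER):
        raise ValueError("size exceeds the fixed random buffer")
    return prefix + RANDOM_BUFFER[:fill]


def verify(value):
    """Check a value against what make_value would produce for its length."""
    try:
        return value == make_value(len(value))
    except ValueError:
        return False


smallest = make_value(6)
```

A six-byte value holds exactly the string "size=6" with no room for fill, which matches the commit's reasoning for choosing 6 as the default size.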
  5. Backfill metadnode more intelligently

    Only attempt to backfill lower metadnode object numbers if at least
    4096 objects have been freed since the last rescan, and at most once
    per transaction group. This avoids a pathology in dmu_object_alloc()
    that caused O(N^2) behavior for create-heavy workloads and
    substantially improves object creation rates.  As summarized by
    @mahrens in #4636:
    
    "Normally, the object allocator simply checks to see if the next
    object is available. The slow calls happened when dmu_object_alloc()
    checks to see if it can backfill lower object numbers. This happens
    every time we move on to a new L1 indirect block (i.e. every 32 *
    128 = 4096 objects).  When re-checking lower object numbers, we use
    the on-disk fill count (blkptr_t:blk_fill) to quickly skip over
    indirect blocks that don’t have enough free dnodes (defined as an L2
    with at least 393,216 of 524,288 dnodes free). Therefore, we may
    find that a block of dnodes has a low (or zero) fill count, and yet
    we can’t allocate any of its dnodes, because they've been allocated
    in memory but not yet written to disk. In this case we have to hold
    each of the dnodes and then notice that it has been allocated in
    memory.
    
    The end result is that allocating N objects in the same TXG can
    require CPU usage proportional to N^2."
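
The figures in the quoted summary can be checked with a few lines of arithmetic. This sketch assumes 512-byte dnodes in 16384-byte dnode blocks and 128 block pointers per indirect block, which is how the "32 * 128" product above is read here.

```python
DNODE_BLOCK_BYTES = 16 * 1024   # assumed dnode block size
DNODE_BYTES = 512               # legacy dnode size
BLKPTRS_PER_INDIRECT = 128      # assumed fan-out of one indirect block

dnodes_per_block = DNODE_BLOCK_BYTES // DNODE_BYTES        # 32
dnodes_per_l1 = dnodes_per_block * BLKPTRS_PER_INDIRECT    # 4096
dnodes_per_l2 = dnodes_per_l1 * BLKPTRS_PER_INDIRECT       # 524288

# The quoted threshold of 393,216 free dnodes is three quarters of an L2.
free_fraction = 393_216 / dnodes_per_l2
```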
    
    Add a tunable dmu_rescan_dnode_threshold to define the number of
    objects that must be freed before a rescan is performed. Don't bother
    to export this as a module option because testing doesn't show a
    compelling reason to change it. The vast majority of the performance
    gain comes from limiting the rescan to at most once per TXG.
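
A minimal sketch of the two rescan conditions described above, assuming a simple counter of freed objects and a record of the TXG in which the last rescan happened. The class and method names are hypothetical; this is not the dmu_object_alloc() code.

```python
class BackfillGate:
    """Decide whether the allocator may rescan lower object numbers."""

    def __init__(self, threshold: int = 4096):
        self.threshold = threshold          # objects freed before a rescan
        self.freed_since_rescan = 0
        self.last_rescan_txg = None

    def note_free(self, count: int = 1) -> None:
        self.freed_since_rescan += count

    def should_rescan(self, txg: int) -> bool:
        # Condition 1: enough objects were freed since the last rescan.
        if self.freed_since_rescan < self.threshold:
            return False
        # Condition 2: at most one rescan per transaction group.
        if self.last_rescan_txg == txg:
            return False
        self.last_rescan_txg = txg
        self.freed_since_rescan = 0
        return True
```

The second condition is the one that bounds the cost: no matter how many objects are created and freed within a TXG, the expensive rescan runs at most once in it.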
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    nedbass authored and behlendorf committed Jun 24, 2016
    68cbd56
  6. Implement large_dnode pool feature

    Justification
    -------------
    
    This feature adds support for variable length dnodes. Our motivation is
    to eliminate the overhead associated with using spill blocks.  Spill
    blocks are used to store system attribute data (i.e. file metadata) that
    does not fit in the dnode's bonus buffer. By allowing a larger bonus
    buffer area the use of a spill block can be avoided.  Spill blocks
    potentially incur an additional read I/O for every dnode in a dnode
    block. As a worst case example, reading 32 dnodes from a 16k dnode block
    and all of the spill blocks could issue 33 separate reads. Now suppose
    those dnodes have size 1024 and therefore don't need spill blocks.  Then
    the worst case number of blocks read is reduced from 33 to two--one
    per dnode block. In practice spill blocks may tend to be co-located on
    disk with the dnode blocks so the reduction in I/O would not be this
    drastic. In a badly fragmented pool, however, the improvement could be
    significant.
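
The worst-case read counts in the example above follow from simple arithmetic. The sketch below assumes a 16384-byte dnode block, one read per dnode block, and one extra read per spill block; the function name is illustrative only.

```python
DNODE_BLOCK_BYTES = 16384


def worst_case_reads(num_dnodes: int, dnode_bytes: int, needs_spill: bool) -> int:
    """Count dnode-block reads plus one read per spill block."""
    dnodes_per_block = DNODE_BLOCK_BYTES // dnode_bytes
    dnode_blocks = -(-num_dnodes // dnodes_per_block)   # ceiling division
    spill_reads = num_dnodes if needs_spill else 0
    return dnode_blocks + spill_reads


legacy = worst_case_reads(32, 512, needs_spill=True)     # 1 block + 32 spills
large = worst_case_reads(32, 1024, needs_spill=False)    # 2 blocks, no spills
```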
    
    ZFS-on-Linux systems that make heavy use of extended attributes would
    benefit from this feature. In particular, ZFS-on-Linux supports the
    xattr=sa dataset property which allows file extended attribute data
    to be stored in the dnode bonus buffer as an alternative to the
    traditional directory-based format. Workloads such as SELinux and the
    Lustre distributed filesystem often store enough xattr data to force
    spill blocks when xattr=sa is in effect. Large dnodes may therefore
    provide a performance benefit to such systems.
    
    Other use cases that may benefit from this feature include files with
    large ACLs and symbolic links with long target names. Furthermore,
    this feature may be desirable on other platforms in case future
    applications or features are developed that could make use of a
    larger bonus buffer area.
    
    Implementation
    --------------
    
    The size of a dnode may be a multiple of 512 bytes up to the size of
    a dnode block (currently 16384 bytes). A dn_extra_slots field was
    added to the current on-disk dnode_phys_t structure to describe the
    size of the physical dnode on disk. The 8 bits for this field were
    taken from the zero filled dn_pad2 field. The field represents how
    many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
    This convention results in a value of 0 for 512 byte dnodes which
    preserves on-disk format compatibility with older software.
    
    Similarly, the in-memory dnode_t structure has a new dn_num_slots field
    to represent the total number of dnode_phys_t slots consumed on disk.
    Thus dn->dn_num_slots is 1 greater than the corresponding
    dnp->dn_extra_slots. This difference in convention was adopted
    because, unlike on-disk structures, backward compatibility is not a
    concern for in-memory objects, so we used a more natural way to
    represent size for a dnode_t.
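
The two conventions can be related with a short sketch. The function names are hypothetical; the 512-byte slot size and the 16384-byte upper bound come from the text above.

```python
SLOT_BYTES = 512
MAX_DNODE_BYTES = 16384


def extra_slots_for(dnode_bytes: int) -> int:
    """On-disk convention: slots consumed beyond the first one."""
    if dnode_bytes % SLOT_BYTES or not SLOT_BYTES <= dnode_bytes <= MAX_DNODE_BYTES:
        raise ValueError("dnode size must be a multiple of 512 up to 16384")
    return dnode_bytes // SLOT_BYTES - 1


def num_slots_for(extra_slots: int) -> int:
    """In-memory convention: total slots consumed."""
    return extra_slots + 1
```

A legacy 512-byte dnode maps to an on-disk value of 0, which is what an older implementation would already find in the zero-filled padding, and the largest value (31) fits comfortably in the 8-bit field.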
    
    The default size for newly created dnodes is determined by the value of
    a new "dnodesize" dataset property. By default the property is set to
    "legacy" which is compatible with older software. Setting the property
    to "auto" will allow the filesystem to choose the most suitable dnode
    size. Currently this just sets the default dnode size to 1k, but future
    code improvements could dynamically choose a size based on observed
    workload patterns. Dnodes of varying sizes can coexist within the same
    dataset and even within the same dnode block. For example, to enable
    automatically-sized dnodes, run
    
     # zfs set dnodesize=auto tank/fish
    
    The user can also specify literal values for the dnodesize property.
    These are currently limited to powers of two from 1k to 16k. The
    power-of-2 limitation is only for simplicity of the user interface.
    Internally the implementation can handle any multiple of 512 up to 16k,
    and consumers of the DMU API can specify any legal dnode value.
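
The difference between the property's literal values and the sizes accepted internally can be sketched as two predicates. Both function names are hypothetical and the keyword values ("legacy", "auto") are left out.

```python
def is_valid_property_literal(size: int) -> bool:
    """Literal dnodesize values: powers of two from 1k to 16k."""
    return 1024 <= size <= 16384 and (size & (size - 1)) == 0


def is_valid_internal_size(size: int) -> bool:
    """Sizes the implementation can handle: any multiple of 512 up to 16k."""
    return 512 <= size <= 16384 and size % 512 == 0
```

For example, 1536 bytes is a legal dnode size for a DMU consumer but is not accepted as a literal property value.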
    
    The size of a new dnode is determined at object allocation time and
    stored as a new field in the znode in-memory structure. New DMU
    interfaces are added to allow the consumer to specify the dnode size
    that a newly allocated object should use. Existing interfaces are
    unchanged to avoid having to update every call site and to preserve
    compatibility with external consumers such as Lustre. The new
    interface names are given below. The versions of these functions that
    don't take a dnodesize parameter now just call the _dnsize() versions
    with a dnodesize of 0, which means use the legacy dnode size.
    
    New DMU interfaces:
      dmu_object_alloc_dnsize()
      dmu_object_claim_dnsize()
      dmu_object_reclaim_dnsize()
    
    New ZAP interfaces:
      zap_create_dnsize()
      zap_create_norm_dnsize()
      zap_create_flags_dnsize()
      zap_create_claim_norm_dnsize()
      zap_create_link_dnsize()
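
The wrapper pattern described above, where the existing entry point forwards to the new one with a dnodesize of 0, looks roughly like the sketch below. These Python functions are hypothetical stand-ins and do not mirror the real C signatures.

```python
LEGACY_DNODE_BYTES = 512


def object_alloc_dnsize(object_type: str, dnodesize: int) -> dict:
    """Stand-in for a _dnsize() variant: 0 means use the legacy size."""
    if dnodesize == 0:
        dnodesize = LEGACY_DNODE_BYTES
    return {"type": object_type, "dnodesize": dnodesize}


def object_alloc(object_type: str) -> dict:
    """Stand-in for the unchanged interface: callers need no update."""
    return object_alloc_dnsize(object_type, 0)
```

Keeping the old name as a thin wrapper means existing call sites keep compiling and keep their legacy behavior, while new callers opt in explicitly.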
    
    The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
    spa_maxdnodesize() function should be used to determine the maximum
    bonus length for a pool.
    
    These are a few noteworthy changes to key functions:
    
    * The prototype for dnode_hold_impl() now takes a "slots" parameter.
      When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
      ensure the hole at the specified object offset is large enough to
      hold the dnode being created. The slots parameter is also used
      to ensure a dnode does not span multiple dnode blocks. In both of
      these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
      these failure cases are only possible when using DNODE_MUST_BE_FREE.
    
      If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
      dnode_hold_impl() will check if the requested dnode is already
      consumed as an extra dnode slot by a large dnode, in which case
      it returns ENOENT.
    
    * The function dmu_object_alloc() advances to the next dnode block
      if dnode_hold_impl() returns an error for a requested object.
      This is because the beginning of the next dnode block is the only
      location it can safely assume to either be a hole or a valid
      starting point for a dnode.
    
    * dnode_next_offset_level() and other functions that iterate
      through dnode blocks may no longer use a simple array indexing
      scheme. These now use the current dnode's dn_num_slots field to
      advance to the next dnode in the block. This is to ensure we
      properly skip the current dnode's bonus area and don't interpret it
      as a valid dnode.
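
The slot checks and the traversal rule from the list above can be modeled on a single toy dnode block. This is a sketch under simplifying assumptions (one block of 32 slots, hypothetical class and method names), not the dnode_hold_impl() logic itself. Returning ENOENT for a completely free slot is also an assumption; the text above only specifies it for interior slots.

```python
import errno

SLOTS_PER_BLOCK = 32  # 16384-byte dnode block / 512-byte slots


class DnodeBlock:
    def __init__(self):
        # slots[i] is None when free, else the start index of the owning dnode.
        self.slots = [None] * SLOTS_PER_BLOCK
        self.num_slots = {}  # start index -> slots consumed

    def hold_must_be_free(self, index: int, slots: int) -> int:
        """Allocate `slots` slots at `index`; return 0 or ENOSPC."""
        if index + slots > SLOTS_PER_BLOCK:
            return errno.ENOSPC  # dnode would span into the next block
        if any(s is not None for s in self.slots[index:index + slots]):
            return errno.ENOSPC  # hole is not large enough
        for i in range(index, index + slots):
            self.slots[i] = index
        self.num_slots[index] = slots
        return 0

    def hold_must_be_allocated(self, index: int) -> int:
        """Return 0 for the first slot of a dnode, ENOENT otherwise."""
        owner = self.slots[index]
        if owner is None or owner != index:
            return errno.ENOENT  # free slot (assumed) or interior slot
        return 0

    def iter_dnodes(self):
        """Yield start indexes, skipping the interior of large dnodes."""
        i = 0
        while i < SLOTS_PER_BLOCK:
            if self.slots[i] is None:
                i += 1
            else:
                yield i
                i += self.num_slots[i]
```

The traversal advances by the current dnode's slot count rather than by one, which is what keeps it from treating a large dnode's bonus area as the start of another dnode.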
    
    zdb
    ---
    The zdb command was updated to display a dnode's size under the
    "dnsize" column when the object is dumped.
    
    For ZIL create log records, zdb will now display the slot count for
    the object.
    
    ztest
    -----
    Ztest chooses a random dnodesize for every newly created object. The
    random distribution is more heavily weighted toward small dnodes to
    better simulate real-world datasets.
    
    Unused bonus buffer space is filled with non-zero values computed from
    the object number, dataset id, offset, and generation number.  This
    helps ensure that the dnode traversal code properly skips the interior
    regions of large dnodes, and that these interior regions are not
    overwritten by data belonging to other dnodes. A new test visits each
    object in a dataset. It verifies that the actual dnode size matches what
    was stored in the ztest block tag when it was created. It also verifies
    that the unused bonus buffer space is filled with the expected data
    patterns.
    
    ZFS Test Suite
    --------------
    Added six new large dnode-specific tests, and integrated the dnodesize
    property into existing tests for zfs allow and send/recv.
    
    Send/Receive
    ------------
    ZFS send streams for datasets containing large dnodes cannot be received
    on pools that don't support the large_dnode feature. A send stream with
    large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
    unrecognized by an incompatible receiving pool so that the zfs receive
    will fail gracefully.
    
    While not implemented here, it may be possible to generate a
    backward-compatible send stream from a dataset containing large
    dnodes. The implementation may be tricky, however, because the send
    object record for a large dnode would need to be resized to a 512
    byte dnode, possibly kicking in a spill block in the process. This
    means we would need to construct a new SA layout and possibly
    register it in the SA layout object. The SA layout is normally just
    sent as an ordinary object record. But if we are constructing new
    layouts while generating the send stream we'd have to build the SA
    layout object dynamically and send it at the end of the stream.
    
    For sending and receiving between pools that do support large dnodes,
    the drr_object send record type is extended with a new field to store
    the dnode slot count. This field was repurposed from unused padding
    in the structure.
    
    ZIL Replay
    ----------
    The dnode slot count is stored in the uppermost 8 bits of the lr_foid
    field. The bits were unused as the object id is currently capped at
    48 bits.
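
Packing a slot count into the top byte of a 64-bit object id can be sketched as follows. The helper names are hypothetical, and whether the stored value is the slot count itself or an offset from it is not stated above; this sketch assumes the raw count.

```python
OBJ_BITS = 48          # object ids are capped at 48 bits
SLOT_SHIFT = 56        # uppermost 8 bits of a 64-bit field
SLOT_MASK = 0xFF


def pack_foid(obj: int, slots: int) -> int:
    if not 0 <= obj < (1 << OBJ_BITS):
        raise ValueError("object id does not fit in 48 bits")
    if not 0 <= slots <= SLOT_MASK:
        raise ValueError("slot count does not fit in 8 bits")
    return (slots << SLOT_SHIFT) | obj


def unpack_foid(foid: int) -> tuple:
    obj = foid & ((1 << OBJ_BITS) - 1)
    slots = (foid >> SLOT_SHIFT) & SLOT_MASK
    return obj, slots
```

A record written without a slot count has zeros in the top byte, so its object id unpacks unchanged.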
    
    Resizing Dnodes
    ---------------
    It should be possible to resize a dnode when it is dirtied if the
    current dnodesize dataset property differs from the dnode's size, but
    this functionality is not currently implemented. Clearly a dnode can
    only grow if there are sufficient contiguous unused slots in the
    dnode block, but it should always be possible to shrink a dnode.
    Growing dnodes may be useful to reduce fragmentation in a pool with
    many spill blocks in use. Shrinking dnodes may be useful to allow
    sending a dataset to a pool that doesn't support the large_dnode
    feature.
    
    Feature Reference Counting
    --------------------------
    The reference count for the large_dnode pool feature tracks the
    number of datasets that have ever contained a dnode of size larger
    than 512 bytes. The first time a large dnode is created in a dataset
    the dataset is converted to an extensible dataset. This is a one-way
    operation and the only way to decrement the feature count is to
    destroy the dataset, even if the dataset no longer contains any large
    dnodes. The complexity of reference counting on a per-dnode basis was
    too high, so we chose to track it on a per-dataset basis similarly to
    the large_block feature.
    
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #3542
    nedbass authored and behlendorf committed Jun 24, 2016
    50c957f
  7. Sync DMU_BACKUP_FEATURE_* flags

    Flag 20 was used in OpenZFS as DMU_BACKUP_FEATURE_RESUMING.  The
    DMU_BACKUP_FEATURE_LARGE_DNODE flag must be shifted to 21 and
    then reserved in the upstream OpenZFS implementation.
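
How a receiver can reject a stream that carries a feature flag it does not know is sketched below. It assumes "flag N" means bit N of the stream's feature-flags word; the helper functions are hypothetical.

```python
DMU_BACKUP_FEATURE_RESUMING = 1 << 20      # bit 20, per the message above
DMU_BACKUP_FEATURE_LARGE_DNODE = 1 << 21   # bit 21, per the message above


def unsupported_features(stream_flags: int, supported_mask: int) -> int:
    """Return the bits set in the stream that the receiver does not know."""
    return stream_flags & ~supported_mask


def can_receive(stream_flags: int, supported_mask: int) -> bool:
    return unsupported_features(stream_flags, supported_mask) == 0
```

If two implementations assigned different meanings to the same bit, this check would pass while the stream was misinterpreted, which is why the flag values need to stay in sync.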
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Ned Bass <bass6@llnl.gov>
    Closes #4795
    behlendorf committed Jun 24, 2016
    669cf0a

Commits on Jun 28, 2016

  1. OpenZFS 2605, 6980, 6902

    2605 want to resume interrupted zfs send
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Reviewed by: Xin Li <delphij@freebsd.org>
    Reviewed by: Arne Jansen <sensille@gmx.net>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/2605
    OpenZFS-commit: openzfs/openzfs@9c3fd12
    
    6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
    Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6980
    OpenZFS-commit: openzfs/openzfs@ea4a67f
    
    Porting notes:
    - All rsend and snapshot tests enabled and updated for Linux.
    - Fix misuse of input argument in traverse_visitbp().
    - Fix ISO C90 warnings and errors.
    - Fix gcc 'missing braces around initializer' in
      'struct send_thread_arg to_arg =' warning.
    - Replace 4 argument fletcher_4_native() with 3 argument version,
      this change was made in OpenZFS 4185 which has not been ported.
    - Parts of the sections for 'zfs receive' and 'zfs send' were
      rewritten and reordered to approximate upstream.
    - Fix mktree xattr creation, 'user.' prefix required.
    - Minor fixes to newly enabled test cases
    - Long holds for volumes allowed during receive for minor registration.
    ahrens authored and behlendorf committed Jun 28, 2016
    47dfff3
  2. OpenZFS 6051 - lzc_receive: allow the caller to read the begin record

    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6051
    OpenZFS-commit: openzfs/openzfs@620f322
    behlendorf committed Jun 28, 2016
    fd41e93
  3. OpenZFS 6393 - zfs receive a full send as a clone

    Authored by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6394
    OpenZFS-commit: openzfs/openzfs@68ecb2e
    pcd1193182 authored and behlendorf committed Jun 28, 2016
    e6d3a84
  4. OpenZFS 6536 - zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS
    
    Authored by: Andrew Stormont <astormont@racktopsystems.com>
    Reviewed by: Anil Vijarnia <avijarnia@racktopsystems.com>
    Reviewed by: Kim Shrier <kshrier@racktopsystems.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6536
    OpenZFS-commit: openzfs/openzfs@880094b
    andy-js authored and behlendorf committed Jun 28, 2016
    b607405
  5. OpenZFS 6738 - zfs send stream padding needs documentation

    Authored by: Eli Rosenthal <eli.rosenthal@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Dan McDonald <danmcd@omniti.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6738
    OpenZFS-commit: openzfs/openzfs@c20404ff
    Eli Rosenthal authored and behlendorf committed Jun 28, 2016
    f8866f8
  6. OpenZFS 4986 - receiving replication stream fails if any snapshot exceeds refquota
    
    Authored by: Dan McDonald <danmcd@omniti.com>
    Reviewed by: John Kennedy <john.kennedy@delphix.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Approved by: Gordon Ross <gordon.ross@nexenta.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/4986
    OpenZFS-commit: openzfs/openzfs@5878fad
    Dan McDonald authored and behlendorf committed Jun 28, 2016
    671c935
  7. OpenZFS 6562 - Refquota on receive doesn't account for overage

    Authored by: Dan McDonald <danmcd@omniti.com>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gwr@nexenta.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6562
    OpenZFS-commit: openzfs/openzfs@5f7a8e6
    Dan McDonald authored and behlendorf committed Jun 28, 2016
    8c62a0d
  8. Implement zfs_ioc_recv_new() for OpenZFS 2605

    Adds ZFS_IOC_RECV_NEW for resumable streams and preserves the legacy
    ZFS_IOC_RECV user/kernel interface.  The new interface supports all
    stream options but is currently only used for resumable streams.
    This way updated user space utilities will interoperate with older
    kernel modules.
    
    ZFS_IOC_RECV_NEW is modeled after the existing ZFS_IOC_SEND_NEW
    handler.  Non-Linux OpenZFS platforms have opted to change the
    legacy interface in an incompatible fashion instead of adding a
    new ioctl.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    behlendorf committed Jun 28, 2016
    43e52ed
  9. OpenZFS 6314 - buffer overflow in dsl_dataset_name

    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
    Approved by: Dan McDonald <danmcd@omniti.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    OpenZFS-issue: https://www.illumos.org/issues/6314
    OpenZFS-commit: openzfs/openzfs@d6160ee
    ikozhukhov authored and behlendorf committed Jun 28, 2016
    eca7b76
  10. OpenZFS 6876 - Stack corruption after importing a pool with a too-long name
    
    Reviewed by: Prakash Surya <prakash.surya@delphix.com>
    Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
    Reviewed by: George Wilson <george.wilson@delphix.com>
    Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
    Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
    
    Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking
    for trouble. We should check every dataset on import, using a 1024 byte
    buffer and checking each time to see if the dataset's new name is longer
    than 256 bytes.
    
    OpenZFS-issue: https://www.illumos.org/issues/6876
    OpenZFS-commit: openzfs/openzfs@ca8674e
    pcd1193182 authored and behlendorf committed Jun 28, 2016
    d1d19c7

Commits on Jun 29, 2016

  1. Vectorized fletcher_4 must be 128-bit aligned

    The fletcher_4_native() and fletcher_4_byteswap() functions may only
    safely use the vectorized implementations when the buffer is 128-bit
    aligned.  This is because both the AVX2 and SSE implementations process
    four 32-bit words per iteration.  Fall back to the scalar implementation
    which only processes a single 32-bit word for unaligned buffers.
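
The alignment rule can be sketched as a dispatch in front of a scalar Fletcher-4 loop. The scalar loop below is the standard four-accumulator form over 32-bit words with 64-bit sums; the little-endian word order, the function names, and the `vector_available` parameter are assumptions made for illustration.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher_4_scalar(data: bytes) -> tuple:
    """Process one 32-bit word per iteration with four 64-bit running sums."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def pick_implementation(address: int, vector_available: bool = True) -> str:
    """Use the vectorized path only for 128-bit (16-byte) aligned buffers."""
    if vector_available and address % 16 == 0:
        return "vectorized"
    return "scalar"
```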
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Issue #4330
    behlendorf committed Jun 29, 2016
    0dab2e8
  2. Merge branch 'illumos-2605'

    Adds support for resuming interrupted zfs send streams and includes
    all related send/recv bug fixes from upstream OpenZFS.
    
    Unlike the upstream implementation this branch does not change
    the existing ioctl interface.  Instead a new ZFS_IOC_RECV_NEW ioctl
    was added to support resuming zfs send streams.  This was done by
    applying the original upstream patch and then reverting the ioctl
    changes in a follow up patch.  For this reason there are a handful
    of commits between the relevant patches on this branch which are
    not interoperable.  This was done to make it easier to extract
    the new ZFS_IOC_RECV_NEW and submit it upstream.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4742
    behlendorf committed Jun 29, 2016
    5c27b29

Commits on Jul 11, 2016

  1. Allow building with CFLAGS="-O0"

    If compiled with -O0, gcc doesn't do any stack frame coalescing
    and -Wframe-larger-than=1024 is triggered in debug mode.
    gcc 4.8 introduced a new optimization level, -Og, intended for debugging,
    which does not trigger this warning.
    
    Fix bench zio size, using SPA_OLD_MAXBLOCKSHIFT
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4799
    ironMann authored and behlendorf committed Jul 11, 2016
    590c9a0

Commits on Jul 12, 2016

  1. Fix get_zfs_sb race with concurrent umount

    Certain ioctl operations call get_zfs_sb, which takes an active count
    on the sb without checking whether it is still active. This can result
    in a use-after-free. We fix this by using atomic_inc_not_zero to make
    sure we got an active sb.
    
    P1                                          P2
    ---                                         ---
    deactivate_locked_super(): s_active = 0
                                                zfs_sb_hold()
                                                ->get_zfs_sb(): s_active = 1
    ->zpl_kill_sb()
    -->zpl_put_super()
    --->zfs_umount()
    ---->zfs_sb_free(zsb)
                                                zfs_sb_rele(zsb)
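
The increment-unless-zero semantics that close this race can be modeled with a lock-protected counter. This is an illustrative Python model with hypothetical names, not the kernel primitive.

```python
import threading


class ActiveCount:
    """Reference count that refuses new holds once it has reached zero."""

    def __init__(self, initial: int = 1):
        self._lock = threading.Lock()
        self._count = initial

    def inc_not_zero(self) -> bool:
        # Check and increment under one lock so no other thread can
        # drop the count to zero between the two steps.
        with self._lock:
            if self._count == 0:
                return False
            self._count += 1
            return True

    def dec(self) -> int:
        with self._lock:
            self._count -= 1
            return self._count
```

In the diagram above, P2's hold would fail once P1 has dropped the count to zero, so P2 never touches a superblock that is being torn down.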
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    061460d
  2. Don't allow accessing XATTR via export handle

    Allowing access to XATTRs through an export handle is a very bad idea.
    It would allow users to write whatever they want in fields where they
    otherwise could not.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4828
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    7938c2a
  3. Fix Large kmem_alloc in vdev_metaslab_init

    This allocation can go way over 1MB, so we should use vmem_alloc
    instead of kmem_alloc.
    
      Large kmem_alloc(1430784, 0x1000), please file an issue...
      Call Trace:
       [<ffffffffa0324aff>] ? spl_kmem_zalloc+0xef/0x160 [spl]
       [<ffffffffa17d0c8d>] ? vdev_metaslab_init+0x9d/0x1f0 [zfs]
       [<ffffffffa17d46d0>] ? vdev_load+0xc0/0xd0 [zfs]
       [<ffffffffa17d4643>] ? vdev_load+0x33/0xd0 [zfs]
       [<ffffffffa17c0004>] ? spa_load+0xfc4/0x1b60 [zfs]
       [<ffffffffa17c1838>] ? spa_tryimport+0x98/0x430 [zfs]
       [<ffffffffa17f28b1>] ? zfs_ioc_pool_tryimport+0x41/0x80 [zfs]
       [<ffffffffa17f5669>] ? zfsdev_ioctl+0x4a9/0x4e0 [zfs]
       [<ffffffff811bacdf>] ? do_vfs_ioctl+0x2cf/0x4b0
       [<ffffffff811baf41>] ? SyS_ioctl+0x81/0xa0
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4752
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    bffb68a
  4. Add configure result for xattr_handler

    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4828
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    d470101
  5. fh_to_dentry should return ESTALE when generation mismatch

    A generation mismatch usually means the file referenced by the file handle
    was deleted. We should return ESTALE to indicate this. We return ENOENT in
    zfs_vget since zpl_fh_to_dentry will convert it to ESTALE.
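
The two-layer error mapping can be sketched as follows. The functions are hypothetical stand-ins that only model the return codes: the lower layer reports ENOENT on a generation mismatch and the export layer translates that to ESTALE.

```python
import errno


def vget(current_gen: int, handle_gen: int) -> int:
    """Lower layer: 0 on success, ENOENT if the generations differ."""
    return 0 if current_gen == handle_gen else errno.ENOENT


def fh_to_dentry(current_gen: int, handle_gen: int) -> int:
    """Export layer: convert ENOENT into ESTALE for file handle users."""
    err = vget(current_gen, handle_gen)
    if err == errno.ENOENT:
        return errno.ESTALE
    return err
```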
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4828
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    6c25306
  6. xattr dir doesn't get purged during iput

    We need to set inode->i_nlink to zero so iput will purge it. Without this, it
    will get purged during shrink cache or umount, which would likely result in
    deadlock due to zfs_zget waiting forever on its children which are in the
    dispose_list of the same thread.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Chris Dunlop <chris@onthe.net.au>
    Issue #4359
    Issue #3508
    Issue #4413
    Issue #4827
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    ddae16a
  7. Kill zp->z_xattr_parent to prevent pinning

    zp->z_xattr_parent will pin the parent. This causes a serious problem
    when unlinking a file with xattrs. Because the unlinked file is pinned,
    it is not purged immediately, and so its xattr objects are never marked
    as unlinked. The unlinked file and its xattrs therefore linger until
    the cache is shrunk or the filesystem is unmounted.
    
    This change partially reverts e89260a.  This is safe because only the
    zp->z_xattr_parent optimization is removed, zpl_xattr_security_init()
    is still called from the zpl outside the inode lock.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Chris Dunlop <chris@onthe.net.au>
    Issue #4359
    Issue #3508
    Issue #4413
    Issue #4827
    Chunwei Chen authored and behlendorf committed Jul 12, 2016
    31b6111
  8. Fix RAIDZ_TEST tests

    Remove stray trailing } which prevented the raidz stress tests from
    running in-tree.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    behlendorf committed Jul 12, 2016
    62b2d54
  9. Fix PANIC: metaslab_free_dva(): bad DVA X:Y:Z

    The following scenario can result in garbage in the dn_spill field.
    The db->db_blkptr must be set to NULL when DNODE_FLAG_SPILL_BLKPTR
    is clear to ensure the dn_spill field is cleared.
    
    Current txg = A.
    * A new spill buffer is created. Its dbuf is initialized with
      db_blkptr = NULL and it's dirtied.
    
    Current txg = B.
    * The spill buffer is modified. It's marked as dirty in this txg.
    * Additional changes make the spill buffer unnecessary because the
      xattr fits into the bonus buffer, so it's removed. The dbuf is
      undirtied in this txg, but it's still referenced and cannot be
      destroyed.
    
    Current txg = C.
    * Starts syncing of txg A
    * dbuf_sync_leaf() is called for the spill buffer. Since db_blkptr
      is NULL, dbuf_check_blkptr() is called.
    * The dbuf starts being written and it reaches the ready state
      (not done yet).
    * A new change makes the spill buffer necessary again.
      sa_build_layouts() ends up calling dbuf_find() to locate the
      dbuf.  It finds the old dbuf because it has not been destroyed yet
      (it will be destroyed when the previous write is done and there
      are no more references). The old dbuf has db_blkptr != NULL.
    * txg A write is complete and the dbuf released. However it's still
      referenced, so it's not destroyed.
    
    Current txg = D.
    * Starts syncing of txg B
    * dbuf_sync_leaf() is called for the bonus buffer. Its contents are
      directly copied into the dnode, overwriting the blkptr area because,
      in txg B, the bonus buffer was big enough to hold the entire xattr.
    * At this point, the db_blkptr of the spill buffer used in txg C
      gets corrupted.
    
    Signed-off-by: Peng <peng.hse@xtaotech.com>
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #3937
    hsepeng authored and behlendorf committed Jul 12, 2016
    81edd3e

Commits on Jul 13, 2016

  1. Fix handling of errors nvlist in zfs_ioc_recv_new()

    zfs_ioc_recv_impl() is changed to always allocate the 'errors'
    nvlist; its callers are responsible for freeing it.
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4829
    ironMann authored and behlendorf committed Jul 13, 2016
    1bf3bf0
  2. Add RAID-Z routines for SSE2 instruction set, in x86_64 mode.

    The patch covers low-end and older x86 CPUs.  Parity generation is
    equivalent to the SSSE3 implementation, but reconstruction is somewhat
    slower.  The previous 'sse' implementation is renamed to 'ssse3' to
    indicate the highest instruction set used.
    
    Benchmark results:
    scalar_rec_p                    4    720476442
    scalar_rec_q                    4    187462804
    scalar_rec_r                    4    138996096
    scalar_rec_pq                   4    140834951
    scalar_rec_pr                   4    129332035
    scalar_rec_qr                   4    81619194
    scalar_rec_pqr                  4    53376668
    
    sse2_rec_p                      4    2427757064
    sse2_rec_q                      4    747120861
    sse2_rec_r                      4    499871637
    sse2_rec_pq                     4    522403710
    sse2_rec_pr                     4    464632780
    sse2_rec_qr                     4    319124434
    sse2_rec_pqr                    4    205794190
    
    ssse3_rec_p                     4    2519939444
    ssse3_rec_q                     4    1003019289
    ssse3_rec_r                     4    616428767
    ssse3_rec_pq                    4    706326396
    ssse3_rec_pr                    4    570493618
    ssse3_rec_qr                    4    400185250
    ssse3_rec_pqr                   4    377541245
    
    original_rec_p                  4    691658568
    original_rec_q                  4    195510948
    original_rec_r                  4    26075538
    original_rec_pq                 4    103087368
    original_rec_pr                 4    15767058
    original_rec_qr                 4    15513175
    original_rec_pqr                4    10746357
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4783
    ironMann authored and behlendorf committed Jul 13, 2016
    ae25d22

Commits on Jul 14, 2016

  1. Enable zpool_upgrade test cases

    Creating the pool in a striped rather than mirrored configuration
    provides enough space for all upgrade tests to run.  Test case
    zpool_upgrade_007_pos still fails and must be investigated so
    it has been left disabled.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4852
    behlendorf committed Jul 14, 2016
    8d9e124
  2. Prevent null dereferences when accessing dbuf kstat

    In arc_buf_info(), the arc_buf_t may have no header.  In that case, don't
    try to fetch the arc buffer stats and instead just zero them.
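
The guard can be sketched as follows. All type and field names here are hypothetical stand-ins rather than the real ARC structures; the point is only that the output is zeroed first and the header is dereferenced only when present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header and buffer types, not the real arc_buf_hdr_t/arc_buf_t. */
typedef struct hdr_sketch {
	int refcount;
	int size;
} hdr_sketch_t;

typedef struct buf_sketch {
	hdr_sketch_t *b_hdr;	/* may be NULL */
} buf_sketch_t;

typedef struct buf_stats_sketch {
	int refcount;
	int size;
} buf_stats_sketch_t;

/* Zero the stats up front; fill them in only when a header exists. */
static void
fill_buf_stats_sketch(const buf_sketch_t *buf, buf_stats_sketch_t *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof (*out));
	if (buf == NULL || buf->b_hdr == NULL)
		return;
	out->refcount = buf->b_hdr->refcount;
	out->size = buf->b_hdr->size;
}
```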
    
    The null dereferences were observed while accessing the dbuf kstat with
    awk on a system in which millions of small files were being created in
    order to overflow the system's metadata limit.
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Closes #4837
    dweeezil authored and behlendorf committed Jul 14, 2016
    8887c7d
  3. Fix dbuf_stats_hash_table_data race

    Dropping DBUF_HASH_MUTEX when walking the hash list is unsafe. The dbuf
    can be freed at any time.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4846
    Chunwei Chen authored and behlendorf committed Jul 14, 2016
    02de3e3
  4. Use native inode->i_nlink instead of znode->z_links

    A mostly mechanical change, taking into account i_nlink is 32 bits vs ZFS's
    64 bit on-disk link count.
    
    We revert "xattr dir doesn't get purged during iput" (ddae16a) as this is a
    more Linux-integrated fix for the same issue.
    
    In addition, setting the initial link count on a new node has been changed
    from setting one less than required in zfs_mknode() then incrementing to the
    correct count in zfs_link_create() (which was somewhat bizarre in the first
    place), to setting the correct count in zfs_mknode() and not incrementing it
    in zfs_link_create(). This both means we no longer set the link count in
    sa_bulk_update() twice (once for the initial incorrect count then again for
    the correct count), as well as adhering to the Linux requirement of not
    incrementing a zero link count without I_LINKABLE (see linux commit
    f4e0c30c).
    
    Signed-off-by: Chris Dunlop <chris@onthe.net.au>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Closes #4838
    Issue #227
    chrisrd authored and behlendorf committed Jul 14, 2016
    dfbc863

Commits on Jul 15, 2016

  1. Implementation of SSE optimized Fletcher-4

    Builds off of 1eeb456 (Implementation of AVX2 optimized Fletcher-4)
    This commit adds another implementation of the Fletcher-4 algorithm.
    It is automatically selected at module load if it benchmarks higher
    than all other available implementations.
    
    The module benchmark was also amended to analyze the performance of
    the byteswapped version of Fletcher-4, as well as the non-byteswapped
    version. The average performance of the two is used to select the
    fastest implementation available on the host system.
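
For orientation, here is a scalar sketch of the Fletcher-4 recurrence (four running 64-bit sums over 32-bit words) with an optional byteswap, plus a selection helper that averages the two benchmark scores. The function names, and the assumption that a higher score is better, are illustrative and not taken from the module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four running 64-bit sums; names here are illustrative only. */
typedef struct fletcher4_sketch {
	uint64_t a, b, c, d;
} fletcher4_sketch_t;

/* Portable 32-bit byte swap. */
static uint32_t
bswap32_sketch(uint32_t v)
{
	return ((v >> 24) | ((v >> 8) & 0x0000ff00u) |
	    ((v << 8) & 0x00ff0000u) | (v << 24));
}

/* Scalar Fletcher-4 over 32-bit words, optionally byteswapping each word. */
static fletcher4_sketch_t
fletcher4_scalar_sketch(const uint32_t *words, size_t nwords, int byteswap)
{
	fletcher4_sketch_t s = { 0, 0, 0, 0 };
	size_t i;

	for (i = 0; i < nwords; i++) {
		uint32_t w = byteswap ? bswap32_sketch(words[i]) : words[i];

		s.a += w;
		s.b += s.a;
		s.c += s.b;
		s.d += s.c;
	}
	return (s);
}

/*
 * Return the index of the implementation whose average of native and
 * byteswapped benchmark scores is highest (assumes higher is better).
 */
static size_t
pick_fastest_sketch(const uint64_t *native, const uint64_t *byteswap,
    size_t nimpl)
{
	size_t best = 0, i;

	assert(nimpl > 0);
	for (i = 1; i < nimpl; i++) {
		uint64_t avg = native[i] / 2 + byteswap[i] / 2;
		uint64_t best_avg = native[best] / 2 + byteswap[best] / 2;

		if (avg > best_avg)
			best = i;
	}
	return (best);
}
```

Averaging means an implementation that is fast only in one byte order does not automatically win.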
    
    Adds a pair of fields to an existing zcommon module parameter:
    -  zfs_fletcher_4_impl (str)
        "sse2"    - new SSE2 implementation if available
        "ssse3"   - new SSSE3 implementation if available
    
    Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4789
    tj90241 authored and behlendorf committed Jul 15, 2016
    35a76a0
  2. Fix filesystem destroy with receive_resume_token

    It is possible that the given DS may have hidden child (%recv)
    datasets - "leftovers" resulting from the previously interrupted
    'zfs receive'.  Try to remove the hidden child (%recv) and after
    that try to remove the target dataset.   If the hidden child
    (%recv) does not exist the original error (EEXIST) will be returned.
    
    Signed-off-by: Roman Strashkin <roman.strashkin@nexenta.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4818
    rstrlcpy authored and behlendorf committed Jul 15, 2016
    1b87e0f

Commits on Jul 19, 2016

  1. Prevent segfaults in SSE optimized Fletcher-4

    In some cases, the compiler was not respecting the GNU aligned
    attribute for stack variables in 35a76a0. This was resulting in
    a segfault on CentOS 6.7 hosts using gcc 4.4.7-17.  This issue
    was fixed in gcc 4.6.
    
    To prevent this from occurring, use unaligned loads and stores
    for all stack and global memory references in the SSE optimized
    Fletcher-4 code.
    
    Disable zimport testing against master where this flaw exists:
    
    TEST_ZIMPORT_VERSIONS="installed"
    
    Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4862
    tj90241 authored and behlendorf committed Jul 19, 2016
    3d11ecb
  2. Update arc_summary.py for prefetch changes

    Commit 7f60329 removed several kstats which arc_summary.py read.
    Remove these kstats from arc_summary.py in the same way this was
    handled in FreeNAS.
    
    FreeNAS-commit: truenas/middleware@3901f73
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4695
    behlendorf committed Jul 19, 2016
    b756ff2
  3. Wait iput_async before evict_inodes to prevent race

    Wait for iput_async before entering evict_inodes in
    generic_shutdown_super. We must finish before evict_inodes because,
    when lazytime is on or when zfs_purgedir calls zfs_zget, iput would
    bump i_count from 0 to 1. This would race with the i_count check in
    evict_inodes, which could then destroy the inode while we are still
    using it.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4854
    Chunwei Chen authored and behlendorf committed Jul 19, 2016
    1d9b3bd
  4. Fixes and enhancements of SIMD raidz parity

    - Implementation lock replaced with atomic variable
    
    - Trailing whitespace is removed from user specified parameter, to enhance
    experience when using commands that add newline, e.g. `echo`
    
    - raidz_test: remove dependency on `getrusage()` and RUSAGE_THREAD, Issue #4813
    
    - silence `cppcheck` in vdev_raidz, partial solution of Issue #1392
    
    - Minor fixes and cleanups
    
    - Enable use of original parity methods in [fastest] configuration.
    New opaque original ops structure, representing native methods, is added
    to supported raidz methods. Original parity methods are executed if selected
    implementation has NULL fn pointer.
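
Two of the items above lend themselves to a short sketch: trimming trailing whitespace from a user-supplied parameter, and falling back to the original method when the selected implementation has a NULL function pointer. Everything named `*_sketch` below is hypothetical and much simpler than the real raidz ops structures.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Strip trailing whitespace in place, e.g. the newline added by `echo`. */
static void
trim_trailing_ws_sketch(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
}

/* Hypothetical parity-generation signature and ops table. */
typedef int (*gen_fn_sketch_t)(int);

typedef struct ops_sketch {
	const char *name;
	gen_fn_sketch_t gen_p;	/* NULL means "not implemented here" */
} ops_sketch_t;

/* Stand-in for the original (non-SIMD) method. */
static int
original_gen_p_sketch(int x)
{
	return (x + 1);
}

/* Use the selected implementation's method, or the original if NULL. */
static int
call_gen_p_sketch(const ops_sketch_t *ops, int x)
{
	gen_fn_sketch_t fn;

	assert(ops != NULL);
	fn = (ops->gen_p != NULL) ? ops->gen_p : original_gen_p_sketch;
	return (fn(x));
}
```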
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4813
    Issue #1392
    ironMann authored and behlendorf committed Jul 19, 2016
    c9187d8
  5. RAIDZ parity kstat rework

    Print table with speed of methods for each implementation.
    Last line describes contents of [fastest] selection.
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4860
    ironMann authored and behlendorf committed Jul 19, 2016
    26a08b5

Commits on Jul 20, 2016

  1. Fix NULL pointer in zfs_preumount from 1d9b3bd

    When zfs_domount fails zsb will be freed, and its caller
    mount_nodev/get_sb_nodev will call deactivate_locked_super, which calls
    into zfs_preumount.
    
    In order to make sure we don't touch any nonexistent stuff, we must make sure
    s_fs_info is NULL in the fail path so zfs_preumount can easily check that.
    
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4867
    Issue #4854
    Chunwei Chen authored and behlendorf committed Jul 20, 2016
    be88e73
  2. Illumos Crypto Port module added to enable native encryption in zfs

    A port of the Illumos Crypto Framework to a Linux kernel module (found
    in module/icp). This is needed to do the actual encryption work. We cannot
    use the Linux kernel's built in crypto api because it is only exported to
    GPL-licensed modules. Having the ICP also means the crypto code can run on
    any of the other kernels under OpenZFS. I ended up porting over most of the
    internals of the framework, which means that porting over other API calls (if
    we need them) should be fairly easy. Specifically, I have ported over the API
    functions related to encryption, digests, macs, and crypto templates. The ICP
    is able to use assembly-accelerated encryption on amd64 machines and AES-NI
    instructions on Intel chips that support it. There are place-holder
    directories for similar assembly optimizations for other architectures
    (although they have not been written).
    
    Signed-off-by: Tom Caputi <tcaputi@datto.com>
    Signed-off-by: Tony Hutter <hutter2@llnl.gov>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4329
    Tom Caputi authored and behlendorf committed Jul 20, 2016
    0b04990

Commits on Jul 22, 2016

  1. Fix for compilation error when using the kernel's CONFIG_LOCKDEP

    Signed-off-by: Tom Caputi <tcaputi@datto.com>
    Signed-off-by: Chris Dunlop <chris@onthe.net.au>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4329
    Tom Caputi authored and behlendorf committed Jul 22, 2016
    f4bc1bb

Commits on Jul 25, 2016

  1. zloop: print backtrace from core files

    Find the core file by using `/proc/sys/kernel/core_pattern`
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4874
    ironMann authored and behlendorf committed Jul 25, 2016
    20da056
  2. Fix for metaslab_fastwrite_unmark() assert failure

    Currently there is an issue where metaslab_fastwrite_unmark() unmarks
    fastwrites on vdev_t's that have never had fastwrites marked on them.
    The 'fastwrite mark' is essentially a count of outstanding bytes that
    will be written to a vdev and is used in syncing context. The problem
    stems from the fact that the vdev_pending_fastwrite field is not being
    transferred over when replacing a top-level vdev. As a result, the
    metaslab is marked for fastwrite on the old vdev and unmarked on the
    new one, which brings the fastwrite count below zero. This fix simply
    assigns vdev_pending_fastwrite from the old vdev to the new one so
    this count is not lost.
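
The accounting problem can be illustrated with a small sketch, assuming a hypothetical `vdev_sketch_t` that carries only the pending-fastwrite byte count. Copying the count on replacement keeps the later unmark from underflowing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vdev with just the counter discussed above. */
typedef struct vdev_sketch {
	uint64_t pending_fastwrite;	/* outstanding bytes marked for write */
} vdev_sketch_t;

static void
fastwrite_mark_sketch(vdev_sketch_t *vd, uint64_t bytes)
{
	vd->pending_fastwrite += bytes;
}

static void
fastwrite_unmark_sketch(vdev_sketch_t *vd, uint64_t bytes)
{
	/* This is the kind of check that fails if the count was lost. */
	assert(vd->pending_fastwrite >= bytes);
	vd->pending_fastwrite -= bytes;
}

/* On replacement, carry the outstanding count over to the new vdev. */
static void
replace_top_level_sketch(const vdev_sketch_t *oldvd, vdev_sketch_t *newvd)
{
	newvd->pending_fastwrite = oldvd->pending_fastwrite;
}
```

Without the copy in `replace_top_level_sketch()`, a mark on the old vdev followed by an unmark on the new one would trip the assertion.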
    
    Signed-off-by: Tom Caputi <tcaputi@datto.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4267
    Tom Caputi authored and behlendorf committed Jul 25, 2016
    77943bc
  3. Check whether the kernel supports i_uid/gid_read/write helpers

    The concept of a kuid, and the need to translate it to an ordinary
    integer type, was added in kernel version 3.5.  Implement the necessary
    plumbing to detect this condition at compile time.  If the kernel
    doesn't support kuid, just fall back to directly accessing the
    respective struct inode members.
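
A userland sketch of the compile-time fallback pattern, assuming a hypothetical `HAVE_KUID_SKETCH` macro that a configure check would define. The struct and accessor names are illustrative and are not the kernel's or the patch's actual identifiers.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_KUID_SKETCH
/* "New kernel" case: the id is wrapped in a struct and needs translating. */
typedef struct { unsigned int val; } uid_field_sketch_t;
#define	UID_FIELD_GET(f)	((f).val)
#define	UID_FIELD_SET(f, v)	((f).val = (v))
#else
/* Fallback case: the id is a plain integer member, accessed directly. */
typedef unsigned int uid_field_sketch_t;
#define	UID_FIELD_GET(f)	(f)
#define	UID_FIELD_SET(f, v)	((f) = (v))
#endif

/* Hypothetical inode with only the member of interest. */
struct inode_sketch {
	uid_field_sketch_t i_uid;
};

/* Callers use these helpers and never touch the member representation. */
static unsigned int
uid_read_sketch(const struct inode_sketch *ip)
{
	assert(ip != NULL);
	return (UID_FIELD_GET(ip->i_uid));
}

static void
uid_write_sketch(struct inode_sketch *ip, unsigned int uid)
{
	assert(ip != NULL);
	UID_FIELD_SET(ip->i_uid, uid);
}
```

The rest of the code behaves identically whichever branch the build selects.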
    
    Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4685
    Issue #227
    Nikolay Borisov authored and behlendorf committed Jul 25, 2016
    82a1b2d
  4. Remove znode's z_uid/z_gid member

    Remove the duplicate z_uid/z_gid members which are also held in the
    generic vfs inode struct. This is done by first removing the members
    from struct znode and then using the KUID_TO_SUID/KGID_TO_SGID
    macros to access the respective member from struct inode. In cases
    where the uid/gids are being marshalled from/to disk, use the newly
    introduced zfs_(uid|gid)_(read|write) functions to properly
    save the uids rather than the internal kernel representation.
    
    Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4685
    Issue #227
    Nikolay Borisov authored and behlendorf committed Jul 25, 2016
    2c6abf1
  5. Fix uninitialized variable in avl_add()

    Silence the following warning when compiling with gcc 5.4.0.
    Specifically gcc (Ubuntu 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609.
    
    module/avl/avl.c: In function ‘avl_add’:
    module/avl/avl.c:647:2: warning: ‘where’ may be used uninitialized
        in this function [-Wmaybe-uninitialized]
      avl_insert(tree, new_node, where);
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    behlendorf committed Jul 25, 2016
    273ff9b
  6. Fix sync behavior for disk vdevs

    Prior to b39c22b, which was first generally available in the 0.6.5
    release, ZoL never actually submitted synchronous read or write
    requests to the Linux block layer.  This means the vdev_disk_dio_is_sync()
    function had always returned false and, therefore, the completion in
    dio_request_t.dr_comp was never actually used.
    
    In b39c22b, synchronous ZIO operations were translated to synchronous
    BIO requests in vdev_disk_io_start().  The follow-on commits 5592404 and
    aa159af fixed several problems introduced by b39c22b.  In particular,
    5592404 introduced the new flag parameter "wait" to __vdev_disk_physio()
    but under ZoL, since vdev_disk_physio() is never actually used, the wait
    flag was always zero so the new code had no effect other than to cause
    a bug in the use of the dio_request_t.dr_comp which was fixed by aa159af.
    
    The original rationale for introducing synchronous operations in b39c22b
    was to hurry certain requests through the BIO layer which would have
    otherwise been subject to its unplug timer which would increase the
    latency.  This behavior of the unplug timer, however, went away during the
    transition of the plug/unplug system between kernels 2.6.32 and 2.6.39.
    
    To handle the unplug timer behavior on 2.6.32-2.6.35 kernels the
    BIO_RW_UNPLUG flag is used as a hint to suppress the plugging behavior.
    
    For kernels 2.6.36-2.6.38, the REQ_UNPLUG macro will be available and
    is used for the same purpose.
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4858
    dweeezil authored and behlendorf committed Jul 25, 2016
    e6603b7
  7. Limit the amount of dnode metadata in the ARC

    Metadata-intensive workloads can cause the ARC to become permanently
    filled with dnode_t objects as they're pinned by the VFS layer.
    Subsequent data-intensive workloads may only benefit from about
    25% of the potential ARC (arc_c_max - arc_meta_limit).
    
    In order to help track metadata usage more precisely, the other_size
    metadata arcstat has been replaced with dbuf_size, dnode_size and bonus_size.
    
    The new zfs_arc_dnode_limit tunable, which defaults to 10% of
    zfs_arc_meta_limit, defines the minimum number of bytes which is desirable
    to be consumed by dnodes.  Attempts to evict non-metadata will trigger
    async prune tasks if the space used by dnodes exceeds this limit.
    
    The new zfs_arc_dnode_reduce_percent tunable specifies the amount by
    which the excess dnode space is attempted to be pruned as a percentage of
    the amount by which zfs_arc_dnode_limit is being exceeded.  By default,
    it tries to unpin 10% of the dnodes.
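
The arithmetic behind the two tunables can be sketched as below. The helper names are hypothetical, the values are in bytes, and how the module turns a byte target into a number of dnodes to unpin is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* dnode limit as a percentage of the metadata limit (10% by default). */
static uint64_t
dnode_limit_sketch(uint64_t meta_limit, uint64_t limit_percent)
{
	assert(limit_percent <= 100);
	return (meta_limit * limit_percent / 100);
}

/*
 * Bytes to try to prune: reduce_percent of the amount by which the
 * dnode space exceeds the limit, or zero when under the limit.
 */
static uint64_t
dnode_prune_target_sketch(uint64_t dnode_size, uint64_t dnode_limit,
    uint64_t reduce_percent)
{
	assert(reduce_percent <= 100);
	if (dnode_size <= dnode_limit)
		return (0);
	return ((dnode_size - dnode_limit) * reduce_percent / 100);
}
```

For a 3GiB metadata limit, `dnode_limit_sketch(3ULL << 30, 10)` comes to roughly 307MiB.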
    
    The problem of dnode metadata pinning was observed with the following
    testing procedure (in this example, zfs_arc_max is set to 4GiB):
    
        - Create a large number of small files until arc_meta_used exceeds
          arc_meta_limit (3GiB with default tuning) and arc_prune
          starts increasing.
    
        - Create a 3GiB file with dd.  Observe arc_meta_used.  It will still
          be around 3GiB.
    
        - Repeatedly read the 3GiB file and observe arc_meta_used as before.
          It will continue to stay around 3GiB.
    
    With this modification, space for the 3GiB file is gradually made
    available as subsequent demands on the ARC are made.  The previous behavior
    can be restored by setting zfs_arc_dnode_limit to the same value as the
    zfs_arc_meta_limit.
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4345
    Issue #4512
    Issue #4773
    Closes #4858
    dweeezil authored and behlendorf committed Jul 25, 2016
    25458cb

Commits on Jul 27, 2016

  1. Fixes for issues found with cppcheck tool

    The patch fixes a small number of errors/false positives reported by
    `cppcheck`, a static analysis tool for C/C++.
    
    cppcheck 1.72
    
    $ cppcheck . --force --quiet
    [cmd/zfs/zfs_main.c:4444]: (error) Possible null pointer dereference: who_perm
    [cmd/zfs/zfs_main.c:4445]: (error) Possible null pointer dereference: who_perm
    [cmd/zfs/zfs_main.c:4446]: (error) Possible null pointer dereference: who_perm
    [cmd/zpool/zpool_iter.c:317]: (error) Uninitialized variable: nvroot
    [cmd/zpool/zpool_vdev.c:1526]: (error) Memory leak: child
    [lib/libefi/rdwr_efi.c:1118]: (error) Memory leak: efi_label
    [lib/libuutil/uu_misc.c:207]: (error) va_list 'args' was opened but not closed by va_end().
    [lib/libzfs/libzfs_import.c:1554]: (error) Dangerous usage of 'diskname' (strncpy doesn't always null-terminate it).
    [lib/libzfs/libzfs_sendrecv.c:3279]: (error) Dereferencing 'cp' after it is deallocated / released
    [tests/zfs-tests/cmd/file_write/file_write.c:154]: (error) Possible null pointer dereference: operation
    [tests/zfs-tests/cmd/randfree_file/randfree_file.c:90]: (error) Memory leak: buf
    [cmd/zinject/zinject.c:1068]: (error) Uninitialized variable: dataset
    [module/icp/io/sha2_mod.c:698]: (error) Uninitialized variable: blocks_per_int64
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #1392
    ironMann authored and behlendorf committed Jul 27, 2016
    a64f903
  2. Multi-thread 'zpool import' for blkid

    Commit 519129f added support to multi-thread 'zpool import' for
    the case where block devices are scanned for under /dev/.  This
    commit generalizes that logic and applies it to the case where
    device names are acquired from libblkid.
    
    The zpool_find_import_scan() and zpool_find_import_blkid()
    functions create an AVL tree containing each device name.  Each
    entry in this tree is dispatched to a taskq where the function
    zpool_open_func() validates the device by opening it and reading
    the label.  This may result in additional entries being added
    to the tree and those device paths being verified.
    
    This is largely how the upstream OpenZFS code behaves but due to
    significant differences the non-Linux code has been dropped for
    readability.  Additionally, this code makes use of taskqs and
    kmutexs which are normally not available to the command line tools.
    Special care has been taken to allow their use in the import
    functions.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
    Closes #4794
    behlendorf committed Jul 27, 2016
    8a39aba
  3. Avoid integer overflow on computation of refquota_slack

    DMU_MAX_ACCESS should be cast to a uint64_t otherwise the
    multiplication of DMU_MAX_ACCESS with spa_asize_inflation will
    be 32 bit and may lead to an overflow. Currently DMU_MAX_ACCESS
    is 64 * 1024 * 1024, so spa_asize_inflation being 64 or more will
    lead to an overflow.
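
The effect of widening before multiplying can be demonstrated with a small sketch. It uses unsigned 32-bit arithmetic so that the wraparound is well defined in C; the Coverity report quoted in this commit message concerns a signed int, where overflow is undefined behavior. The `_SKETCH` macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as the DMU_MAX_ACCESS quoted above: 64 * 1024 * 1024. */
#define	DMU_MAX_ACCESS_SKETCH	(64 * 1024 * 1024)

/* Multiply in 32 bits, then widen: the product wraps modulo 2^32 first. */
static uint64_t
slack_narrow_sketch(uint32_t inflation)
{
	uint32_t product = (uint32_t)DMU_MAX_ACCESS_SKETCH * inflation;

	return (product);
}

/* Widen one operand first so the multiplication is done in 64 bits. */
static uint64_t
slack_wide_sketch(uint32_t inflation)
{
	return ((uint64_t)DMU_MAX_ACCESS_SKETCH * inflation);
}
```

With an inflation of 64 the narrow version yields 0, while the widened version yields 4294967296; for smaller factors such as 24 the two agree.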
    
    Found by static analysis with CoverityScan 0.8.5
    
    CID 150942 (#1 of 1): Unintentional integer overflow
      (OVERFLOW_BEFORE_WIDEN)
    overflow_before_widen: Potentially overflowing expression
      67108864 * spa_asize_inflation with type int (32 bits, signed)
      is evaluated using 32-bit arithmetic, and then used in a context
      that expects an expression of type uint64_t (64 bits, unsigned).
    
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4889
    Colin Ian King authored and behlendorf committed Jul 27, 2016
    bf18fd8
  4. Enable history test cases

    Updated test case history_001_pos.ksh so it can run in tree.  The
    original test case assumed /usr/sbin/zfs and /usr/sbin/zpool were
    the only valid locations for these utilities.  The same modification
    has already been made to history_common.kshlib.
    
    The only other failing test case was history_010_pos and that was
    the result of the ":linux" suffix not being appended when checking
    the long output in the test case.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4882
    behlendorf committed Jul 27, 2016
    a0cacb7
  5. Fix zdb crash with 4K-only devices

    Here's the problem - on 4K native devices in userland on
    Linux using O_DIRECT, buffers must be 4K aligned or I/O
    will fail with EINVAL, causing zdb (and others) to coredump.
    Since userland probably doesn't need optimized buffer caches,
    we just force 4K alignment on everything.
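
One way to obtain 4K-aligned buffers in userland is `posix_memalign()`. The sketch below is an illustration of that approach under the assumption of a POSIX system; it is not necessarily how the patch itself forces alignment, and the helper name is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define	_POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define	IO_ALIGN_SKETCH	((size_t)4096)

/*
 * Allocate a buffer whose address and length are both multiples of 4K,
 * which is what O_DIRECT I/O on a 4K native device requires.
 * Returns NULL on failure; release the buffer with free().
 */
static void *
alloc_io_buf_sketch(size_t size)
{
	void *p = NULL;
	size_t rounded;

	assert(size > 0);
	rounded = (size + IO_ALIGN_SKETCH - 1) & ~(IO_ALIGN_SKETCH - 1);
	if (posix_memalign(&p, IO_ALIGN_SKETCH, rounded) != 0)
		return (NULL);
	return (p);
}
```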
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Closes #4479
    behlendorf committed Jul 27, 2016
    fcf64f4
  6. txg visibility code should not execute under tc_open_lock

    The memory allocation and locking in `spa_txg_history_*()` can
    potentially block txg_hold_open for arbitrarily long periods of time.
    
    Signed-off-by: Richard Yao <ryao@gentoo.org>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4333
    ryao authored and behlendorf committed Jul 27, 2016
    f26b4b3

Commits on Jul 29, 2016

  1. Linux 4.8 compat: submit_bio()

    The rw argument has been removed from submit_bio/submit_bio_wait.
    Callers are now expected to set bio->bi_rw instead of passing it
    in.  See torvalds/linux@4e49ea4a for
    complete details.
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4892
    Issue #4899
    behlendorf committed Jul 29, 2016
    bbb1b6c
  2. Linux 4.8 compat: REQ_PREFLUSH

    The REQ_FLUSH flag was renamed REQ_PREFLUSH to avoid confusion with
    REQ_OP_FLUSH.  See torvalds/linux@28a8f0d3
    for complete details.
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Issue #4892
    Issue #4899
    behlendorf committed Jul 29, 2016
    76e5f6f
  3. Linux 4.8 compat: REQ_OP and bio_set_op_attrs()

    New REQ_OP_* definitions have been introduced to separate the
    WRITE, READ, and DISCARD operations from the flags.  This included
    changing the encoding of bi_rw.  It places REQ_OP_* in high order
    bits and other stuff in low order bits.  This encoding is done
    through the new helper function bio_set_op_attrs.  For complete
    details refer to:
    
    torvalds/linux@f215082
    torvalds/linux@4e1b2d5
    
    Signed-off-by: Tim Chase <tim@chase2k.com>
    Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4892
    Closes #4899
    Chunwei Chen authored and behlendorf committed Jul 29, 2016
    3b86aeb
  4. Fix zfs_allow_log_destroy() NULL dereference

    In zfs_ioc_log_history() function the tsd_set() function is called
    with NULL which causes the zfs_allow_log_destroy() to be run.  In
    this case the passed value will be NULL.  This is normally entirely
    safe because strfree() maps directly to kfree() which may be passed
    a NULL.  However, since alternate implementations of strfree() may
    not handle this gracefully add a check for NULL.
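
A sketch of the defensive check, assuming a hypothetical strict `strfree_sketch()` that, unlike kfree(), does not accept NULL. The destructor name and the call counter are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts real frees so the behavior can be observed in a test. */
static int strfree_calls_sketch;

/* Hypothetical strict strfree(): must not be handed NULL. */
static void
strfree_sketch(char *s)
{
	assert(s != NULL);
	strfree_calls_sketch++;
	free(s);
}

/*
 * Destructor-style callback: the stored value may be NULL when the
 * slot is cleared, so check before freeing.
 */
static void
log_destroy_sketch(void *arg)
{
	char *poolname = arg;

	if (poolname != NULL)
		strfree_sketch(poolname);
}
```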
    
    Observed under an embedded Linux 2.6.32.41 kernel during automated
    testing with the ZFS Test Suite.
    
    Signed-off-by: caoxuewen <cao.xuewen@zte.com.cn>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4872
    heary-cao authored and behlendorf committed Jul 29, 2016
    9f3d140
  5. Unify license of icp module with the rest of zfs

    The newly added icp module uses a hardcoded value of CDDL for the license,
    however in local development one might want to change that to something
    else in order to facilitate compiling against a kernel with lock debugging
    enabled.  All zfs modules use the ZFS_META_LICENSE string, which is replaced
    with the value held in the META file. One can modify the value in the META
    file once and then rerun configure to have all modules' licenses changed.
    
    Change the icp module license string to be ZFS_META_LICENSE so that it
    falls under the same paradigm.
    
    Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4905
    Nikolay Borisov authored and behlendorf committed Jul 29, 2016
    e334e82
  6. Move assignment of i_blkbits field

    Currently i_blkbits is always set to SPA_MINBLOCKSHIFT every time
    zfs_inode_update_impl is called. Since this value never changes
    move its assignment to inode creation time.
    
    Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4906
    Nikolay Borisov authored and behlendorf committed Jul 29, 2016
    ba2fe6a
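The i_blkbits change above amounts to moving a constant assignment out of a frequently called update path and into one-time setup. A rough userland sketch follows, assuming a simplified `struct inode_sketch` and two hypothetical functions; these are illustrative stand-ins, not the kernel's `struct inode` or the actual ZFS functions.

```c
/* 512-byte minimum block size, expressed as a shift. */
#define	SPA_MINBLOCKSHIFT	9

/* Hypothetical, heavily simplified stand-in for an inode. */
struct inode_sketch {
	unsigned int		i_blkbits;
	unsigned long long	i_size;
};

/* One-time setup: the constant field is assigned here, exactly once. */
static void
inode_create_sketch(struct inode_sketch *ip)
{
	ip->i_blkbits = SPA_MINBLOCKSHIFT;
	ip->i_size = 0;
}

/* Frequent update path: only fields that can actually change are written. */
static void
inode_update_sketch(struct inode_sketch *ip, unsigned long long size)
{
	ip->i_size = size;
}
```

After creation, any number of `inode_update_sketch` calls leave `i_blkbits` at its initial value, so the repeated store in the update path was redundant.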
  7. libzfs: Fix missing va_end call on ENOSPC and EDQUOT cases

    In the switch statement in zfs_standard_error_fmt, the ENOSPC and
    EDQUOT cases return immediately and, unlike all other cases in the
    switch, skip the va_end call.
    
    Break out of the switch instead of returning so that va_end is
    still called.
    
    Found by static analysis with CoverityScan 0.8.5
    
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4900
    Colin Ian King authored and behlendorf committed Jul 29, 2016
    b264d9b
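The va_end fix above follows a general pattern: every va_start must be matched by a va_end on every exit path, so a case that needs to finish early should break out of the switch rather than return. The sketch below uses a hypothetical function, `report_error_sketch`, to show the shape of the fix; it is not the libzfs code.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a variadic error formatter.
 * Returns 1 if the error was one of the specially handled cases.
 */
static int
report_error_sketch(int error, char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int handled = 0;

	va_start(ap, fmt);
	(void) vsnprintf(buf, len, fmt, ap);

	switch (error) {
	case ENOSPC:
	case EDQUOT:
		/* A "return" here would skip the va_end below. */
		handled = 1;
		break;
	default:
		break;
	}

	va_end(ap);	/* reached on every path through the switch */
	return (handled);
}
```

Recording the result in a local variable and returning it after va_end keeps a single exit point, which is what makes the pairing easy to verify by inspection or static analysis.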
  8. libzfs_import.c: Uninitialized pointer read

    In zpool_find_import_scan: reads an uninitialized pointer or
    its target (Coverity #150966).
    
    Found by static analysis with CoverityScan 0.8.5
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4897
    ironMann authored and behlendorf committed Jul 29, 2016
    78867a0
  9. ztest: memory leaks reported by AddressSanitizer

    Leaks reported by AddressSanitizer (GCC 6.1.0):
    
    Direct leak of 4097 byte(s) in 1 object(s) allocated from:
        #1 0x414f73 in process_options cmd/ztest/ztest.c:721
    
    Direct leak of 5440 byte(s) in 17 object(s) allocated from:
        #1 0x41bfd5 in umem_alloc ../../lib/libspl/include/umem.h:88
        #2 0x41bfd5 in ztest_zap_parallel cmd/ztest/ztest.c:4659
        #3 0x4163a8 in ztest_execute cmd/ztest/ztest.c:5907
    
    Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Closes #4896
    ironMann authored and behlendorf committed Jul 29, 2016
    df053d6