
ZFS io error when disks are in idle/standby/spindown mode #4713

Closed
johnkeates opened this issue May 29, 2016 · 134 comments
Labels: Status: Inactive (not being actively updated), Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

@johnkeates

johnkeates commented May 29, 2016

Whenever one or more disks in one of my pools is sleeping because it was idle, ZFS (via ZED) spams me with IO errors (via email, because that's how I set it up).

It's always this kind of error with only the vpath, vguid and eid changing:

ZFS has detected an io error:

  eid: 15
class: io
 host: clava
 time: 2016-05-29 23:21:18+0200
vtype: disk
vpath: /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249-part1
vguid: 0x0094F35F53B1888B
cksum: 0
 read: 0
write: 0
 pool: greenpool

dmesg shows:

[ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748386] sd 1:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 64 9d ac c8 00 00 08 00
[ 3647.748388] blk_update_request: I/O error, dev sdd, sector 1688054984
[ 3647.748401] sd 1:0:1:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748402] sd 1:0:1:0: [sde] tag#1 CDB: Read(10) 28 00 b4 26 aa 70 00 00 08 00
[ 3647.748403] blk_update_request: I/O error, dev sde, sector 3022432880
[ 3647.748408] sd 1:0:3:0: [sdg] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748409] sd 1:0:3:0: [sdg] tag#2 CDB: Read(10) 28 00 b4 26 ca 78 00 00 08 00
[ 3647.748410] blk_update_request: I/O error, dev sdg, sector 3022441080
[ 3655.074695] sd 1:0:2:0: [sdf] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3655.074699] sd 1:0:2:0: [sdf] tag#8 CDB: Read(10) 28 00 64 9d b8 c0 00 00 08 00
[ 3655.074700] blk_update_request: I/O error, dev sdf, sector 1688058048
[ 3655.074712] sd 1:0:2:0: [sdf] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074713] sd 1:0:2:0: [sdf] tag#10 Sense Key : Not Ready [current] 
[ 3655.074715] sd 1:0:2:0: [sdf] tag#10 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074716] sd 1:0:2:0: [sdf] tag#10 CDB: Read(10) 28 00 64 9d 80 e8 00 00 08 00
[ 3655.074717] blk_update_request: I/O error, dev sdf, sector 1688043752
[ 3655.074721] sd 1:0:2:0: [sdf] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3655.074722] sd 1:0:2:0: [sdf] tag#13 Sense Key : Not Ready [current] 
[ 3655.074723] sd 1:0:2:0: [sdf] tag#13 Add. Sense: Logical unit not ready, initializing command required
[ 3655.074724] sd 1:0:2:0: [sdf] tag#13 CDB: Read(10) 28 00 64 9d 90 60 00 00 08 00
[ 3655.074725] blk_update_request: I/O error, dev sdf, sector 1688047712

Scrubbing fixes nothing, as there is no data corruption and the pools don't get unhappy, just a few errors. As far as I can see, either ZFS isn't waiting long enough for the disks to spin up (they do spin up on access), or it tries some command before checking that the disk is ready for it.

The pool status:

john@clava:~$ sudo zpool status greenpool
  pool: greenpool
 state: ONLINE
  scan: scrub repaired 0 in 19h28m with 0 errors on Sun May 29 17:09:56 2016
config:

    NAME                                          STATE     READ WRITE CKSUM
    greenpool                                     ONLINE       0     0     0
      mirror-0                                    ONLINE       0     0     0
        ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5757832  ONLINE       0     0     0
        ata-WDC_WD20EARX-00PASB0_WD-WCAZA8848843  ONLINE       0     0     0
      mirror-1                                    ONLINE       0     0     0
        ata-WDC_WD20EARX-00PASB0_WD-WCAZA8841762  ONLINE       0     0     0
        ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5736249  ONLINE       0     0     0

errors: No known data errors

I can disable spindown/standby, but not all pools are always in use; some are only archives. I enabled standby timeouts before the IO errors came along, so to me it sounds like ZFS or ZoL doesn't deal with spindown or standby very well.

Additional data:

Using version 0.6.5.6-2 with Linux 4.5.0-2-amd64.

@behlendorf
Contributor

@johnkeates thanks for filing this. I spent a little time looking at this, so let me post a possible way to resolve it. This problem should only impact devices on which Linux is performing explicit power management. Drives which spin up/down due to internal power management shouldn't see this issue.

We're going to need to add some code to check request_queue->rpm_status to determine when a device has been suspended. If the block device was suspended then it needs to be woken before any new bios are submitted; this can be done by dispatching a normal read request (say, for the label). Once the request_queue->rpm_status state has transitioned back to active, new bios can be submitted.
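
For illustration only: the same state is visible from user space via sysfs, so a rough sketch of the idea (check the runtime PM status, wake the device with a small read before real I/O) could look like the following. The device name and sysfs path are assumptions, and this is a user-space analogy, not the proposed in-kernel ZFS change.

# Hypothetical user-space sketch of the proposed logic: if the disk is
# runtime-suspended, wake it with a harmless read before issuing real I/O.
DEV=sdd                                    # example device name (assumption)
STATUS=$(cat /sys/block/"$DEV"/device/power/runtime_status 2>/dev/null)
if [ "$STATUS" = "suspended" ]; then
    # A small direct read forces the drive to spin up before the pool
    # submits its own requests.
    dd if=/dev/"$DEV" of=/dev/null bs=4096 count=1 iflag=direct
fi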

@behlendorf behlendorf added the Bug label May 31, 2016
@behlendorf behlendorf added this to the 0.7.0 milestone May 31, 2016
@antst

antst commented Aug 20, 2016

Same issue with SATA disks behind a SAS controller.
But it is not only about errors in the logs.
With 6 drives it takes quite some time to spin up all of them, because spin-up is sequential.
Maybe it would also be reasonable to issue a parallel wake to all devices in the pool on discovery of a standby device, rather than discovering them in standby and waking them one by one.

Just to illustrate: on my home NAS with 6 SATA disks behind mpt3sas (IT mode), it takes 9 seconds to wake them all in parallel (hdparm --read-sector in parallel, until I get a sector back from all of them), and 44 seconds to wake them sequentially, as happens with ZFS now.

9 vs 44 is quite a difference.
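
If someone wants to try the parallel wake-up by hand in the meantime, a minimal sketch (assuming hdparm is installed; the device list is an example, substitute your own vdev members):

# Wake all pool members in parallel instead of one by one.
DISKS="/dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WCAZA5757832
/dev/disk/by-id/ata-WDC_WD20EARX-00PASB0_WD-WCAZA8848843"
for d in $DISKS; do
    hdparm --read-sector 0 "$d" >/dev/null &   # one background wake-up read per disk
done
wait    # returns once every drive has answered, i.e. all are spun up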

@behlendorf behlendorf removed the Bug label Sep 30, 2016
@hoppel118

hoppel118 commented Oct 10, 2016

Hey guys,

the same for me, have a look at these issues:

#3785
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1504909?comments=all

Here I reported some things about my situation and my environment:

#4638

After HDD standby I also have to wait for about one minute. I use 8x 4TB WD Red in a RAID-Z2 behind an LSI SAS3008 (mpt3sas).

Greetings Hoppel118

@inkdot7
Contributor

inkdot7 commented Nov 13, 2016

Spinning many drives up in parallel may be very unfriendly towards the power supply. To make matters worse, I think it is not uncommon for drive motors to use the +5 V line, which on most (consumer) PSUs has rather low current ratings. Better RAID cards are usually careful during boot to stagger drive spin-up, often by at least a few seconds.

@rubictus

Same issue here too, with SATA disks behind an LSI Logic SAS2008, running CentOS 7.1, when the disks are in standby mode.
It always happens when I'm just trying to read some data.

uname -a

Linux prolib 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

modinfo zfs

filename: /lib/modules/3.10.0-514.6.1.el7.x86_64/extra/zfs.ko
version: 0.6.5.9-1
license: CDDL
author: OpenZFS on Linux
description: ZFS
rhelversion: 7.3
srcversion: 42C4AB70887EA26A9970936

dmesg

[109797.582453] sd 0:0:4:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109797.582458] sd 0:0:4:0: [sde] CDB: Write(10) 2a 00 5c 00 2a 60 00 00 08 00
[109797.582460] blk_update_request: I/O error, dev sde, sector 1543514720
[109797.582476] sd 0:0:5:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109797.582477] sd 0:0:5:0: [sdf] CDB: Write(10) 2a 00 5c 00 2a 60 00 00 08 00
[109797.582478] blk_update_request: I/O error, dev sdf, sector 1543514720
[109807.394117] sd 0:0:2:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109807.394121] sd 0:0:2:0: [sdc] CDB: Write(10) 2a 00 82 00 2a 60 00 00 08 00
[109807.394123] blk_update_request: I/O error, dev sdc, sector 2181048928
[109807.394140] sd 0:0:3:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[109807.394141] sd 0:0:3:0: [sdd] CDB: Write(10) 2a 00 82 00 2a 60 00 00 08 00
[109807.394142] blk_update_request: I/O error, dev sdd, sector 2181048928

zpool events

Feb 16 2017 21:23:05.232852408 ereport.fs.zfs.io
Feb 16 2017 21:23:05.237852445 ereport.fs.zfs.io
Feb 16 2017 21:23:05.237852445 ereport.fs.zfs.io
Feb 16 2017 21:23:05.250852543 ereport.fs.zfs.io
Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io
Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io

zpool events -v

Feb 16 2017 21:23:05.265852655 ereport.fs.zfs.io
class = "ereport.fs.zfs.io"
ena = 0x3df94b0f49700401
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0xc288a8df9547ced5
vdev = 0x3a52042ed20235f2
(end detector)
pool = "vol0"
pool_guid = 0xc288a8df9547ced5
pool_context = 0x0
pool_failmode = "wait"
vdev_guid = 0x3a52042ed20235f2
vdev_type = "raidz"
vdev_ashift = 0xc
vdev_complete_ts = 0x0
vdev_delta_ts = 0x0
vdev_read_errors = 0x0
vdev_write_errors = 0x0
vdev_cksum_errors = 0x0
parent_guid = 0xc288a8df9547ced5
parent_type = "root"
vdev_spare_paths =
vdev_spare_guids =
zio_err = 0x5
zio_flags = 0x80080
zio_stage = 0x200000
zio_pipeline = 0x210000
zio_delay = 0x0
zio_timestamp = 0x0
zio_delta = 0x0
zio_offset = 0x41000132000
zio_size = 0x200
zio_objset = 0x31
zio_object = 0x0
zio_level = 0xffffffffffffffff
zio_blkid = 0x0
time = 0x58a60a29 0xfd896ef
eid = 0x22

@Jimmy-Z

Jimmy-Z commented Mar 12, 2017

I'm having this issue too. So there are no bad consequences, apart from the terrifying IO error mail?

@behlendorf behlendorf modified the milestones: 0.8.0, 0.7.0 Jun 6, 2017
@rschulman

Is there any further news on this issue? I replaced one of the drives in my pool and now have been getting roughly hourly emails from root complaining of an IO error. Is there any way to quiet this error down, if it hasn't been fixed yet, while still receiving actual errors?

@crimp42

crimp42 commented Jul 7, 2017

I am also really curious whether someone has figured out what is going on with this. I don't even use ZFS and am experiencing the same errors (I got to this thread via Google), so I assume it is kernel-dependent. I recently bought several LSI SAS2008 controllers, was running CentOS 7.3, and was seeing these errors in my logs every so often. I finally narrowed it down to any SATA drive that was in sleep mode being accessed: while it was waking up, the errors would occur. I switched my (CentOS 7.3) boot drive out and went to CentOS 6.8, and the errors never occur. These particular errors seem to go back years, and it looks like no progress has been made on fixing them. I did find one thread where people with these errors switched the firmware on their LSI SAS2008 controllers back down to P19 and were OK (I am on P20). I will probably try that next.

@crimp42

crimp42 commented Jul 7, 2017

I tried P19 but had the same results. I also tried CentOS 6.9 and do not have the issue. The only difference I can see at this point is kernel 2.x vs 3.x/4.x; I also tried installing a 4.x kernel on CentOS 7.3 and the problem was still there. So I would assume those of you seeing this issue with ZFS would NOT see it on a 2.x-kernel Linux distro using the LSI SAS2008 with SATA drives.

Again, my issue is that for ANY SATA drive connected to the LSI SAS2008 (running P19 or P20), when the drive is asleep, accessing it through the OS creates /var/log/messages errors similar to those listed above:

[ 3647.748383] sd 1:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[ 3647.748386] sd 1:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 64 9d ac c8 00 00 08 00

I also noticed that on some of my drives that spin up a tad quicker, sometimes the error does not get generated.

@crimp42

crimp42 commented Jul 8, 2017

One other thing I checked was the version of mpt2sas (modinfo mpt2sas). On CentOS 6.9, where I am unable to reproduce the issue, it is version 20.102.00.00; on CentOS 7.3, where I am able to reproduce it, it is the same version 20.102.00.00. The only difference I can see at this point is kernel 2.x (where it does not occur) vs 3.x/4.x (where it occurs).

@johnkeates
Author

I don't have any hardware that 2.x runs on, but I see the same with comparable mpt2sas versions. It doesn't seem to matter what version I use.

@crimp42

crimp42 commented Jul 8, 2017

John, what HBA(s) model are you using? I assume SATA drives on a SAS/SATA controller?

@johnkeates
Author

Mostly SAS2008 controllers, all with SATA drives. I have one with SAS drives, and that one is not giving any trouble.

@crimp42

crimp42 commented Jul 8, 2017

John, do you know what firmware level you are running on your HBAs?

And I am not sure if you all know this, but if you disable drive spin-down, the errors never occur for me on the problematic 3.x/4.x kernels, so I assume the same would hold true for ZFS. Not ideal, and a waste of money if you have a large number of disks (the reason I went back to a 2.x kernel).

Most of the time when people report this issue with SAS2008-based HBAs and SATA drives, everyone assumes cable or power supply issues. I wonder if that is why no one seems to get to the bottom of this particular combination.

@johnkeates
Author

I solved it in a few ways:

  • Staggered spinup via PUIS + rc.local script
  • 30s random block read with dd on all devices (sketched below)
  • disable standby on disks
  • disable standby on controller

All of them work. But the issue remains: ZFS should not be sad just because a disk went into standby.
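
A minimal rc.local-style sketch of the dd warm-read item from the list above (the device glob and the 30-second window are assumptions, and the reads are sequential here for simplicity rather than random):

# Read from every pool disk for up to 30 seconds at boot so all drives are
# spinning before the pools see real I/O.
for d in /dev/disk/by-id/ata-*; do
    case "$d" in *-part*) continue ;; esac        # skip partition entries
    timeout 30 dd if="$d" of=/dev/null bs=1M iflag=direct &
done
wait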

I'm mostly running this combination:

LSISAS2308: FWVersion(15.00.00.00), ChipRevision(0x05), BiosVersion(07.29.00.00)
and
LSISAS2008: FWVersion(15.00.00.00), ChipRevision(0x03), BiosVersion(07.29.00.00)

@crimp42

crimp42 commented Jul 8, 2017

So basically you're just keeping your drives from falling asleep to avoid the errors.

One thing I have seen from numerous threads out there is that no one has reported issues (any corruption) from this, at least at the JBOD level, just the annoying fact that the errors are always there if you like to save power/noise/heat. I'm not sure how it would impact ZFS, since it is outside its control. It also seems to be specific to LSI; I have seen multiple types of HBAs, all LSI-based, where people report this.

I increased the SCSI logging verbosity on one box with the 3.x kernel, where it errors out when the drive is accessed, and on another with the 2.x kernel, where it works fine. In both cases the drives are on an LSI 9211-8i controller with P19 firmware running in IT mode; the drive was put into standby, then I just refreshed a directory to force a read from the drive.

BOTH kernels are using the 20.102.00.00 mpt2sas module.

3.x kernel: (3.10.0-514.26.2.el7.x86_64)
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff88015595f880
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 0e 70 00 00 00 08 00 00
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 0e 70 00 00 00 08 00 00
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Sense Key : Not Ready [current]
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Add. Sense: Logical unit not ready, initializing command required
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff88015595f880
Jul 8 15:24:47 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Start/Stop Unit 1b 00 00 00 01 00
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff88015595f880
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Test Unit Ready 00 00 00 00 00 00
Jul 8 15:24:56 misc02 kernel: blk_update_request: I/O error, dev sdm, sector 3696
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f0fc0
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 00 00 0e 70 00 00 00 08 00 00
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f1180
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 36 c3 cb 40 00 00 00 10 00 00
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f1340
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] CDB: Read(16) 88 00 00 00 00 00 15 e7 68 c0 00 00 00 10 00 00
Jul 8 15:24:56 misc02 kernel: sd 9:0:5:0: [sdm] Send: scmd 0xffff8800d85f1500

2.x kernel: (2.6.32-696.3.2.el6.x86_64)
Jul 8 15:29:51 misc01 kernel: sd 1:0:0:0: [sdh] Send:
Jul 8 15:29:51 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 42 40 61 40 00 00 10 00
Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] Send:
Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 52 ac c5 10 00 00 10 00
Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] Send:
Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 00 00 ed f0 00 00 10 00
Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] Send:
Jul 8 15:30:01 misc01 kernel: sd 1:0:0:0: [sdh] CDB: Read(10): 28 00 00 00 2c 80 00 00 10 00

@hoppel118

Hey guys,

I am not sure about the comment and milestone from @behlendorf on 31 May 2016:

Is this issue solved in 0.7.0? When will version 0.7.0 be released for Debian Jessie?

I am using openmediavault 3 (Debian Jessie) with the following ZoL version:

root@omv:~# apt-cache depends --important openmediavault-zfs
openmediavault-zfs
 |Depends: linux-headers-amd64
 |Depends: pve-headers
  Depends: linux-headers-3.16.0-4-all
  Depends: openmediavault
  Depends: openmediavault-omvextrasorg
  Depends: zfs-dkms
 |Depends: zfsutils-linux
  Depends: <zfsutils>
    zfsutils-linux
  Depends: zfs-zed
root@omv:~# apt-cache depends --important pve-headers
pve-headers
  Depends: pve-headers-4.4.59-1-pve
root@omv:~# apt-cache policy pve-headers
pve-headers:
  Installed: 4.4-87
  Candidate: 4.4-87
  Version table:
 *** 4.4-87 0
        100 /var/lib/dpkg/status
root@omv:~# apt-cache depends --important zfs-dkms
zfs-dkms
  Depends: dkms
  Depends: lsb-release
 |Depends: debconf
  Depends: <debconf-2.0>
    cdebconf
    debconf
  PreDepends: spl-dkms
root@omv:~# apt-cache policy zfs-dkms
zfs-dkms:
  Installed: 0.6.5.9-2~bpo8+1
  Candidate: 0.6.5.9-2~bpo8+1
  Version table:
 *** 0.6.5.9-2~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache depends --important zfsutils-linux
zfsutils-linux
  Depends: init-system-helpers
  Depends: libblkid1
  Depends: libc6
  Depends: libnvpair1linux
  Depends: libuuid1
  Depends: libuutil1linux
  Depends: libzfs2linux
  Depends: libzpool2linux
  Depends: zlib1g
  Depends: python3
root@omv:~# apt-cache policy zfsutils-linux
zfsutils-linux:
  Installed: 0.6.5.9-2~bpo8+1
  Candidate: 0.6.5.9-2~bpo8+1
  Version table:
 *** 0.6.5.9-2~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~#

Greetings Hoppel

@0xFelix

0xFelix commented Aug 5, 2017

Hey,

I'm also still having the same problems as @hoppel118.

@behlendorf: Is there any progress on the solution mentioned in #4713 (comment)?

The errors in the kernel log and the constant CHECKSUM errors I'm getting are rather frightening.

@crimp42

crimp42 commented Sep 2, 2017

Just wanted to leave another note regarding this.

As I said previously, this issue does NOT occur with CentOS 6.9 (which uses a 2.x kernel), so I assume that if you can use ZFS in that environment, this issue would not occur.

I can easily reproduce it on CentOS 7.x with kernel 3.x and 4.x.

This weekend I took one of my CentOS 6.9 servers, swapped the boot drive, and installed Ubuntu 16.04.3 LTS, which has kernel 4.4.0-87-generic.

The version of mpt2sas under this version of Ubuntu was 12.100.00.00.

As soon as my JBOD disks go to sleep and I access them, the issue still occurs while they are waking up:

Sep 2 16:14:43 misc02 kernel: [ 713.411876] sd 9:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Sep 2 16:14:43 misc02 kernel: [ 713.411888] sd 9:0:0:0: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 15 d5 12 80 00 00 00 08 00 00
Sep 2 16:14:43 misc02 kernel: [ 713.411894] blk_update_request: I/O error, dev sdi, sector 366285440

I feel very confident that something introduced at some point in the 3.x kernel series, and still present in 4.x, is causing this issue (which also manifests itself for those of you using ZFS).

@behlendorf
Contributor

This was added to the 0.8 milestone so we don't lose track of it. @brianmduncan my guess is this was likely introduced in the 3.9 kernel with the CONFIG_PM_RUNTIME option.
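
A quick way to check whether a given kernel was built with that option (the config file path is the usual distro location and is an assumption; on newer kernels, roughly 3.19+, the separate option was folded into CONFIG_PM):

# Prints CONFIG_PM_RUNTIME=y on kernels that still have the separate option;
# on later kernels it was merged into CONFIG_PM, so check that too.
grep -E 'CONFIG_PM_RUNTIME|CONFIG_PM=' /boot/config-"$(uname -r)"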

@cobrafast

Just wanted to chime in that I'm affected by this problem too (and am not a ZFS user either).
I made my write-up over at https://serverfault.com/q/876750/81089 since I didn't find this thread earlier.

My kernel (4.13.4-1-ARCH) definitely has both CONFIG_PM and "[SCSI] Fix 'Device not ready' issue on mpt2sas".

I'm going to try and debug my kernel in the next few days.

@cobrafast

Here's the first debug log I was able to produce: https://pastebin.com/Kg0vdCf7
Lines beginning with SCSI DEBUG: are debug prints I added myself in scsi_check_sense(), since that is where someone suspected the problem to be.

@cobrafast

Just wanted to let you guys know that I think I've come across a temporary fix until I've figured this out completely.

You need to compile your own kernel, but change ./include/linux/blkdev.h:884, which should look like this:

#define BLK_MIN_SG_TIMEOUT	(7 * HZ)

This constant is the absolute minimum lifetime a SCSI command gets to do its thing and the default setting of 7 seconds is just too short for most hard drives to spin up. I've now set it to (45 * HZ) which should give even md-raid operations that trigger a staggered spin-up of several drives enough time to finish.

The downside of this is that you might get an unresponsive/laggy system in case of a broken drive that cannot complete any commands anymore.

However, since this constant was last changed in git in 2007 (when SGv3 was added for Linux 2.4 and 2.5), I doubt that this is actually the source of this problem. I'm going to keep looking some more.
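
For anyone wanting to reproduce that experiment, a hedged sketch of applying the change in a kernel source tree (assuming GNU sed and that the define still matches the line quoted above):

# In the kernel source tree: raise BLK_MIN_SG_TIMEOUT from 7s to 45s, then
# rebuild and install the kernel the usual way for your distro.
sed -i 's/BLK_MIN_SG_TIMEOUT[[:space:]]*(7 \* HZ)/BLK_MIN_SG_TIMEOUT (45 * HZ)/' include/linux/blkdev.h
grep BLK_MIN_SG_TIMEOUT include/linux/blkdev.h    # verify the change took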

@crimp42

crimp42 commented Oct 13, 2017

Great find!!! If I get a chance this weekend I will restore one of my Linux boxes back to CentOS 7.x and try to compile a custom kernel with the adjustment.

I hope you can find the original source, so this can be reported and patched in kernels moving forward. This particular issue has been around for years, and I found it odd that no one had squashed it. Usually people say it is a hardware issue and leave it at that.

I was starting to believe that I would need to find an alternative to my LSI SAS2008 controllers when I needed to move forward to any distro using a kernel above 3.x.

@d-helios

@red-scorp I asked because I primarily use ZFS as NAS storage, and when bad disks are not marked as "bad" quickly enough it leads to degradation of the virtual machines placed on it. So setting a high timeout is not a good idea.

And as I wrote previously, Solaris on the same hardware doesn't have any problem with it.
I'm going to replace one of the NAS boxes with the Ubuntu version and will see how things go in production.

@red-scorp

@d-helios same usage for me, and I use Ubuntu 18.04. It does not drop disks from my array, but it gives I/O errors periodically. My solution was to use another controller and forget about the built-in SAS3008 chip.
I had used both mpt3sas.msix_disable=1 and mpt3sas.missing_delay=60,60 as kernel options.
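
For reference, a sketch of making those module options persistent on a GRUB-based system (the file path and the existing "quiet" value are assumptions; preserve whatever is already on that line in your own config):

# /etc/default/grub -- append the mpt3sas options to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet mpt3sas.msix_disable=1 mpt3sas.missing_delay=60,60"
# then regenerate the GRUB config and reboot:
#   sudo update-grub && sudo reboot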

@chinesestunna

chinesestunna commented Aug 29, 2018

Quick update from me: I'm trying out a different fix, the disk APM settings via the hdparm -B flag. Around the same time as upgrading to a 4.x kernel system (OMV4), I also added a startup script to configure the disks, e.g. setting NCQ, the idle spin-down time (60 min), etc.
Among these commands is the disk APM setting. Reading the man pages and guides on using hdparm to set APM, I believe I misunderstood them: most of the guides state that APM values > 127 will not allow spin-down. I definitely want my disks to spin down, so I had set APM to 127, which is rather low from a performance standpoint. One anecdotal observation is that since the upgrade, array spin-up from sleep seems "slow" and of course disk I/O gets stuck.
Anyway, I think the low APM setting has an impact on how long it takes a disk to spin up and respond to commands. I've tested higher APM settings, checked that the disks still go to sleep, and am now running at 254, which is in theory the highest value before APM is turned off. The disks still go to sleep after 60 min of idle, which is good, but I will report back if the issue continues.
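
A sketch of the hdparm calls being described (the values 254 and 242 and the device glob are this comment's choices rather than recommendations; adjust the glob so it only matches the data disks):

# Per-disk power settings from the startup script described above:
#   -B 254 : highest APM level before APM is disabled entirely (255 turns APM off)
#   -S 242 : spin down after 60 minutes of idle (242 = 2 x 30-minute units in hdparm's encoding)
for d in /dev/sd[a-z]; do
    hdparm -B 254 -S 242 "$d"
done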

@d-helios

d-helios commented Aug 29, 2018 via email

@chinesestunna

@d-helios I have not tried enabling debug; I'll capture the syslog for analysis next time something seems wrong or a drive drops. Generally it's similar to what others have posted here, except I have an expander that gets reset when the disks seem to take too long to wake up.

@chinesestunna

chinesestunna commented Aug 31, 2018

The server VM has been running since Wednesday; so far the logs still show a sprinkling of I/O read errors and one expander reset. It seems like the same issue as before, so it'll be a matter of time before a disk doesn't "respond" quickly enough and drops:
[Wed Aug 29 19:29:02 2018] sd 0:0:0:0: [sda] tag#7925 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Wed Aug 29 19:29:02 2018] sd 0:0:0:0: [sda] tag#7925 CDB: Read(16) 88 00 00 00 00 00 07 48 1c 60 00 00 00 08 00 00
[Wed Aug 29 19:29:02 2018] print_req_error: I/O error, dev sda, sector 122166368
[Thu Aug 30 10:32:01 2018] sd 0:0:0:0: [sda] tag#5293 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Thu Aug 30 10:32:01 2018] sd 0:0:0:0: [sda] tag#5293 CDB: Read(16) 88 00 00 00 00 00 07 48 1c 60 00 00 00 08 00 00
[Thu Aug 30 10:32:01 2018] print_req_error: I/O error, dev sda, sector 122166368
[Thu Aug 30 10:58:35 2018] sd 0:0:7:0: [sdh] tag#5293 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Thu Aug 30 10:58:35 2018] sd 0:0:7:0: [sdh] tag#5293 CDB: Read(16) 88 00 00 00 00 00 da 5b 6c 20 00 00 00 08 00 00
[Thu Aug 30 10:58:35 2018] print_req_error: I/O error, dev sdh, sector 3663424544
[Thu Aug 30 10:58:44 2018] sd 0:0:8:0: [sdi] tag#5293 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[Thu Aug 30 10:58:44 2018] sd 0:0:8:0: [sdi] tag#5293 CDB: Read(16) 88 00 00 00 00 00 00 00 97 f8 00 00 00 08 00 00
[Thu Aug 30 10:58:44 2018] print_req_error: I/O error, dev sdi, sector 38904
[Fri Aug 31 12:04:25 2018] sd 0:0:11:0: attempting device reset! scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:25 2018] sd 0:0:11:0: [sdl] tag#2877 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Fri Aug 31 12:04:25 2018] scsi target0:0:11: handle(0x0015), sas_address(0x5001e677b7fb5ff1), phy(17)
[Fri Aug 31 12:04:25 2018] scsi target0:0:11: enclosure logical id(0x5001e677b7fb5fff), slot(17)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: device reset: FAILED scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: attempting target reset! scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: [sdl] tag#2877 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: handle(0x0015), sas_address(0x5001e677b7fb5ff1), phy(17)
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: enclosure logical id(0x5001e677b7fb5fff), slot(17)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: device_block, handle(0x0015)
[Fri Aug 31 12:04:26 2018] scsi target0:0:11: target reset: SUCCESS scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:26 2018] sd 0:0:11:0: device_unblock and setting to running, handle(0x0015)
[Fri Aug 31 12:04:27 2018] sd 0:0:11:0: Power-on or device reset occurred
[Fri Aug 31 12:04:27 2018] mpt2sas_cm0: attempting host reset! scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:27 2018] sd 0:0:11:0: [sdl] tag#2877 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[Fri Aug 31 12:04:37 2018] mpt2sas_cm0: sending diag reset !!
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: diag reset: SUCCESS
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: CurrentHostPageSize is 0: Setting default host page size to 4k
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: LSISAS2308: FWVersion(20.00.07.00), ChipRevision(0x05), BiosVersion(07.39.02.00)
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: Protocol=(
[Fri Aug 31 12:04:38 2018] Initiator
[Fri Aug 31 12:04:38 2018] ,Target
[Fri Aug 31 12:04:38 2018] ),
[Fri Aug 31 12:04:38 2018] Capabilities=(
[Fri Aug 31 12:04:38 2018] TLR
[Fri Aug 31 12:04:38 2018] ,EEDP
[Fri Aug 31 12:04:38 2018] ,Snapshot Buffer
[Fri Aug 31 12:04:38 2018] ,Diag Trace Buffer
[Fri Aug 31 12:04:38 2018] ,Task Set Full
[Fri Aug 31 12:04:38 2018] ,NCQ
[Fri Aug 31 12:04:38 2018] )
[Fri Aug 31 12:04:38 2018] mpt2sas_cm0: sending port enable !!
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: port enable: SUCCESS
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for end-devices: start
[Fri Aug 31 12:04:46 2018] scsi target0:0:0: handle(0x000a), sas_addr(0x5001e677b7fb5fe4)
[Fri Aug 31 12:04:46 2018] scsi target0:0:0: enclosure logical id(0x5001e677b7fb5fff), slot(4)
[Fri Aug 31 12:04:46 2018] scsi target0:0:1: handle(0x000b), sas_addr(0x5001e677b7fb5fe5)
[Fri Aug 31 12:04:46 2018] scsi target0:0:1: enclosure logical id(0x5001e677b7fb5fff), slot(5)
[Fri Aug 31 12:04:46 2018] scsi target0:0:2: handle(0x000c), sas_addr(0x5001e677b7fb5fe6)
[Fri Aug 31 12:04:46 2018] scsi target0:0:2: enclosure logical id(0x5001e677b7fb5fff), slot(6)
[Fri Aug 31 12:04:46 2018] scsi target0:0:3: handle(0x000d), sas_addr(0x5001e677b7fb5fe7)
[Fri Aug 31 12:04:46 2018] scsi target0:0:3: enclosure logical id(0x5001e677b7fb5fff), slot(7)
[Fri Aug 31 12:04:46 2018] scsi target0:0:4: handle(0x000e), sas_addr(0x5001e677b7fb5fe8)
[Fri Aug 31 12:04:46 2018] scsi target0:0:4: enclosure logical id(0x5001e677b7fb5fff), slot(8)
[Fri Aug 31 12:04:46 2018] scsi target0:0:5: handle(0x000f), sas_addr(0x5001e677b7fb5fe9)
[Fri Aug 31 12:04:46 2018] scsi target0:0:5: enclosure logical id(0x5001e677b7fb5fff), slot(9)
[Fri Aug 31 12:04:46 2018] scsi target0:0:6: handle(0x0010), sas_addr(0x5001e677b7fb5fea)
[Fri Aug 31 12:04:46 2018] scsi target0:0:6: enclosure logical id(0x5001e677b7fb5fff), slot(10)
[Fri Aug 31 12:04:46 2018] scsi target0:0:7: handle(0x0011), sas_addr(0x5001e677b7fb5feb)
[Fri Aug 31 12:04:46 2018] scsi target0:0:7: enclosure logical id(0x5001e677b7fb5fff), slot(11)
[Fri Aug 31 12:04:46 2018] scsi target0:0:8: handle(0x0012), sas_addr(0x5001e677b7fb5fee)
[Fri Aug 31 12:04:46 2018] scsi target0:0:8: enclosure logical id(0x5001e677b7fb5fff), slot(14)
[Fri Aug 31 12:04:46 2018] scsi target0:0:9: handle(0x0013), sas_addr(0x5001e677b7fb5fef)
[Fri Aug 31 12:04:46 2018] scsi target0:0:9: enclosure logical id(0x5001e677b7fb5fff), slot(15)
[Fri Aug 31 12:04:46 2018] scsi target0:0:10: handle(0x0014), sas_addr(0x5001e677b7fb5ff0)
[Fri Aug 31 12:04:46 2018] scsi target0:0:10: enclosure logical id(0x5001e677b7fb5fff), slot(16)
[Fri Aug 31 12:04:46 2018] scsi target0:0:11: handle(0x0015), sas_addr(0x5001e677b7fb5ff1)
[Fri Aug 31 12:04:46 2018] scsi target0:0:11: enclosure logical id(0x5001e677b7fb5fff), slot(17)
[Fri Aug 31 12:04:46 2018] scsi target0:0:12: handle(0x0016), sas_addr(0x5001e677b7fb5ff2)
[Fri Aug 31 12:04:46 2018] scsi target0:0:12: enclosure logical id(0x5001e677b7fb5fff), slot(18)
[Fri Aug 31 12:04:46 2018] scsi target0:0:13: handle(0x0017), sas_addr(0x5001e677b7fb5ff3)
[Fri Aug 31 12:04:46 2018] scsi target0:0:13: enclosure logical id(0x5001e677b7fb5fff), slot(19)
[Fri Aug 31 12:04:46 2018] scsi target0:0:14: handle(0x0018), sas_addr(0x5001e677b7fb5ff4)
[Fri Aug 31 12:04:46 2018] scsi target0:0:14: enclosure logical id(0x5001e677b7fb5fff), slot(20)
[Fri Aug 31 12:04:46 2018] scsi target0:0:15: handle(0x0019), sas_addr(0x5001e677b7fb5ff5)
[Fri Aug 31 12:04:46 2018] scsi target0:0:15: enclosure logical id(0x5001e677b7fb5fff), slot(21)
[Fri Aug 31 12:04:46 2018] scsi target0:0:16: handle(0x001a), sas_addr(0x5001e677b7fb5ff6)
[Fri Aug 31 12:04:46 2018] scsi target0:0:16: enclosure logical id(0x5001e677b7fb5fff), slot(22)
[Fri Aug 31 12:04:46 2018] scsi target0:0:17: handle(0x001b), sas_addr(0x5001e677b7fb5ff7)
[Fri Aug 31 12:04:46 2018] scsi target0:0:17: enclosure logical id(0x5001e677b7fb5fff), slot(23)
[Fri Aug 31 12:04:46 2018] scsi target0:0:18: handle(0x001c), sas_addr(0x5001e677b7fb5ffd)
[Fri Aug 31 12:04:46 2018] scsi target0:0:18: enclosure logical id(0x5001e677b7fb5fff), slot(24)
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for end-devices: complete
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for end-devices: start
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for PCIe end-devices: complete
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for expanders: start
[Fri Aug 31 12:04:46 2018] expander present: handle(0x0009), sas_addr(0x5001e677b7fb5fff)
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: search for expanders: complete
[Fri Aug 31 12:04:46 2018] mpt2sas_cm0: host reset: SUCCESS scmd(000000002ce2b2ab)
[Fri Aug 31 12:04:56 2018] sd 0:0:11:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:0:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:1:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:2:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:3:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:4:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:5:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:6:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:7:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:8:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:9:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:10:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:12:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:13:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:14:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:15:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:16:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] sd 0:0:17:0: Power-on or device reset occurred
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: end-devices
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: Removing unresponding devices: pcie end-devices
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: expanders
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: removing unresponding devices: complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: expanders start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: break from expander scan: ioc_status(0x0022), loginfo(0x310f0400)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: expanders complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: end devices start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: break from end device scan: ioc_status(0x0022), loginfo(0x310f0400)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: end devices complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: pcie end devices start
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: log_info(0x3003011d): originator(IOP), code(0x03), sub_code(0x011d)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: break from pcie end device scan: ioc_status(0x0022), loginfo(0x3003011d)
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: pcie devices: pcie end devices complete
[Fri Aug 31 12:05:09 2018] mpt2sas_cm0: scan devices: complete

@ghost

ghost commented Sep 9, 2019

Replicating this in 4.14 with mpt2sas too. Using 0.8.1 (latest revision stable).

@zrav

zrav commented Feb 29, 2020

Seeing this in 5.3 with mpt3sas version 29.100, SAS2008 with firmware P20, ZoL 0.8.3.

@kobuki

kobuki commented Feb 29, 2020

Interestingly enough, I don't see it any more on 5.2 (Debian 10.1, 5.2.9-2~bpo10+1), mpt3sas 28.100.00.00 (sorry, no idea of the FW version), zfs 0.8.2 (zfs-0.8.2-2~bpo10+1). Any version before 5.2 showed those messages, causing read errors in the pool, kicking disks, etc. With this setup all such errors have disappeared, and there are never errors in zpool status after the regular scrub.

@cwalv2

cwalv2 commented Apr 8, 2020

This issue still seems to happen on random read/write workloads. I have a 9207-8i with P20 firmware, Ubuntu 19.10, 5.3.0-40-generic. I bought the HBA to try to fix an error I was getting with the motherboard's onboard SATA (ASRock X570 PRO4).

I have a suspicion it's the hard drives somehow. It's very easy to get the error to happen by using either fio or rsyncing a large directory.

I've attached error logs for both the built-in SATA as well as the SAS HBA... About to try rebooting with mpt3sas.msix_disable=1 after the thousandth scrub of this pool...

Honestly, I've had pretty terrible experiences with ZFS so far. It really doesn't like USB hard drive pools either; I think the second log has a couple of instances of one of those crashing as well. (I have a 3-way mirror external HD pool I was using to hold files while transferring data.) Maybe that's just the USB drive being bad; it's always the same one that seems to drop out.

zfs-crash-fio.log
newcrash.log

@zrav

zrav commented Jun 14, 2020

FWIW, I've not had the issue anymore with mpt3sas 33.100, kernel >=5.4.0-31 (Ubuntu 20.04), on current ZoL master.

@malventano

malventano commented Jun 21, 2020

FWIW, I've not had the issue anymore with mpt3sas 33.100, kernel >=5.4.0-31 (Ubuntu 20.04), on current ZoL master.

I'm reliably reproducing a similar issue. Easy way for me to repro:

  • Put a drive into standby via smartctl -s standby,now (include -d sat if it is a SATA device behind SCSI).
  • Wait a few seconds for spindown.
  • Wake the drive with either a read request or by reading SMART data (smartctl -a), as in the script below.
  • Note that errors appear in syslog / dmesg between 7-9 seconds after the read request / SMART status.
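
The steps above as a small script (the device name and the sleep interval are placeholders):

DEV=/dev/sdbo                                 # placeholder device
smartctl -s standby,now -d sat "$DEV"         # -d sat only needed for SATA behind SAS
sleep 15                                      # give the drive a moment to spin down
dd if="$DEV" of=/dev/null bs=4096 count=1 iflag=direct    # wake it with a read
dmesg | tail -n 20                            # the errors show up ~7-9 s after the read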

Steps taken so far for troubleshooting:

  • mpt3sas 33.100.00.00.
  • kernels: 5.4.34-1-pve, 5.4.41-1-pve, and 5.4.44-1-pve.
  • zfs 0.8.4 master.
  • scsi device timeouts confirmed set at 30.
  • mpt3sas.missing_delay=60,60 in commandline has no impact.
  • increased all eh_timeout values from 10 (default) to 30 (/sys/block/ as well as /sys/bus/scsi/devices/); see the sketch after this list.
  • Errors occur with an LSI 9207-8e as well as an LSI 9205-8e.
  • Flashed the BIOS image to the 9207, entered the option ROM, and changed all timeouts from 10 to 30 secs.
  • Tested across different enclosure types (Netapp 4486 and Netapp 4246) as well as testing two different expander types in the 4246's (Netapp IOM6 and Xyratex).
  • Tested with sets of cables.
  • Drives used for testing: HGST He12, WD Red 10TB, 8TB, WD Gold 8TB, WD Red 6TB (PMR).
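
For anyone checking the same knobs, a sketch of inspecting and raising the per-device timeouts via sysfs (run as root; eh_timeout may not exist on very old kernels, and the value 30 simply mirrors the list above):

# Per-device SCSI command timeout and error-handler timeout.
for d in /sys/block/sd*/device; do
    echo "$d: timeout=$(cat "$d"/timeout) eh_timeout=$(cat "$d"/eh_timeout)"
    echo 30 > "$d"/timeout       # seconds before the SCSI layer starts error handling
    echo 30 > "$d"/eh_timeout    # seconds allowed for each error-handling command
done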

Errors are consistently repeatable across all above configurations and occur ~7-9 seconds after a request is sent to a sleeping device. This happens regardless of any of the above changes being made. Here's an example of the errors received whenever a read request hits a sleeping drive:

[  244.706514] sd 4:0:47:2: [sdbo] tag#8755 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  244.706982] sd 4:0:47:2: [sdbo] tag#8755 Sense Key : Aborted Command [current] [descriptor]
[  244.707446] sd 4:0:47:2: [sdbo] tag#8755 <<vendor>>ASC=0x98 ASCQ=0x6
[  244.707909] sd 4:0:47:2: [sdbo] tag#8755 CDB: Read(16) 88 00 00 00 00 03 5b 7f 91 20 00 00 00 08 00 00
[  244.708372] blk_update_request: I/O error, dev sdbo, sector 14419988768 op 0x0:(READ) flags 0x4700 phys_seg 1 prio class 0

Multipath should not be the cause here as the errors occur even without multipath. If multipath is in use, it will immediately drop the associated link when either error occurs. All relevant multipath timeouts (dev_loss_tmo , checker_timeout) are set to 30. This happens regardless of path_selector mode. Sometimes no paths drop, sometimes one, sometimes both. Below is an example of multipath in use and both paths dropping when a data request attempts to wake a drive:

kernel: [127941.031414] sd 0:0:29:2: [sdah] tag#7255 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: [127941.031434] sd 0:0:50:2: [sdbu] tag#3092 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: [127941.032084] sd 0:0:29:2: [sdah] tag#7255 Sense Key : Aborted Command [current] [descriptor] 
kernel: [127941.032713] sd 0:0:50:2: [sdbu] tag#3092 Sense Key : Aborted Command [current] [descriptor] 
kernel: [127941.033333] sd 0:0:29:2: [sdah] tag#7255 <<vendor>>ASC=0x98 ASCQ=0x1 
kernel: [127941.033947] sd 0:0:50:2: [sdbu] tag#3092 <<vendor>>ASC=0x98 ASCQ=0x1 
kernel: [127941.034538] sd 0:0:29:2: [sdah] tag#7255 CDB: Read(16) 88 00 00 00 00 02 45 e7 38 d8 00 00 06 f8 00 00
kernel: [127941.035095] sd 0:0:50:2: [sdbu] tag#3092 CDB: Read(16) 88 00 00 00 00 02 45 e7 32 c0 00 00 00 98 00 00
Jun 21 00:08:26 BB-8 kernel: [127941.035655] blk_update_request: I/O error, dev sdah, sector 9762715864 op 0x0:(READ) flags 0x4700 phys_seg 17 prio class 0
Jun 21 00:08:26 BB-8 kernel: [127941.036201] blk_update_request: I/O error, dev sdbu, sector 9762714304 op 0x0:(READ) flags 0x4700 phys_seg 3 prio class 0
multipathd[3995]: sdbu: mark as failed
multipathd[3995]: h10: remaining active paths: 1
kernel: [127941.037334] device-mapper: multipath: Failing path 68:128.
kernel: [127941.037973] sd 0:0:29:2: [sdah] tag#2977 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: [127941.038516] sd 0:0:29:2: [sdah] tag#2977 Sense Key : Aborted Command [current] [descriptor] 
kernel: [127941.039046] sd 0:0:29:2: [sdah] tag#2977 <<vendor>>ASC=0x98 ASCQ=0x6 
kernel: [127941.039571] sd 0:0:29:2: [sdah] tag#2977 CDB: Read(16) 88 00 00 00 00 02 45 e7 32 c0 00 00 00 98 00 00
kernel: [127941.040083] blk_update_request: I/O error, dev sdah, sector 9762714304 op 0x0:(READ) flags 0x4700 phys_seg 3 prio class 0
kernel: [127941.040597] device-mapper: multipath: Failing path 66:16.
multipath: dm-40: no usable paths found
multipathd[3995]: sdah: mark as failed
multipathd[3995]: h10: Entering recovery mode: max_retries=10
multipathd[3995]: h10: remaining active paths: 0
multipathd[3995]: h10: sdbu - tur checker reports path is up
multipathd[3995]: 68:128: reinstated
multipathd[3995]: h10: queue_if_no_path enabled
multipathd[3995]: h10: Recovered to normal mode
multipathd[3995]: h10: remaining active paths: 1
kernel: [127945.199129] device-mapper: multipath: Reinstating path 68:128.
pvestatd[30819]: zfs error: cannot open 'rpool/data': dataset does not exist
multipathd[3995]: h10: sdah - tur checker reports path is up
multipathd[3995]: 66:16: reinstated
multipathd[3995]: h10: remaining active paths: 2
kernel: [127951.204478] device-mapper: multipath: Reinstating path 66:16.

The 4246's also throw a sd x:x:x:x: Power-on or device reset occurred in addition to the above. Here is an example where the error was triggered by a smartctl -x request (no sector read errors - only reading smart data):

kernel: [  133.880657] sd 3:0:3:0: Power-on or device reset occurred
kernel: [  133.880798] sd 3:0:54:0: Power-on or device reset occurred
kernel: [  134.421967] device-mapper: multipath: Failing path 67:48.
kernel: [  134.422174] device-mapper: multipath: Failing path 8:48.
multipathd[2569]: mpathi: sdaz - tur checker reports path is down
multipathd[2569]: checker failed path 67:48 in map mpathi
multipathd[2569]: mpathi: remaining active paths: 1
multipathd[2569]: mpathi: sdd - tur checker reports path is down
multipathd[2569]: checker failed path 8:48 in map mpathi
multipathd[2569]: mpathi: Entering recovery mode: max_retries=2
multipathd[2569]: mpathi: remaining active paths: 0
multipath: dm-35: no usable paths found
multipathd[2569]: 67:48: reinstated
multipathd[2569]: mpathi: sdd - tur checker reports path is up
multipathd[2569]: 8:48: reinstated
multipathd[2569]: mpathi: queue_if_no_path enabled
multipathd[2569]: mpathi: Recovered to normal mode
multipathd[2569]: mpathi: remaining active paths: 1
kernel: [  136.416937] device-mapper: multipath: Reinstating path 67:48.
kernel: [  136.417380] device-mapper: multipath: Reinstating path 8:48.

(Note: I had changed polling_interval to 2 for this example; it was 10 for the earlier examples. This is why the path came back after 2 / 10 seconds respectively.)
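
For completeness, the multipath timeouts mentioned above expressed as an /etc/multipath.conf snippet, written as a shell heredoc so it can be appended directly (values as stated above; this assumes no defaults section exists yet in the file):

# Append the timeout settings referred to above to the multipath configuration.
cat >> /etc/multipath.conf <<'EOF'
defaults {
    dev_loss_tmo     30
    checker_timeout  30
    polling_interval 10
}
EOF
# then reconfigure multipathd, e.g. multipathd -k'reconfigure'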

So far I'm stumped on this one...

@richardelling
Contributor

Your drive is aborting the command and returning ASC/ASCQ 0x98/0x6 or 0x98/0x1. These do not seem to be registered with T10 at https://www.t10.org/lists/asc-num.htm
Contact the drive vendor for more information.

@malventano

That may be the Netapp 4486 enclosure/sleds issuing those aborts. It has dual-drive sleds which put a pair of SATA drives (He12) behind a Marvell 88SF9210, which bridges to SAS multipath. The other enclosure I tried was a Netapp 4246, which shouldn't have as much 'in the way', but I didn't do read tests to trigger the errors in that config (I only experienced the resets at the same 7-8 seconds after requests, in that case from smartctl -x). I'll try to trigger some read timeouts and see what happens there.

@hoppel118

hoppel118 commented Jul 25, 2020

I am still on Debian Stretch. I have to update my os to buster and check if the problem is solved with the latest versions. That will take „some“ time. ;)

Regards Hoppel

@QBANIN

QBANIN commented Aug 3, 2020

I am still on Debian Stretch. I have to update my os to buster and check if the problem is solved with the latest versions. That will take „some“ time. ;)

Regards Hoppel

It's not solved. Just happened to me last night.
Debian Buster, kernel 5.4.41, LSI 9207-8i

@hoppel118

Hm.... Bummer...

@bsdice

bsdice commented Sep 16, 2020

(This is a copy&paste from my comment in issue #4638 just in case someone finds this issue through a search engine, looking for a workaround)

I got hit by this problem as well, running ZFS 0.8.4 on Linux 5.4 (Arch Linux LTS kernel) with eight 14 TB SATA disks in RAIDZ2 behind an LSI 2308 controller flashed to IT mode. Whenever I turn on hd-idle and let it spin down the disks (they sit idle 20h per day), ZFS will complain loudly in the kernel log during wakeup. After a couple of days of testing, many read and, most worryingly, also write and even checksum errors had occurred (zpool status). Scrub could correct all the problems, but this needed to be fixed ASAP.

I solved the problem by doing away with the LSI and buying a JMicron JMB585 5-port SATA controller card instead. These chips have existed since about 2018, so they are relatively new. No extra driver is needed; the card will run with any even remotely recent stock AHCI driver. Since the switch, no more errors have occurred at all, even though I aggressively put disks into standby when not in use. As far as I can see the card also has no PCIe bottleneck, because it can use PCIe 3.0 with two lanes, supposedly reaching 1700 MByte/s transfer rates. That should be good enough for 5 modern HDDs. There are mostly Chinese no-names out there, US$ 30-40 in 2020; I recommend getting a card with a largish black heatsink though, to preclude thermal issues. There appear to be no electrolytic capacitors on these cards, so they might even be very stable long term (10+ years).

@RichieB2B

RichieB2B commented Jan 30, 2021

I'm having the same problem with the onboard SAS3008 of my Supermicro X11SSL-CF motherboard. I'm on Debian Buster with zfs 0.8.6-1~bpo10+1 from buster-backports. One thing I plan to do is flash the SAS3008 to P16-V17 firmware which is the latest posted by Supermicro.

# sas3ircu 0 DISPLAY
Avago Technologies SAS3 IR Configuration Utility.
Version 17.00.00.00 (2018.04.02) 
Copyright (c) 2009-2018 Avago Technologies. All rights reserved. 

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS3008
  BIOS version                            : 8.29.01.00
  Firmware version                        : 12.00.02.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 255
  Concurrent commands supported           : 3072
  Slot                                    : 2
  Segment                                 : 0
  Bus                                     : 2
  Device                                  : 0
  Function                                : 0
  RAID Support                            : No
------------------------------------------------------------------------
# modinfo mpt3sas
filename:       /lib/modules/4.19.0-13-amd64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
alias:          mpt2sas
version:        26.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 Device Driver

@kobuki

kobuki commented Jan 30, 2021

@RichieB2B the problems disappeared for me on 5.8 or newer kernels from backports. I suggest giving them a try.

@RichieB2B

Thanks @kobuki. I upgraded the firmware to P16 and the Linux kernel to 5.9.0-0.bpo.5-amd64 from buster-backports, and I have not seen the errors since.

# sas3ircu 0 DISPLAY
Avago Technologies SAS3 IR Configuration Utility.
Version 17.00.00.00 (2018.04.02) 
Copyright (c) 2009-2018 Avago Technologies. All rights reserved. 

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS3008
  BIOS version                            : 8.37.00.00
  Firmware version                        : 16.00.10.00
  Channel description                     : 1 Serial Attached SCSI

# modinfo mpt3sas
filename:       /lib/modules/5.9.0-0.bpo.5-amd64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko
alias:          mpt2sas
version:        34.100.00.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 Device Driver

@crimp42

crimp42 commented Mar 15, 2021 via email

@jonathan-molyneux

Had good results running 5.11.8, so far much better than 5.4.86.
Time will tell, but it's through a full scrub without a single read error.

Running P19 on the LSI SAS 9201-16i & SAS 9211-8i controllers.
If you have the same controllers, do not run P20; it results in read & write errors.

Thanks for the updates @kobuki @RichieB2B and @brianmduncan.

@red-scorp

@jonathan-molyneux: Scrubbing was never the problem; waking up the drives was always the issue. Let your drives sleep and see whether they wake up cleanly or not.

@xes

xes commented May 24, 2021

Maybe not related to your problems, but if you are using Supermicro servers + LSI 3008 + sas3ircu, please check your backplane (BPN-SAS3-216EL1) firmware:
https://www.supermicro.com/support/faqs/faq.cfm?faq=33592

@nerozero

I have had the same issue on FreeBSD for years. It is fixable most of the time by disabling the hard drive APM (advanced power management) and/or EPC (extended power conditions) options.
For some reason some of my SSDs don't seem to have EPC capabilities, so they fall off 2-5 times a day.

A really good How-to disable/modify drive power management can be found here: https://serverfault.com/questions/1047331/how-do-i-disable-hard-disk-spin-down-or-head-parking-in-freebsd

But still, disabling APM/EPC doesn't count as a valid solution.

@crsleeth

crsleeth commented Dec 26, 2021

I'm seeing this issue on RHEL 8.5 (4.18.0-348.7.1.el8_5) with LSI 9300-8e HBAs and SuperMicro 846 shelves. Heavy IO to the drives will slowly increase the read/write errors in zpool status until ZFS steps in and starts resilvering. For the number of drives I have in RAIDZ2, resilvering daily isn't really feasible because a resilver takes longer than a day.

I need to update my 9300-8e's (SAS 3008) firmware but haven't been able to yet.

@Numline1

Installing linux-generic-hwe-20.04 (which upgrades the kernel from 5.4.x to 5.11.x) on my Ubuntu 20.04.3 fixed the issue (I think; I haven't seen an error after a resilver and scrub). I still have no idea what exactly changed between these versions that caused this to magically fix itself. I was actually getting errors in zpool status and had a degraded array, always with the same physical disk; the dmesg output was similar to what's been posted earlier, with something along the lines of "waiting for a device/timeout".

I've tried swapping the SAS-to-SATA cable (but kept the position of the drive on the same cable, which might be why). I'm also using an LSI card, which might've been doing something funky with the driver in older kernels.

Either way, the problem is hopefully solved, I'm just curious as to why.
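
For anyone wanting to do the same, the usual way to pull in that HWE kernel series on Ubuntu 20.04 (package name as above; reboot into the new kernel afterwards):

sudo apt-get update
sudo apt-get install --install-recommends linux-generic-hwe-20.04   # then reboot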
