One of my zpools has experienced two successive drive failures. While I was resilvering the first, the second drive failed, and I got two errors in snapshots. The resilver finished, and I then used "zpool replace" to resilver onto a replacement for the second faulty drive.
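For reference, the replace step was along these lines (a sketch; the device names are taken from the status output below, but the exact invocation may have differed):

```shell
# Replace the second failed drive with the new disk
# (old and new names as they appear in "zpool status" below)
zpool replace gggpool \
    scsi-SATA_ST3000DM001-1CH_Z1F2Z9VC \
    scsi-SATA_ST3000DM001-1CH_Z1F2Z8SM
```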
The pool is mounted, and all data is safe and available except for the two files:
  pool: gggpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
  scan: resilvered 2,35T in 19h29m with 5 errors on Sat Sep 21 03:08:24 2013
config:

        NAME                                             STATE     READ WRITE CKSUM
        gggpool                                          DEGRADED     0     0     5
          raidz1-0                                       DEGRADED     0     0    10
            scsi-SATA_ST3000DM001-9YN_Z1F0NJKS           ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0RPKE           ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0RPZG           ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0RQJ2           ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0RQSV           ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0T6VN           ONLINE       0     0     0
            spare-6                                      DEGRADED     0     0     0
              scsi-SATA_WDC_WD30EZRX-00_WD-WMC1T4095404  UNAVAIL      0     0     0
              scsi-SATA_ST3000DM001-9YN_Z1F118BA         ONLINE       0     0     0
            replacing-7                                  UNAVAIL      0     0     0
              scsi-SATA_ST3000DM001-1CH_Z1F2Z9VC         UNAVAIL      0     0     0
              scsi-SATA_ST3000DM001-1CH_Z1F2Z8SM         ONLINE       0     0     0
        spares
          scsi-SATA_ST3000DM001-9YN_Z1F118BA             INUSE     currently in use
The remaining errors probably point to where the faulty files were; I destroyed the relevant snapshots, but these error indications remain:
errors: Permanent errors have been detected in the following files:

        <0x218>:<0x7308>
        <0x3a0>:<0x295a6b>
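My understanding is that stale entries like these normally disappear once the error counters are cleared and a subsequent scrub completes without finding the errors again (sometimes two scrubs are needed before the list empties). I would expect something like:

```shell
# Reset the pool's error counters, then re-verify all data;
# the <0x...> entries should drop off once a clean scrub finishes
zpool clear gggpool
zpool scrub gggpool
```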
I am not worried about these errors. I am trying to detach the two failed drives, both of which have been replaced, but zpool refuses:
root@ggg:~# zpool detach gggpool scsi-SATA_ST3000DM001-1CH_Z1F2Z9VC
cannot detach scsi-SATA_ST3000DM001-1CH_Z1F2Z9VC: no valid replicas
root@ggg:~# zpool detach gggpool scsi-SATA_WDC_WD30EZRX-00_WD-WMC1T4095404
cannot detach scsi-SATA_WDC_WD30EZRX-00_WD-WMC1T4095404: no valid replicas
The two drives have been physically removed from the array (sent in for warranty replacement), but they live on in the zpool configuration. How do I get rid of them?
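For completeness, the pool configuration that zpool is holding on to, including the phantom spare-6 and replacing-7 sub-vdevs, can be inspected read-only with zdb:

```shell
# Dump the cached pool configuration (read-only, does not modify the pool)
zdb -C gggpool
```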
When reading data from the pool, I can see in "zpool iostat -v" that the "replacing-7" vdev is not active:
                                                   capacity     operations    bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
gggpool                                         19,8T  1,96T    323      0  36,8M      0
  raidz1                                        19,8T  1,96T    323      0  36,8M      0
    scsi-SATA_ST3000DM001-9YN_Z1F0NJKS              -      -    177      0  5,42M      0
    scsi-SATA_ST3000DM001-9YN_Z1F0RPKE              -      -    184      0  5,26M      0
    scsi-SATA_ST3000DM001-9YN_Z1F0RPZG              -      -    183      0  5,55M      0
    scsi-SATA_ST3000DM001-9YN_Z1F0RQJ2              -      -    183      0  5,25M      0
    scsi-SATA_ST3000DM001-9YN_Z1F0RQSV              -      -    180      0  5,39M      0
    scsi-SATA_ST3000DM001-9YN_Z1F0T6VN              -      -    181      0  5,21M      0
    spare                                           -      -    298      0  5,47M      0
      scsi-SATA_WDC_WD30EZRX-00_WD-WMC1T4095404     -      -      0      0      0      0
      scsi-SATA_ST3000DM001-9YN_Z1F118BA            -      -    230      0  5,49M      0
    replacing                                       -      -      0      0      0      0
      scsi-SATA_ST3000DM001-1CH_Z1F2Z9VC            -      -      0      0      0      0
      scsi-SATA_ST3000DM001-1CH_Z1F2Z8SM            -      -      0      0      0      0
----------------------------------------------  -----  -----  -----  -----  -----  -----
This is worrying: without this vdev working, the pool has no redundancy, yet I cannot remove or detach either of its two drives. I am in the process of making a full backup (only a day to go). However, destroying this pool and rebuilding it would cause a LOT of headaches, with many filesystems and SMB and AFS shares having to be set up again.
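If a rebuild does become unavoidable, my plan would be a recursive replicated send, which preserves the dataset layout and properties and should save most of that re-setup; a sketch, assuming a destination pool named backuppool (hypothetical):

```shell
# Snapshot the whole pool recursively, then replicate everything,
# including properties and child filesystems, to the backup pool
zfs snapshot -r gggpool@migrate
zfs send -R gggpool@migrate | zfs receive -F backuppool/gggpool
```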
Any ideas how I can force this failed replacing-7 vdev to work again?