Hi.
I have 12 disks in a RAID5 array with mdadm.
Disks in raid: sd(bcdefghijklm)
Array: md0
Disk sdi failed and had to be replaced.
I replaced it with a new disk and started resyncing the array.
During the resync I lost another disk: sde.
Here is the output of cat /proc/mdstat:
Code:
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc1[0] sdi1[14](S) sdj1[13] sdl1[12] sdg1[9] sde1[8](F) sdm1[7] sdk1[6] sdd1[11] sdf1[3] sdb1[2] sdh1[1]
32232904704 blocks super 1.2 level 5, 512k chunk, algorithm 2 [12/10] [UUUUUUU_U_UU]
unused devices: <none>
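To make the `[UUUUUUU_U_UU]` map above easier to read, here is a small shell sketch that turns it into slot numbers. The helper name `missing_slots` is made up for illustration and is not part of mdadm; it only assumes that each `U` is a working member and each `_` a missing one, counted from slot 0.

```shell
#!/bin/sh
# missing_slots: print the 0-based positions of '_' in an mdstat status map.
# Hypothetical helper for illustration only; it does not touch the array.
missing_slots() {
    map=$1
    map=${map#\[}              # drop the leading '['
    map=${map%\]}              # drop the trailing ']'
    i=0
    out=""
    while [ -n "$map" ]; do
        rest=${map#?}          # everything after the first character
        c=${map%"$rest"}       # the first character itself
        if [ "$c" = "_" ]; then
            out="$out $i"
        fi
        map=$rest
        i=$((i + 1))
    done
    echo "${out# }"
}

# Status map copied from the /proc/mdstat output above.
missing_slots '[UUUUUUU_U_UU]'   # prints: 7 9
```

Slots 7 and 9 are the same two RaidDevice numbers that mdadm -D lists as "removed".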
Here is the output of mdadm -D /dev/md0:
Code:
# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Thu Jan 24 20:36:48 2013
Raid Level : raid5
Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
Used Dev Size : 2930264064 (2794.52 GiB 3000.59 GB)
Raid Devices : 12
Total Devices : 12
Persistence : Superblock is persistent
Update Time : Thu Jul 19 05:16:36 2018
State : clean, FAILED
Active Devices : 10
Working Devices : 11
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Name : ingfil:0 (local to host ingfil)
UUID : 9c5baecf:58212783:fe438251:3b70e113
Events : 870676
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 113 1 active sync /dev/sdh1
2 8 17 2 active sync /dev/sdb1
3 8 81 3 active sync /dev/sdf1
11 8 49 4 active sync /dev/sdd1
6 8 161 5 active sync /dev/sdk1
7 8 193 6 active sync /dev/sdm1
7 0 0 7 removed
9 8 97 8 active sync /dev/sdg1
9 0 0 9 removed
12 8 177 10 active sync /dev/sdl1
13 8 145 11 active sync /dev/sdj1
8 8 65 - faulty spare /dev/sde1
14 8 129 - spare /dev/sdi1
Output of smartctl -x /dev/sde:
Code:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red (AF)
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WMC4N1007575
LU WWN Device Id: 5 0014ee 6594ee571
Firmware Version: 80.00A80
User Capacity: 3 000 592 982 016 bytes [3,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Jul 19 17:23:32 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (40320) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 404) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 7
3 Spin_Up_Time POS--K 183 170 021 - 5841
4 Start_Stop_Count -O--CK 100 100 000 - 344
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 051 051 000 - 36167
10 Spin_Retry_Count -O--CK 100 100 000 - 0
11 Calibration_Retry_Count -O--CK 100 100 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 185
192 Power-Off_Retract_Count -O--CK 200 200 000 - 103
193 Load_Cycle_Count -O--CK 001 001 000 - 962848
194 Temperature_Celsius -O---K 118 097 000 - 32
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 100 253 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 4
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 4 [3] occurred at disk power-on lifetime: 36155 hours (1506 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4e 8f fb 28 40 00 Error: UNC at LBA = 0x14e8ffb28 = 5613026088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 40 00 80 00 01 4e 90 05 e8 40 08 1d+05:58:34.051 READ FPDMA QUEUED
60 04 00 00 78 00 01 4e 90 01 e8 40 08 1d+05:58:34.051 READ FPDMA QUEUED
60 00 40 00 70 00 01 4e 8f fd a8 40 08 1d+05:58:34.010 READ FPDMA QUEUED
60 00 80 00 68 00 01 4e 8f fd 28 40 08 1d+05:58:34.010 READ FPDMA QUEUED
60 00 80 00 60 00 01 4e 8f fc a8 40 08 1d+05:58:34.010 READ FPDMA QUEUED
Error 3 [2] occurred at disk power-on lifetime: 36155 hours (1506 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 fe 00 01 4e 8f fb 28 40 00 Error: UNC at LBA = 0x14e8ffb28 = 5613026088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 04 00 00 30 00 01 4e 8f f9 e8 40 08 1d+05:58:30.229 READ FPDMA QUEUED
60 04 00 00 28 00 01 4e 8f f5 e8 40 08 1d+05:58:30.221 READ FPDMA QUEUED
60 04 00 00 20 00 01 4e 8f f1 e8 40 08 1d+05:58:30.213 READ FPDMA QUEUED
60 04 00 00 18 00 01 4e 8f ed e8 40 08 1d+05:58:30.209 READ FPDMA QUEUED
60 04 00 00 10 00 01 4e 8f e9 e8 40 08 1d+05:58:30.201 READ FPDMA QUEUED
Error 2 [1] occurred at disk power-on lifetime: 36132 hours (1505 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 4e 8f fb 28 40 00 Error: UNC at LBA = 0x14e8ffb28 = 5613026088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 04 00 00 a8 00 01 4e 94 db 78 40 08 07:33:08.137 READ FPDMA QUEUED
60 04 00 00 b0 00 01 4e 94 d7 78 40 08 07:33:08.132 READ FPDMA QUEUED
60 04 00 00 c0 00 01 4e 94 d3 78 40 08 07:33:08.125 READ FPDMA QUEUED
60 04 00 00 c8 00 01 4e 94 cf 78 40 08 07:33:08.120 READ FPDMA QUEUED
60 04 00 00 d0 00 01 4e 94 cb 78 40 08 07:33:08.113 READ FPDMA QUEUED
Error 1 [0] occurred at disk power-on lifetime: 36132 hours (1505 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 01 00 01 4e 8f fb 28 40 00 Error: UNC at LBA = 0x14e8ffb28 = 5613026088
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 04 00 00 88 00 01 4e 90 17 78 40 08 07:33:01.999 READ FPDMA QUEUED
60 04 00 00 80 00 01 4e 90 13 78 40 08 07:33:01.999 READ FPDMA QUEUED
60 04 00 00 78 00 01 4e 90 0f 78 40 08 07:33:01.999 READ FPDMA QUEUED
60 04 00 00 70 00 01 4e 90 0b 78 40 08 07:33:01.999 READ FPDMA QUEUED
60 04 00 00 68 00 01 4e 90 07 78 40 08 07:33:01.999 READ FPDMA QUEUED
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 32 Celsius
Power Cycle Min/Max Temperature: 30/38 Celsius
Lifetime Min/Max Temperature: -2/54 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (367)
Index Estimated Time Temperature Celsius
368 2018-07-19 09:26 32 *************
... ..(476 skipped). .. *************
367 2018-07-19 17:23 32 *************
SCT Error Recovery Control:
Read: 70 (7,0 seconds)
Write: 70 (7,0 seconds)
Device Statistics (GP Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 4 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 151491 Vendor specific
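All four logged errors on sde report the same LBA. As a rough sketch, the conversion smartctl shows can be reproduced like this, along with a byte offset. The 512-byte sector size is an assumption taken from the "Sector Sizes" line, and the offset is relative to the whole disk /dev/sde, not to the partition sde1, whose start sector is not shown here.

```shell
#!/bin/sh
# LBA reported for all four UNC errors in the smartctl log above.
lba_hex=0x14e8ffb28
# Assumption: LBAs count 512-byte logical sectors ("Sector Sizes" line).
sector_bytes=512

# Shell arithmetic accepts hex constants, so this converts to decimal.
lba_dec=$(($lba_hex))
# Byte offset of that sector from the start of the whole disk.
byte_offset=$((lba_dec * sector_bytes))

echo "LBA $lba_hex = $lba_dec (byte offset $byte_offset on the whole disk)"
```

So this looks like a single unreadable sector hit repeatedly, which would fit the Current_Pending_Sector raw value of 1, though I am not sure that is the whole story.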
Is it possible to save the array?