mdadm RAID5 array 2 drives removed

**vwlinux** · July 19th, 2018

Hi.

I have 12 disks in a RAID5 array with mdadm.
Disks in raid: sd(bcdefghijklm)
Array: md0

Disk sdi failed and needed to be replaced.
I replaced it with a new disk and started resyncing the array.
During the resync I lost another disk: sde.

Here is the output of cat /proc/mdstat:

Code:

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc1[0] sdi1[14](S) sdj1[13] sdl1[12] sdg1[9] sde1[8](F) sdm1[7] sdk1[6] sdd1[11] sdf1[3] sdb1[2] sdh1[1]
      32232904704 blocks super 1.2 level 5, 512k chunk, algorithm 2 [12/10] [UUUUUUU_U_UU]

unused devices: <none>
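For reading the status line above: in `[12/10] [UUUUUUU_U_UU]`, the first bracket is the number of configured members over the number currently active, and in the second bracket each `U` is a member that is up and each `_` is a missing slot, counted from 0. A minimal sketch of decoding that map with awk follows; the `missing_slots` function name is made up for illustration and the parsing assumes the mdstat layout shown in this post.

```shell
# Read /proc/mdstat-style text on stdin and, for each array, print the
# slot numbers marked "_" (missing) in the [UU_U] status map.
missing_slots() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ..."
        /^md[0-9]+ :/ { name = $1 }

        # The status map is the bracket that contains only U and _ characters.
        match($0, /\[[U_]+\]/) {
            map = substr($0, RSTART + 1, RLENGTH - 2)
            out = ""
            sep = ""
            for (i = 1; i <= length(map); i++) {
                if (substr(map, i, 1) == "_") {
                    slot = i - 1          # slots are numbered from 0
                    out = out sep slot
                    sep = " "
                }
            }
            if (out == "") out = "none"
            print name ": missing slots " out
        }
    '
}

# Usage (read-only): missing_slots < /proc/mdstat
```

For the map above this reports slots 7 and 9, which matches the two "removed" rows in the mdadm -D output below.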

Here is the output of mdadm -D /dev/md0:

Code:

# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
  Used Dev Size : 2930264064 (2794.52 GiB 3000.59 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Thu Jul 19 05:16:36 2018
          State : clean, FAILED
 Active Devices : 10
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ingfil:0  (local to host ingfil)
           UUID : 9c5baecf:58212783:fe438251:3b70e113
         Events : 870676

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8      113        1      active sync   /dev/sdh1
       2       8       17        2      active sync   /dev/sdb1
       3       8       81        3      active sync   /dev/sdf1
      11       8       49        4      active sync   /dev/sdd1
       6       8      161        5      active sync   /dev/sdk1
       7       8      193        6      active sync   /dev/sdm1
       7       0        0        7      removed
       9       8       97        8      active sync   /dev/sdg1
       9       0        0        9      removed
      12       8      177       10      active sync   /dev/sdl1
      13       8      145       11      active sync   /dev/sdj1

       8       8       65        -      faulty spare   /dev/sde1
      14       8      129        -      spare   /dev/sdi1

Output of smartctl -x /dev/sde:

Code:

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N1007575
LU WWN Device Id: 5 0014ee 6594ee571
Firmware Version: 80.00A80
User Capacity:    3 000 592 982 016 bytes [3,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul 19 17:23:32 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (40320) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 404) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    7
  3 Spin_Up_Time            POS--K   183   170   021    -    5841
  4 Start_Stop_Count        -O--CK   100   100   000    -    344
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   051   051   000    -    36167
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    185
192 Power-Off_Retract_Count -O--CK   200   200   000    -    103
193 Load_Cycle_Count        -O--CK   001   001   000    -    962848
194 Temperature_Celsius     -O---K   118   097   000    -    32
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 4
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4 [3] occurred at disk power-on lifetime: 36155 hours (1506 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 4e 8f fb 28 40 00  Error: UNC at LBA = 0x14e8ffb28 = 5613026088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 40 00 80 00 01 4e 90 05 e8 40 08  1d+05:58:34.051  READ FPDMA QUEUED
  60 04 00 00 78 00 01 4e 90 01 e8 40 08  1d+05:58:34.051  READ FPDMA QUEUED
  60 00 40 00 70 00 01 4e 8f fd a8 40 08  1d+05:58:34.010  READ FPDMA QUEUED
  60 00 80 00 68 00 01 4e 8f fd 28 40 08  1d+05:58:34.010  READ FPDMA QUEUED
  60 00 80 00 60 00 01 4e 8f fc a8 40 08  1d+05:58:34.010  READ FPDMA QUEUED

Error 3 [2] occurred at disk power-on lifetime: 36155 hours (1506 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 fe 00 01 4e 8f fb 28 40 00  Error: UNC at LBA = 0x14e8ffb28 = 5613026088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 30 00 01 4e 8f f9 e8 40 08  1d+05:58:30.229  READ FPDMA QUEUED
  60 04 00 00 28 00 01 4e 8f f5 e8 40 08  1d+05:58:30.221  READ FPDMA QUEUED
  60 04 00 00 20 00 01 4e 8f f1 e8 40 08  1d+05:58:30.213  READ FPDMA QUEUED
  60 04 00 00 18 00 01 4e 8f ed e8 40 08  1d+05:58:30.209  READ FPDMA QUEUED
  60 04 00 00 10 00 01 4e 8f e9 e8 40 08  1d+05:58:30.201  READ FPDMA QUEUED

Error 2 [1] occurred at disk power-on lifetime: 36132 hours (1505 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 4e 8f fb 28 40 00  Error: UNC at LBA = 0x14e8ffb28 = 5613026088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 a8 00 01 4e 94 db 78 40 08     07:33:08.137  READ FPDMA QUEUED
  60 04 00 00 b0 00 01 4e 94 d7 78 40 08     07:33:08.132  READ FPDMA QUEUED
  60 04 00 00 c0 00 01 4e 94 d3 78 40 08     07:33:08.125  READ FPDMA QUEUED
  60 04 00 00 c8 00 01 4e 94 cf 78 40 08     07:33:08.120  READ FPDMA QUEUED
  60 04 00 00 d0 00 01 4e 94 cb 78 40 08     07:33:08.113  READ FPDMA QUEUED

Error 1 [0] occurred at disk power-on lifetime: 36132 hours (1505 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 01 4e 8f fb 28 40 00  Error: UNC at LBA = 0x14e8ffb28 = 5613026088

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 88 00 01 4e 90 17 78 40 08     07:33:01.999  READ FPDMA QUEUED
  60 04 00 00 80 00 01 4e 90 13 78 40 08     07:33:01.999  READ FPDMA QUEUED
  60 04 00 00 78 00 01 4e 90 0f 78 40 08     07:33:01.999  READ FPDMA QUEUED
  60 04 00 00 70 00 01 4e 90 0b 78 40 08     07:33:01.999  READ FPDMA QUEUED
  60 04 00 00 68 00 01 4e 90 07 78 40 08     07:33:01.999  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    32 Celsius
Power Cycle Min/Max Temperature:     30/38 Celsius
Lifetime    Min/Max Temperature:     -2/54 Celsius
Under/Over Temperature Limit Count:   0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (367)

Index    Estimated Time   Temperature Celsius
 368    2018-07-19 09:26    32  *************
 ...    ..(476 skipped).    ..  *************
 367    2018-07-19 17:23    32  *************

SCT Error Recovery Control:
           Read:     70 (7,0 seconds)
          Write:     70 (7,0 seconds)

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            3  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4       151491  Vendor specific

Is it possible to save the array?

**darkod** · July 20th, 2018

First, 12 disks is way too many for RAID5. You need at least two-disk redundancy, and more if possible...

You should be able to bring the array online. First, take a closer look at each member's superblock info:

Code:

sudo mdadm -E /dev/sd[bcdefghijklm]1

That will show you the Events counters, which are important for figuring out how up to date each member is. After you post that output we can advise further...
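To make the comparison easier, here is a small sketch that boils the `mdadm -E` output down to one line per member. It assumes the field layout that mdadm prints for 1.2 superblocks (a `/dev/...:` header line, then `Events :` and `Device Role :` lines); the `summarize_events` helper is hypothetical and not part of mdadm.

```shell
# Read `mdadm -E` output on stdin and print one line per member:
#   <device> events=<counter> role=<device role>
summarize_events() {
    awk '
        # Header line for each member, e.g. "/dev/sdb1:"
        /^\/dev\/.*:$/     { dev = $1; sub(/:$/, "", dev) }

        # Event counter, e.g. "         Events : 870676"
        /^ *Events :/      { events = $3 }

        # Role comes after Events in the output, so print here.
        /^ *Device Role :/ {
            role = $0
            sub(/^.*: /, "", role)
            printf "%s events=%s role=%s\n", dev, events, role
        }
    '
}

# Usage (read-only): sudo mdadm -E /dev/sd[bcdefghijklm]1 | summarize_events
```

Members whose counter is lower than the rest were dropped from the array earlier, and a role of "spare" means the member does not currently hold a data slot.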

**vwlinux** · July 20th, 2018

I know.
I was going to move all the data to a new server.
Thank you for your reply.

Here is the output of mdadm -E /dev/sd[bcdefghijklm]1:

Code:

# mdadm -E /dev/sd[bcdefghijklm]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : bd70c6e3:6f0536d1:3cdc907d:cba8a589

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : 1c74c438 - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 571efc24:34a91143:075d21f8:ca040d3b

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : fb7df25d - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f75ff05b:e15fbf7c:d712c3c0:aedf87d8

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : d23c7b6a - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : ff5db992:40dc1d0a:46c8b0be:df02baa5

    Update Time : Thu Jul 19 05:15:33 2018
       Checksum : 6181ce3e - correct
         Events : 870672

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 7
   Array State : AAAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6599e8ed:dddecd34:76a41633:fb5fbb58

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : eca45b8 - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b050efc5:c7e92c1b:b700de78:043bafb7

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : 71eb3f3d - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 8
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : fd99f2f3:a9e46376:3ddfcc1b:f9baff23

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : a64e1df - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdi1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860529005 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 4096 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a8da1419:de490f38:6e2519d8:35887114

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : 9df0a6a5 - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdj1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860529005 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
  Used Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
    Data Offset : 4096 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f4c9feba:470c018d:99a83fcc:5fc18c66

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : db0e14af - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 11
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdk1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 1c15eb5c:751ded1b:e5837644:4393b5af

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : cd4612c0 - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdl1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 15070871:74c810a5:5491dfaa:3794a736

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : 57e1be22 - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 10
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)
/dev/sdm1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9c5baecf:58212783:fe438251:3b70e113
           Name : ingfil:0  (local to host ingfil)
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
   Raid Devices : 12

 Avail Dev Size : 5860528128 (2794.52 GiB 3000.59 GB)
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : a75f6f2a:bc3db33a:40bd7b6b:b1e75168

    Update Time : Thu Jul 19 05:16:36 2018
       Checksum : 99320b5c - correct
         Events : 870676

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : AAAAAAA.A.AA ('A' == active, '.' == missing)

**darkod** · July 20th, 2018

Hmmm, this should be easy... All of your members (including sdi1) report an event counter of 870676, except sde1, which says 870672. The only difference with sdi1 seems to be that it is marked as a spare right now, for whatever reason. But if its counter is really 870676, it should fit into the array just nicely.

First stop the array and try to force assemble it leaving sde1 out for the moment:

Code:

sudo mdadm --stop /dev/md0
sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcdfghijklm]1

Watch the messages it gives you during the assemble and let's see how that goes...

**vwlinux** · July 21st, 2018

Here is the output of the command:

Code:

# mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcdfghijklm]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdj1 is identified as a member of /dev/md0, slot 11.
mdadm: /dev/sdk1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdl1 is identified as a member of /dev/md0, slot 10.
mdadm: /dev/sdm1 is identified as a member of /dev/md0, slot 6.
mdadm: added /dev/sdh1 to /dev/md0 as 1
mdadm: added /dev/sdb1 to /dev/md0 as 2
mdadm: added /dev/sdf1 to /dev/md0 as 3
mdadm: added /dev/sdd1 to /dev/md0 as 4
mdadm: added /dev/sdk1 to /dev/md0 as 5
mdadm: added /dev/sdm1 to /dev/md0 as 6
mdadm: no uptodate device for slot 7 of /dev/md0
mdadm: added /dev/sdg1 to /dev/md0 as 8
mdadm: no uptodate device for slot 9 of /dev/md0
mdadm: added /dev/sdl1 to /dev/md0 as 10
mdadm: added /dev/sdj1 to /dev/md0 as 11
mdadm: added /dev/sdi1 to /dev/md0 as -1
mdadm: added /dev/sdc1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 10 drives and 1 spare - not enough to start the array.

Is it possible to get /dev/sdi1 included so that the array can start?

/dev/sdi1 is the disk that I replaced.

**vwlinux** · July 21st, 2018

Here is the content of mdadm.conf:

Code:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
#ARRAY /dev/md/0 metadata=1.2 UUID=9c5baecf:58212783:fe438251:3b70e113 name=myserver:0

# This file was auto-generated on Wed, 21 May 2014 11:57:09 +0200
# by mkconf $Id$
ARRAY /dev/md/0 metadata=1.2 UUID=9c5baecf:58212783:fe438251:3b70e113 name=myserver:0 spares=1

I ran this command:

Code:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf

Before this I had no spare in the array.
I think sdi1 is marked as a spare because its rebuild had not finished.
Could this be the case?

**darkod** · July 21st, 2018

I am not sure what happened during the rebuild. The counter for sdi1 looked OK, but I am not 100% sure how it should look if the rebuild didn't complete.

You can try assembling the array using sde1 instead of sdi1 in that same command. The counter for sde1 is very close, and it should start... Don't forget to stop the array first.

That is what I would try next: run those same commands, only with sde1 instead of sdi1 in the member list.

**vwlinux** · July 22nd, 2018

Here is the output of the command with sde instead of sdi:

Code:

# mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcdefghjklm]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdj1 is identified as a member of /dev/md0, slot 11.
mdadm: /dev/sdk1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdl1 is identified as a member of /dev/md0, slot 10.
mdadm: /dev/sdm1 is identified as a member of /dev/md0, slot 6.
mdadm: forcing event count in /dev/sde1(7) from 870672 upto 870676
mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sde1
mdadm: Marking array /dev/md0 as 'clean'
mdadm: added /dev/sdh1 to /dev/md0 as 1
mdadm: added /dev/sdb1 to /dev/md0 as 2
mdadm: added /dev/sdf1 to /dev/md0 as 3
mdadm: added /dev/sdd1 to /dev/md0 as 4
mdadm: added /dev/sdk1 to /dev/md0 as 5
mdadm: added /dev/sdm1 to /dev/md0 as 6
mdadm: added /dev/sde1 to /dev/md0 as 7
mdadm: added /dev/sdg1 to /dev/md0 as 8
mdadm: no uptodate device for slot 9 of /dev/md0
mdadm: added /dev/sdl1 to /dev/md0 as 10
mdadm: added /dev/sdj1 to /dev/md0 as 11
mdadm: added /dev/sdc1 to /dev/md0 as 0
mdadm: /dev/md0 assembled from 11 drives - not enough to start the array.

Here is the output of cat /proc/mdstat

Code:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdc1[0](S) sdj1[13](S) sdl1[12](S) sdg1[9](S) sde1[8](S) sdm1[7](S) sdk1[6](S) sdd1[11](S) sdf1[3](S) sdb1[2](S) sdh1[1](S)
      32232905142 blocks super 1.2

unused devices: <none>

**darkod** · July 23rd, 2018

Strange, it is refusing to assemble with 11 drives according to that last message. You could try forcing it to run by adding the parameter --run to the command. That would be the last attempt with --assemble. If that doesn't work, there is only one more thing left to do...
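A sketch of what that command line would look like, assuming the same member list as the previous attempt. The hypothetical `print_assemble_cmd` helper below only prints the command so it can be reviewed before being typed in by hand; it does not touch the array.

```shell
# Print (do NOT execute) a forced assemble command with --run added.
# $1 is the array device, the remaining arguments are the member partitions.
print_assemble_cmd() {
    array=$1
    shift
    printf 'mdadm --assemble --verbose --force --run %s' "$array"
    for member in "$@"; do
        printf ' %s' "$member"
    done
    printf '\n'
}

# Usage: print_assemble_cmd /dev/md0 /dev/sd[bcdefghjklm]1
```

As before, the array has to be stopped with mdadm --stop /dev/md0 before the assemble is retried.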

**vwlinux** · July 23rd, 2018

I ran the command with sde, but nothing happened.

Then I rebooted the server, and this is the output of cat /proc/mdstat:

Code:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdh1[1] sdf1[3] sdb1[2] sdj1[13] sdm1[7] sdk1[6] sdi1[14] sdg1[9] sdl1[12] sde1[8] sdc1[0] sdd1[11]
      32232904704 blocks super 1.2 level 5, 512k chunk, algorithm 2 [12/11] [UUUUUUUUU_UU]
      [>....................]  recovery =  4.3% (128174280/2930264064) finish=338.1min speed=138120K/sec

unused devices: <none>
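
As a sanity check, the finish estimate in the recovery line can be reproduced from the other numbers on it: the blocks still to rebuild divided by the current speed. A small sketch, assuming the two values in parentheses are 1K blocks and the speed is in K/sec; `eta_minutes` is just an illustrative helper name.

```shell
#!/bin/sh
# eta_minutes DONE TOTAL SPEED
# DONE and TOTAL are the two numbers inside the parentheses (1K blocks),
# SPEED is the K/sec figure. Prints the estimated minutes remaining.
eta_minutes() {
    awk -v d="$1" -v t="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}

# Values copied from the recovery line above.
eta_minutes 128174280 2930264064 138120   # prints 338.1
```

That matches the finish=338.1min shown by the kernel, so the rebuild needs the array to stay up for more than five and a half hours at that speed.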

This was the output of mdadm -D /dev/md0

Code:

# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
  Used Dev Size : 2930264064 (2794.52 GiB 3000.59 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Mon Jul 23 23:57:53 2018
          State : clean, degraded, recovering
 Active Devices : 11
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 4% complete

           Name : ingfil:0  (local to host ingfil)
           UUID : 9c5baecf:58212783:fe438251:3b70e113
         Events : 870682

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8      113        1      active sync   /dev/sdh1
       2       8       17        2      active sync   /dev/sdb1
       3       8       81        3      active sync   /dev/sdf1
      11       8       49        4      active sync   /dev/sdd1
       6       8      161        5      active sync   /dev/sdk1
       7       8      193        6      active sync   /dev/sdm1
       8       8       65        7      active sync   /dev/sde1
       9       8       97        8      active sync   /dev/sdg1
      14       8      129        9      spare rebuilding   /dev/sdi1
      12       8      177       10      active sync   /dev/sdl1
      13       8      145       11      active sync   /dev/sdj1

This is the output from the server now:

Code:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sde1[8](F) sdd1[11] sdb1[2] sdc1[0] sdg1[9] sdf1[3] sdk1[6] sdl1[12] sdi1[14](S) sdh1[1] sdj1[13] sdm1[7]
      32232904704 blocks super 1.2 level 5, 512k chunk, algorithm 2 [12/10] [UUUUUUU_U_UU]

unused devices: <none>
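
The [UUUUUUU_U_UU] string can be read slot by slot: each U is a working member and each _ is a missing one, counted from slot 0. A minimal sketch that turns the string into slot numbers; `missing_slots` is a hypothetical helper, not part of mdadm.

```shell
#!/bin/sh
# missing_slots STATUS
# STATUS is the bracketed string from /proc/mdstat, e.g. [UU_U].
# Prints the zero-based slot numbers marked "_", space separated.
missing_slots() {
    awk -v s="$1" 'BEGIN {
        slot = 0; out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c != "U" && c != "_") continue   # skip the brackets
            if (c == "_") out = out (out == "" ? "" : " ") slot
            slot++
        }
        print out
    }'
}

missing_slots '[UUUUUUU_U_UU]'   # prints 7 9
```

Slots 7 and 9 are the same two RaidDevice numbers that mdadm -D reports as removed, which is one more missing member than RAID5 can tolerate.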

Code:

# mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Thu Jan 24 20:36:48 2013
     Raid Level : raid5
     Array Size : 32232904704 (30739.69 GiB 33006.49 GB)
  Used Dev Size : 2930264064 (2794.52 GiB 3000.59 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Tue Jul 24 07:09:10 2018
          State : clean, FAILED
 Active Devices : 10
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ingfil:0  (local to host ingfil)
           UUID : 9c5baecf:58212783:fe438251:3b70e113
         Events : 870772

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8      113        1      active sync   /dev/sdh1
       2       8       17        2      active sync   /dev/sdb1
       3       8       81        3      active sync   /dev/sdf1
      11       8       49        4      active sync   /dev/sdd1
       6       8      161        5      active sync   /dev/sdk1
       7       8      193        6      active sync   /dev/sdm1
       7       0        0        7      removed
       9       8       97        8      active sync   /dev/sdg1
       9       0        0        9      removed
      12       8      177       10      active sync   /dev/sdl1
      13       8      145       11      active sync   /dev/sdj1

       8       8       65        -      faulty spare   /dev/sde1
      14       8      129        -      spare   /dev/sdi1

Is it possible to replace the sde disk with the disk I just removed?
It might have fewer errors.
