Thread: Is my RAID array toast? Need troubleshooting help.

  1. #1
    Join Date
    Apr 2013
    Beans
    3

    Is my RAID array toast? Need troubleshooting help.

    Hey there,

    Last night I received an e-mail from mdadm about the possible failure of two drives in my array. The array was set up as a RAID 5 of four 2 TB drives with one hot spare. Is this system truly fried? Did the hot spare pick up anything at all, or did the two drives fail at once? Or did one drive fail and the rebuild onto the spare then trigger a second drive failure? I'm fairly new to working with RAID, and I inherited this system from a previous employee, so I'm unsure what the proper troubleshooting steps are. Any help would be much appreciated.

    Output of cat /proc/mdstat:
    Code:
    sudo cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
    md0 : active raid5 sdc[4](F) sdd[5](F) sda[6](S) sdb[0] sde[3]
          5860543488 blocks level 5, 64k chunk, algorithm 2 [4/2] [U__U]
    
    Output of mdadm --detail:
    Code:
    sudo mdadm --detail /dev/md0
    
    /dev/md0:
            Version : 0.90
      Creation Time : Mon Jun 21 13:54:13 2010
         Raid Level : raid5
         Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
      Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
       Raid Devices : 4
      Total Devices : 5
    Preferred Minor : 0
        Persistence : Superblock is persistent
    
        Update Time : Mon Apr 29 10:52:27 2013
              State : clean, FAILED
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
    
             Layout : left-symmetric
         Chunk Size : 64K
    
               UUID : 2874db80:a0f02d66:999df3c7:ff8f8e6e (local to host bigkahuna)
             Events : 0.10984
    
        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       0        0        1      removed
           2       0        0        2      removed
           3       8       64        3      active sync   /dev/sde
    
           4       8       32        -      faulty spare   /dev/sdc
           5       8       48        -      faulty spare   /dev/sdd
           6       8        0        -      spare   /dev/sda
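
    A note on reading the mdstat status shown earlier in this post: in [4/2] [U__U] the first number is how many member devices the array expects and the second is how many are currently active. Each U marks a slot that is up and each _ a slot that is missing. RAID 5 tolerates only one missing member, which is why an array at 2 of 4 will not run. Below is a minimal sketch that pulls those fields out of /proc/mdstat-style text. The function name mdstat_status is made up for illustration, and the awk patterns assume the status format shown in this thread.

```shell
# Read /proc/mdstat-style text on stdin and print one summary line per array.
# mdstat_status is a hypothetical helper name, not part of mdadm.
mdstat_status() {
    awk '
        # Remember the array name from lines such as "md0 : active raid5 ..."
        $1 ~ /^md/ && $2 == ":" { name = $1 }
        # The status looks like "[4/2] [U__U]" on the line that follows.
        match($0, /\[[0-9]+\/[0-9]+\] \[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            gsub(/\[|\]|\//, " ", status)   # turn brackets and slash into spaces
            split(status, p, " ")           # p[1]=expected, p[2]=active, p[3]=map
            print name " total=" p[1] " active=" p[2] " map=" p[3]
        }
    '
}

# Typical use on a live system:
#   mdstat_status < /proc/mdstat
```

    For the output posted here this would report 4 expected members with 2 active, so two slots are unaccounted for.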
    
    

  2. #2
    Join Date
    Nov 2009
    Location
    Segur De Calafell, Spain
    Beans
    11,910
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Is my RAID array toast? Need troubleshooting help.

    First, for future reference, it's better to use partitions as mdadm members rather than whole disks, as was done here. If one disk was a hot spare, the array should still work. Do you know which disk was the spare? It would help if you could tell us. PS: I just reread your post; is /dev/sde the hot spare?

    Also, post the --examine output:
    Code:
    sudo mdadm --examine /dev/sd[abcde]
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 14.04 LTS 64bit & Windows 7 Ultimate 64bit

  3. #3
    Join Date
    Apr 2013
    Beans
    3

    Re: Is my RAID array toast? Need troubleshooting help.

    Thanks, it seems like I'll need to read some guides to get a better understanding of working with RAIDs. If you have any advice in that regard, I'd appreciate it. What is the advantage of partition members? I believe /dev/sde was the spare that was installed.

    Output of sudo mdadm --examine /dev/sd[abcde]:

    Code:
    sudo mdadm --examine /dev/sd[abcde]
    
    /dev/sda:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 2874db80:a0f02d66:999df3c7:ff8f8e6e (local to host bigkahuna)
    
      Creation Time : Mon Jun 21 13:54:13 2010
         Raid Level : raid5
    
      Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
         Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
       Raid Devices : 4
      Total Devices : 5
    Preferred Minor : 0
    
    
        Update Time : Mon Apr 29 11:33:25 2013
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
    
           Checksum : 1dcdcbb6 - correct
             Events : 10988
    
             Layout : left-symmetric
         Chunk Size : 64K
    
    
          Number   Major   Minor   RaidDevice State
    
    this     6       8        0        6      spare   /dev/sda
    
       0     0       8       16        0      active sync   /dev/sdb
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       64        3      active sync   /dev/sde
       4     4       8       32        4      faulty   /dev/sdc
    /dev/sdb:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 2874db80:a0f02d66:999df3c7:ff8f8e6e (local to host bigkahuna)
    
      Creation Time : Mon Jun 21 13:54:13 2010
         Raid Level : raid5
    
      Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
         Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
       Raid Devices : 4
      Total Devices : 5
    Preferred Minor : 0
    
    
        Update Time : Mon Apr 29 11:33:25 2013
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
    
           Checksum : 1dcdcbc0 - correct
             Events : 10988
    
             Layout : left-symmetric
         Chunk Size : 64K
    
    
          Number   Major   Minor   RaidDevice State
    
    this     0       8       16        0      active sync   /dev/sdb
    
       0     0       8       16        0      active sync   /dev/sdb
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       64        3      active sync   /dev/sde
       4     4       8       32        4      faulty   /dev/sdc
    mdadm: No md superblock detected on /dev/sdc.
    mdadm: No md superblock detected on /dev/sdd.
    /dev/sde:
              Magic : a92b4efc
            Version : 0.90.00
               UUID : 2874db80:a0f02d66:999df3c7:ff8f8e6e (local to host bigkahuna)
    
      Creation Time : Mon Jun 21 13:54:13 2010
         Raid Level : raid5
    
      Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
         Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
       Raid Devices : 4
      Total Devices : 5
    Preferred Minor : 0
    
    
        Update Time : Mon Apr 29 11:33:25 2013
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 2
      Spare Devices : 1
    
           Checksum : 1dcdcbf6 - correct
             Events : 10988
    
             Layout : left-symmetric
         Chunk Size : 64K
    
    
          Number   Major   Minor   RaidDevice State
    
    this     3       8       64        3      active sync   /dev/sde
    
       0     0       8       16        0      active sync   /dev/sdb
       1     1       0        0        1      faulty removed
       2     2       0        0        2      faulty removed
       3     3       8       64        3      active sync   /dev/sde
       4     4       8       32        4      faulty   /dev/sdc
    Running smartctl returned the following error for sdc and sdd:

    Code:
    sudo smartctl -a -T permissive /dev/sdc
    smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.0.0-14-generic] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
    
    Vendor:               /1:0:0:0
    Product:              
    User Capacity:        600,332,565,813,390,450 bytes [600 PB]
    Logical block size:   774843950 bytes
    >> Terminate command early due to bad response to IEC mode page
    
    Error Counter logging not supported
    Device does not support Self Test logging

  4. #4
    Join Date
    Nov 2009
    Location
    Segur De Calafell, Spain
    Beans
    11,910
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Is my RAID array toast? Need troubleshooting help.

    Well, sdc and sdd have no superblock on them, and I don't know how it was lost. Even so, the other three disks have equal event counters, which means you should be able to assemble the array in degraded mode. It was a 4-disk RAID 5 array, so it should work with 3 disks.

    Try something like:
    Code:
    sudo mdadm --stop /dev/md0    # stop the array first
    sudo mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sde
    Tell us whether it says something like device started with 3 disks, or it fails.
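
    To compare event counters across members without reading every --examine block by eye, a rough sketch like the following may help. Members whose counters match were last updated together; a member that is behind dropped out earlier. The name events_by_device is a hypothetical helper, and the awk patterns assume the "/dev/sdX:" header and "Events : N" lines look like the --examine output posted in this thread.

```shell
# Read `mdadm --examine` output on stdin and print "device events" pairs.
# events_by_device is a hypothetical helper name, not an mdadm feature.
events_by_device() {
    awk '
        # A header such as "/dev/sda:" starts a new per-device section.
        NF == 1 && $1 ~ /^\/dev\/.*:$/ { dev = substr($1, 1, length($1) - 1) }
        # "Events : 10988" carries the counter for the current section.
        $1 == "Events" && $2 == ":"    { print dev, $3 }
    '
}

# Typical use (needs root; device names are the ones from this thread):
#   sudo mdadm --examine /dev/sd[abcde] 2>/dev/null | events_by_device
```

    Devices that report no superblock produce no line at all, so a missing device in the list is itself worth noting.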
    Darko.

  5. #5
    Join Date
    Apr 2013
    Beans
    3

    Re: Is my RAID array toast? Need troubleshooting help.

    I tried doing so, and got the message:

    Code:
    /dev/md0 assembled from 2 drives and one spare - not enough to start the array.
    Which I assume means that the process did not work?

    I tried the same after rebooting the system, to the same effect. Oddly, though, sdc and sdd both had superblocks again when I checked them. Following advice from a poster on another forum where I had also asked, I ran:

    Code:
    sudo mdadm --assemble /dev/md0 --scan --force
    Which then resulted in this:

    Code:
    mdadm: forcing event count in /dev/sdc(1) from 10977 upto 10988
    mdadm: forcing event count in /dev/sdd(2) from 10977 upto 10988
    mdadm: clearing FAULTY flag for device 3 in /dev/md0 for /dev/sdc
    mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/sdd
    mdadm: /dev/md0 has been started with 4 drives and 1 spare.
    
     sudo cat /proc/mdstat
    
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : active raid5 sdb[0] sda[6](S) sde[3] sdd[2] sdc[1]
          5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
    
    unused devices: <none>
    I then mounted the array and have had no trouble accessing files so far. What happened here? Is the array truly reassembled, or is it a ticking time bomb?

    Checking SMART on all 5 drives shows Pre-fail on attributes #1, #3 and #5 (read error rate, spin-up time, reallocated sectors count) and Old-age on the rest of the attributes. I'm assuming this means it's time to back up and replace these drives sooner rather than later? How many bad sectors are too many? Are these drives just on their last legs? Knowing the next step before a disaster strikes would be great.
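
    A note on the SMART readout: in smartctl's attribute table, Pre-fail and Old_age in the TYPE column only classify the attribute; they are not a verdict on the drive. The columns to watch are WHEN_FAILED and the raw values of attributes such as 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). The sketch below pulls those rows out. The name smart_key_attrs is a hypothetical helper, and it assumes the usual ten-column layout printed by smartctl -A.

```shell
# Read `smartctl -A` output on stdin and print the sector-health attributes.
# smart_key_attrs is a hypothetical helper name, not part of smartmontools.
# Assumed columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
smart_key_attrs() {
    awk '
        NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) {
            print $1, $2, "raw=" $10, "when_failed=" $9
        }
    '
}

# Typical use (needs root; repeat for each member disk):
#   sudo smartctl -A /dev/sda | smart_key_attrs
```

    A dash in when_failed means the attribute has never crossed its threshold; a raw value that keeps climbing between checks is the more telling sign.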

  6. #6
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,123
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: Is my RAID array toast? Need troubleshooting help.

    Could we see the full SMART output for each disk?
    Code:
    sudo smartctl -a /dev/sda
    sudo smartctl -a /dev/sdb
    sudo smartctl -a /dev/sdc
    sudo smartctl -a /dev/sdd
    sudo smartctl -a /dev/sde
    Also, how are these disks connected to the computer (via SATA to the motherboard or through a PCIe card)?

    It would be nice to see the dmesg output grepped for md0.
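
    For the dmesg part, a small filter along these lines should do. This is only a sketch: md_log_lines is a made-up helper name, and md0 is assumed to be the array name, as in the rest of this thread.

```shell
# Keep only kernel log lines on stdin that mention the given array.
# md_log_lines is a hypothetical helper; the array name defaults to md0.
md_log_lines() {
    grep -i -- "${1:-md0}"
}

# Typical use (dmesg may need root on some systems):
#   sudo dmesg | md_log_lines md0
```

    Passing a disk name such as sdc instead of md0 shows the lines for that member, which can help tell a drive fault from a controller or cable problem.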
    Last edited by rubylaser; April 30th, 2013 at 12:19 AM.
