Results 1 to 10 of 40

Thread: File Server mdadm raid5 has crashed

  1. #1
    Join Date
    Feb 2008
    Beans
    101

    File Server mdadm raid5 has crashed

    Six months ago I built a file server with a RAID array and put all my files on it. Today it stopped working. I got home from work, set my HTPC (in one room) to start playing a media file, which it started streaming immediately. Then I walked to the back room where the server is and could hear all kinds of beeping noises coming from the server tower. I'm not sure if this beeping is truly a "beep" or just mechanical vibration of inner parts, but the tone of the beep is very clear, almost like a synthesized beep with no overtones. Also, the start and end of the "beep" are very distinct, like a 'short' beep of a mobo POST code. I have heard this "beep" before when the drives are physically jarred. All but one of the drives are Seagate Barracuda 2TB 7200RPM, P/N ST2000DM001-9YN164.

    It sounded like several drives were all beeping simultaneously out of phase, and some of the "beeps" were higher and lower pitched. Again, I do not know if these are truly beeps or mechanical noise, but it is FAR from the ordinary operating noise of the drives and I knew it meant trouble. The first thing I did was drop to a shell and reboot. After rebooting, the "beeping" started again. Typically on bootup, my BIOS detects all the drives and lists them before continuing, and this took MUCH longer than normal. Then Ubuntu started loading (the OS is not on the array) and said "/media/vault is not available to be mounted. press s to skip", so I skipped it. I got into Ubuntu and the array is not active. I opened Disk Utility, and one of the drives seems to be going on and offline, which is not good. The other drives have all passed SMART diagnostics, with some errors logged but still a "Good" status.

    Obligatory emotions:
    I am such an idiot. My hubris to think I could handle maintenance of this thing just from a few tutorials. I will have lost all my photos for the last 15 years, all of my scanned artwork, all of my writings including 400 pages of notes toward a book, 30 hours of home video from my childhood, all of my programming projects, basically my entire life. I know it's hard to have any sympathy for an idiot that put every meaningful artifact of his entire life on a raid array with no backup, but fwiw I was saving the money to do the backup. But for many TB of data it's not cheap. I'm feeling panic and suffocation just talking about this but trying to keep my emotions in check while I do what I can. The fact that it was streaming media just fine prior to rebooting (weird sounds aside) gives me just the slightest bit of hope, before hitting the yellow pages looking for a grief counselor.
    All parts are 6 months from the store, and the server is powered through a UPS. I am running Ubuntu 12.04. The RAID is on 5 or 6 2TB SATA drives plugged into an ASRock Z77 motherboard, using RAID5 configured with mdadm.

    I am looking for guidance through the commands to check everything there is to check to understand the nature of the problem(s) and the probability of recovering any data. If I have to I will look into professional data recovery. But I don't know if I should do that now or try some things first.

  2. #2
    Join Date
    Nov 2009
    Location
    Segur De Calafell, Spain
    Beans
    11,786
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: File Server mdadm raid5 has crashed

    The first thing to check would be the md device state. If you have it powered down, boot it again skipping the mounting of the array (since you have to do that to continue the boot), and post the output of:
    cat /proc/mdstat

    That should show all md devices, how many disks/partitions are members, and how many are active.
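
    As a rough sketch of what to look for in that output, the snippet below lists any array that mdstat reports as inactive. It assumes a POSIX shell with awk; the function name inactive_arrays and its optional file argument are made up for illustration and are not part of mdadm.

```shell
# inactive_arrays is a hypothetical helper, not an mdadm command.
# It prints the name of every md array that mdstat reports as inactive.
# The optional argument lets you point it at a saved copy of the file
# instead of the live /proc/mdstat.
inactive_arrays() {
    # Array lines look like: "md0 : inactive sda[5](S) sdb[0](S) ..."
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }' \
        "${1:-/proc/mdstat}"
}
```

    Calling inactive_arrays with no argument reads the live /proc/mdstat. It only reads the file and does not touch the array.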

    EDIT PS: On another note, for storage of this importance, I would seriously consider FreeNAS with RAIDZ2, or at least change the array to raid6. You will have less usable space, but it can handle two disk failures and still work. But we can discuss that later.
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 14.04 LTS 64bit & Windows 7 Ultimate 64bit

  3. #3
    Join Date
    Jan 2010
    Location
    Germany
    Beans
    165
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: File Server mdadm raid5 has crashed

    I hear your pain man. I had a similar situation once upon a time and from that point I stayed away from software RAID.

    You have a lot of storage space, which is a damn shame really.

    First thing I would do is this.

    Run the mdadm --examine command on the drives to find which ones have failed.

    I have found a few good articles on how to recover data:

    http://askubuntu.com/questions/64156...re-raid5-disks

    http://ubuntuforums.org/showthread.php?t=958369

    In fact after a quick Google search there are tonnes of articles regarding your issue. I am sure one of them will assist you with your situation.
    Try not to be a man of success but be a man of value

  4. #4
    Join Date
    Nov 2009
    Location
    Segur De Calafell, Spain
    Beans
    11,786
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: File Server mdadm raid5 has crashed

    I would wait before assigning blame to mdadm. As the OP stated, the beeps are not heard until you actually go up to the server cabinet. In theory, one disk could have failed a while ago and a raid5 array would keep on going. If you don't have any mechanism to do regular checks, you wouldn't even know about it.

    You will only find out when a second disk fails.
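
    One such regular check can be sketched as below: it flags any array whose member map in mdstat contains an underscore, which is how the kernel marks a missing member (for example "[5/4] [U_UUU]"). This is a minimal sketch assuming a POSIX shell with awk; degraded_arrays is a hypothetical name, and how you schedule it (cron or otherwise) is up to you.

```shell
# degraded_arrays is a hypothetical helper, not an mdadm command.
# It prints "<array> <member map>" for every md array whose status line
# in mdstat shows at least one missing member, e.g. "md0 [U_UUU]".
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid5 ..."
        $1 ~ /^md/ && $2 == ":" { name = $1 }
        # The status line ends in a map such as [UUUUU] or [U_UUU]
        name != "" && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name, $NF }
    ' "${1:-/proc/mdstat}"
}
```

    mdadm also has its own --monitor mode that can report failure events, which is probably the better long-term answer than a hand-rolled script.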

    It could be a HW issue or an mdadm issue; it is too early to say.

    If it's a HW issue, whatever configuration you have (fakeraid, HW card, SW raid), a raid5 array will fail with two devices failed. Furthermore, fakeraid, which is most commonly used at home, will probably be unrecoverable, while mdadm might still be recoverable. It's definitely more flexible and better to use compared to fakeraid.

    If you invest in server grade HW cards at home, that's another story.

    With all the urgency, the OP doesn't seem to be checking the thread too often.
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 14.04 LTS 64bit & Windows 7 Ultimate 64bit

  5. #5
    Join Date
    Feb 2008
    Beans
    101

    Re: File Server mdadm raid5 has crashed

    I took a day away from thinking about it to calm down and be rational. The "beeping" somehow seems to have stopped. Now when I boot the machine it drops me to an initramfs prompt to troubleshoot the array, which I exit out of. Then I skip the missing array at Ubuntu startup.


    Code:
    barrett@mainframe:~$ cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sda[5](S) sdb[0](S) sdd[4](S) sdc[2](S)
          7813534048 blocks super 1.2
           
    unused devices: <none>
    barrett@mainframe:~$ sudo mdadm --examine
    mdadm: No devices to examine
    barrett@mainframe:~$ sudo mdadm --examine /dev/md0
    [sudo] password for barrett: 
    barrett@mainframe:~$
    As you can see, 4 drives are up for the array. My end goal was to grow this to 8 drives in RAID6, where one was a hot spare. I am embarrassed to say I cannot remember exactly where I was in that process, whether I had 5 or 6 drives on it. I think it was 4 + raid5 parity + hot spare. Is there any way to even verify this?

    mdadm --examine didn't seem to turn much up (maybe because the array is inactive?)

    also if this is helpful
    Code:
    barrett@mainframe:~$ blkid
    /dev/sda: UUID="8bc78af0-d9a9-81e3-7354-9f212f76cd24" UUID_SUB="f18da9cc-27f5-eee4-61ba-900edd6ca8b9" LABEL="mainframe:vault" TYPE="linux_raid_member" 
    /dev/sdb: UUID="8bc78af0-d9a9-81e3-7354-9f212f76cd24" UUID_SUB="004a89c7-bd03-e0fe-b6ea-3ab976e5e5e0" LABEL="mainframe:vault" TYPE="linux_raid_member" 
    /dev/sdc: UUID="8bc78af0-d9a9-81e3-7354-9f212f76cd24" UUID_SUB="a6ad29b7-35b5-46ae-4bc2-a5afe6bc8252" LABEL="mainframe:vault" TYPE="linux_raid_member" 
    /dev/sdd: UUID="8bc78af0-d9a9-81e3-7354-9f212f76cd24" UUID_SUB="1df1fd17-592f-431a-f3f0-5592fbfccdcd" LABEL="mainframe:vault" TYPE="linux_raid_member" 
    /dev/sde1: UUID="0acc3250-919f-44c1-a076-ed66ff5d9684" TYPE="ext4" 
    /dev/sdf1: LABEL="Library2" UUID="72fd10b8-8a5b-4f11-be62-da7f397f44b7" TYPE="ext4" 
    /dev/sdg1: LABEL="Library1" UUID="17b74dce-e201-480e-9ad0-761123175c41" TYPE="ext4" 
    /dev/sdh1: UUID="d56d31ff-3c42-4b8e-a8c1-bd1cea757963" TYPE="ext4" 
    /dev/sdh2: UUID="d56d31ff-3c42-4b8e-a8c1-bd1cea757963" TYPE="ext4" 
    /dev/sdh3: UUID="e80ece6a-6703-4d2c-82a3-706797264f49" TYPE="swap"

    cat /etc/mdadm/mdadm.conf
    ...
    ARRAY /dev/md0 metadata=1.2 name=mainframe:vault UUID=8bc78af0:d9a981e3:73549f21:2f76cd24
    ...

  6. #6
    Join Date
    Apr 2012
    Beans
    5,900

    Re: File Server mdadm raid5 has crashed

    Quote Originally Posted by apokkalyps View Post
    Code:
    barrett@mainframe:~$ cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sda[5](S) sdb[0](S) sdd[4](S) sdc[2](S)
          7813534048 blocks super 1.2
           
    unused devices: <none>
    barrett@mainframe:~$ sudo mdadm --examine
    mdadm: No devices to examine
    barrett@mainframe:~$ sudo mdadm --examine /dev/md0
    [sudo] password for barrett: 
    barrett@mainframe:~$
    .
    .
    .
    mdadm --examine didn't seem to turn much up (maybe because the array is inactive?)
    You need to run it on the array members, not on the assembled /dev/mdX array, I think, e.g.

    Code:
    sudo mdadm --examine /dev/sd[a-d]
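
    The full --examine output is long, so here is a small sketch that boils it down to one line per member. It assumes a POSIX shell with awk and relies only on the "Events" and "Device Role" field labels that mdadm prints for 1.2 metadata; summarize_examine is a hypothetical name, not an mdadm feature.

```shell
# summarize_examine is a hypothetical helper, not an mdadm command.
# It reads "mdadm --examine" output on stdin and prints, for each member,
# the device name, its slot in the array, and its event counter.
summarize_examine() {
    awk '
        # Section header, e.g. "/dev/sda:"
        $1 ~ /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # e.g. "Events : 7170"
        $1 == "Events" && $2 == ":" { events = $3 }
        # e.g. "Device Role : Active device 4" (last field is the slot)
        $1 == "Device" && $2 == "Role" { print dev, "slot=" $NF, "events=" events }
    '
}
```

    Usage would be something like: sudo mdadm --examine /dev/sd[a-d] | summarize_examine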

  7. #7
    Join Date
    Feb 2008
    Beans
    101

    Re: File Server mdadm raid5 has crashed

    Code:
    barrett@mainframe:~$ sudo mdadm --examine /dev/sd[a-e]
    /dev/sda:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
               Name : mainframe:vault  (local to host mainframe)
      Creation Time : Wed Aug 15 21:57:14 2012
         Raid Level : raid5
       Raid Devices : 5
    
     Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
         Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
      Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : f18da9cc:27f5eee4:61ba900e:dd6ca8b9
    
        Update Time : Tue Jan 29 04:34:28 2013
           Checksum : 9139486d - correct
             Events : 7170
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : Active device 4
       Array State : A.AAA ('A' == active, '.' == missing)
    /dev/sdb:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
               Name : mainframe:vault  (local to host mainframe)
      Creation Time : Wed Aug 15 21:57:14 2012
         Raid Level : raid5
       Raid Devices : 5
    
     Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
         Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
      Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 004a89c7:bd03e0fe:b6ea3ab9:76e5e5e0
    
        Update Time : Tue Jan 29 04:34:57 2013
           Checksum : 77edbc17 - correct
             Events : 7176
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : Active device 0
       Array State : A.... ('A' == active, '.' == missing)
    /dev/sdc:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
               Name : mainframe:vault  (local to host mainframe)
      Creation Time : Wed Aug 15 21:57:14 2012
         Raid Level : raid5
       Raid Devices : 5
    
     Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
         Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
      Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : a6ad29b7:35b546ae:4bc2a5af:e6bc8252
    
        Update Time : Mon Jan 28 20:04:27 2013
           Checksum : 7f030895 - correct
             Events : 7169
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : Active device 2
       Array State : AAAAA ('A' == active, '.' == missing)
    /dev/sdd:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
               Name : mainframe:vault  (local to host mainframe)
      Creation Time : Wed Aug 15 21:57:14 2012
         Raid Level : raid5
       Raid Devices : 5
    
     Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
         Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
      Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
        Data Offset : 262144 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 1df1fd17:592f431a:f3f05592:fbfccdcd
    
        Update Time : Tue Jan 29 04:34:42 2013
           Checksum : a9ccac88 - correct
             Events : 7171
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : Active device 3
       Array State : A.AAA ('A' == active, '.' == missing)
    /dev/sde:
       MBR Magic : aa55
    Partition[0] :   3907029167 sectors at            1 (type ee)


    Also, I know there is a command that would enumerate these, but I can't seem to find it, so I took some screenshots of Disk Utility.
    See http://pastehtml.com/view/cr3p7ugrg.html to view them all, or the individual images:
    /dev/sda : http://i.snag.gy/tXkj2.jpg
    /dev/sdb : http://i.snag.gy/ZluyV.jpg
    /dev/sdc : http://i.snag.gy/nFFh6.jpg
    /dev/sdd : http://i.snag.gy/8qRms.jpg
    /dev/sde : http://i.snag.gy/MdvXc.jpg
    Array : http://i.snag.gy/pUG3w.jpg

    (If you would rather I post the output of a command, I'll do that.)

    Notice that the array says it has 5 components. There are actually 6 drives plugged via SATA into my motherboard (not counting the SSD). One drive is not showing up at all: you can see that between sdb and sdc it goes from port 2 to port 4 on the SATA controller, so the drive on port 3 is apparently not registering. So if that drive is dead, I still have 5 of the original 6 drives showing green SMART statuses. Except /dev/sde is weird. As you can see in the screenshot, Disk Utility says that, unlike all the other ones, sde has an ext4 partition on it, labeled "Linux RAID". I don't know how it got like that, but it doesn't seem to be part of the array. (Does it?) So I have 4 out of 5 of the components running; should it not be assembling?

    Also, my /proc/mdstat shows all the drives in the array with (S) next to them. Doesn't that mean it's a hot spare? Does it think all the drives are spares?
    Last edited by CharlesA; February 2nd, 2013 at 03:55 AM. Reason: code tags

  8. #8
    Join Date
    Nov 2009
    Location
    Segur De Calafell, Spain
    Beans
    11,786
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: File Server mdadm raid5 has crashed

    Yeah, somehow it has marked all as spares. A raid5 array should still assemble with 4 disks out of 5.

    As for the partition sde1, I guess you used the whole disks as devices when you first created the array with sda-sdd, but when adding sde you actually created a partition on it. mdadm can work either way, but I have read it's better to be consistent and not mix them up. You either work with whole disks, or partitions on them.
    I prefer creating partitions even if you use the whole disk as a single partition, but it's not necessary.

    The event counters are different on all 4 disks shown by examine. And sde doesn't show at all. Since it has a partition, what if you try like:
    sudo mdadm --examine /dev/sde1

    You can start by trying to auto assemble it, but I doubt it will work:
    sudo mdadm --assemble --scan

    I think the different counters will not allow it. The good news is that the counter values are very close, and there is a way to force it to assemble in cases like that without damaging the data. I just have to find the correct syntax.
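
    To put a number on "very close", the sketch below pulls the event counters out of the --examine output and prints the lowest, the highest, and the gap between them. It assumes a POSIX shell with awk; event_spread is a hypothetical name, and the sketch deliberately does not decide what gap is "close enough", since that remains a judgment call.

```shell
# event_spread is a hypothetical helper, not an mdadm command.
# It reads "mdadm --examine" output on stdin and reports how far apart
# the event counters of the examined members are.
event_spread() {
    awk '
        # e.g. "Events : 7170"
        $1 == "Events" && $2 == ":" {
            n = $3 + 0
            if (count == 0 || n < min) min = n
            if (count == 0 || n > max) max = n
            count++
        }
        END {
            if (count > 0)
                print "members=" count, "min=" min, "max=" max, "spread=" (max - min)
        }
    '
}
```

    Usage would be something like: sudo mdadm --examine /dev/sd[a-d] | event_spread
    For the four counters posted above (7170, 7176, 7169, 7171) it reports a spread of 7.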

    I will also PM someone who is much more of an expert in mdadm; let's see if he can join in.
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 14.04 LTS 64bit & Windows 7 Ultimate 64bit

  9. #9
    Join Date
    Nov 2009
    Location
    Segur De Calafell, Spain
    Beans
    11,786
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: File Server mdadm raid5 has crashed

    I found a similar case where all disks were marked as Spare. Look here in the first post in the first code section:
    http://ubuntuforums.org/showthread.php?t=1838932

    Later in the post you can see he did manage to assemble it by using the --force option together with the --scan option, letting it find its own disk members.

    So, in your case it would also be something like:
    sudo mdadm --assemble --scan --force /dev/md0

    I think that will assemble it only with sda-sdd which are currently shown as members by /proc/mdstat.

    With 4 disks it should still become active. If it does, you can consider whether you want to add to the array sde1 as partition, or delete the partition first and add it as a disk sde like the others are.

    You can wait with the above procedure if you want, for someone else to give their opinion. In the thread linked above, and another thread I found, it seems to have worked with --force. As I already said, your counters are very close to each other which is always good.
    Darko.
    -----------------------------------------------------------------------
    Ubuntu 14.04 LTS 64bit & Windows 7 Ultimate 64bit

  10. #10
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,123
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: File Server mdadm raid5 has crashed

    The first thing I'd do is stop any running arrays.
    Code:
    sudo -i
    mdadm --stop /dev/md0
    The next step is to force assemble the array. If that doesn't work, we can re-create the array's metadata, but I like to leave that step for last. Here's your next step.

    Code:
    mdadm --assemble --force /dev/md0 /dev/sd[abcd]
