
Thread: MD raid trouble.

  1. #1
    Join Date
    Mar 2009
    Beans
    19

    MD raid trouble.

    Hi,

    I am having a problem with my RAID array. After going on vacation for a couple of weeks I have come back unable to access any of the data. I would appreciate it if anybody can help, as I don't want to lose my data by rebuilding the array incorrectly.
    I am not sure why the problem happened, and help in understanding why would also be appreciated.

    I used to have a ZFS filesystem on 3 of the 5 drives.
    After setting up the mdraid I formatted the array with btrfs. (I realize this is my next problem if I can get as far as loading up the array.) But my hope is to be able to extract the media so I don't have to re-rip everything.

    The drives should be set up in a RAID-6 setup.

    Hope I included the most useful data.

    Thanks for reading this far.

    René

    Code:
    $ uname -a
    Linux Asgard 3.0.0-16-server #29-Ubuntu SMP Tue Feb 14 13:08:12 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

    Code:
    $ sudo mdadm --examine /dev/sd*
    mdadm: No md superblock detected on /dev/sda.
    mdadm: No md superblock detected on /dev/sda1.
    mdadm: No md superblock detected on /dev/sda2.
    mdadm: No md superblock detected on /dev/sda5.
    /dev/sdb:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 0db0c336:f56bd888:2f9e92e4:c1d64c09
               Name : Asgard:0  (local to host Asgard)
      Creation Time : Sat Jan 28 14:29:36 2012
         Raid Level : raid6
       Raid Devices : 5
    
     Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
         Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
      Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 8e861931:6f7448ec:6f2c6cd2:74cb06a2
    
        Update Time : Tue Mar 27 18:25:33 2012
           Checksum : a4305a9b - correct
             Events : 315451
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : Active device 4
       Array State : ..A.A ('A' == active, '.' == missing)
    /dev/sdc:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 0db0c336:f56bd888:2f9e92e4:c1d64c09
               Name : Asgard:0  (local to host Asgard)
      Creation Time : Sat Jan 28 14:29:36 2012
         Raid Level : raid6
       Raid Devices : 5
    
     Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
         Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
      Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : ef9d6881:b31e81b4:9403e07a:4280392d
    
        Update Time : Tue Mar 27 18:25:33 2012
           Checksum : f05ad7f8 - correct
             Events : 0
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : spare
       Array State : ..A.A ('A' == active, '.' == missing)
    /dev/sdd:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 0db0c336:f56bd888:2f9e92e4:c1d64c09
               Name : Asgard:0  (local to host Asgard)
      Creation Time : Sat Jan 28 14:29:36 2012
         Raid Level : raid6
       Raid Devices : 5
    
     Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
         Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
      Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 387de195:c966497f:f7fad598:cfc7fb10
    
        Update Time : Tue Mar 27 18:25:33 2012
           Checksum : d1543e47 - correct
             Events : 0
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : spare
       Array State : ..A.A ('A' == active, '.' == missing)
    /dev/sde:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 0db0c336:f56bd888:2f9e92e4:c1d64c09
               Name : Asgard:0  (local to host Asgard)
      Creation Time : Sat Jan 28 14:29:36 2012
         Raid Level : raid6
       Raid Devices : 5
    
     Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
         Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
      Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 8e1a4400:e00d30d2:faf68bb0:99e13cb1
    
        Update Time : Tue Mar 27 18:25:33 2012
           Checksum : 46949882 - correct
             Events : 0
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : spare
       Array State : ..A.A ('A' == active, '.' == missing)
    /dev/sdf:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x0
         Array UUID : 0db0c336:f56bd888:2f9e92e4:c1d64c09
               Name : Asgard:0  (local to host Asgard)
      Creation Time : Sat Jan 28 14:29:36 2012
         Raid Level : raid6
       Raid Devices : 5
    
     Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
         Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
      Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
        Data Offset : 2048 sectors
       Super Offset : 8 sectors
              State : clean
        Device UUID : 418fe563:d3c6d16b:54762ab7:8b6aced2
    
        Update Time : Tue Mar 27 18:25:33 2012
           Checksum : 6c0b9ead - correct
             Events : 315451
    
             Layout : left-symmetric
         Chunk Size : 512K
    
       Device Role : Active device 2
       Array State : ..A.A ('A' == active, '.' == missing)

    Code:
    Output of parted:
    # parted -l /dev/sd[bcdef]
    Error: The primary GPT table is corrupt, but the backup appears OK, so that will
    be used.
    OK/Cancel? ok                                                             
    Model: ATA ST2000DL003-9VT1 (scsi)
    Disk /dev/sdb: 2000GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name  Flags
     1      1049kB  2000GB  2000GB               zfs
     9      2000GB  2000GB  8389kB
    
    
    Error: The primary GPT table is corrupt, but the backup appears OK, so that will be used.
    OK/Cancel? ok                                                             
    Model: ATA ST2000DL003-9VT1 (scsi)
    Disk /dev/sdc: 2000GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name  Flags
     1      1049kB  2000GB  2000GB               zfs
     9      2000GB  2000GB  8389kB
    
    
    Error: /dev/sdd: unrecognised disk label                                  
    
    Error: /dev/sde: unrecognised disk label                                  
    
    Error: The primary GPT table is corrupt, but the backup appears OK, so that will be used.
    OK/Cancel? ok                                                             
    Model: ATA ST2000DL003-9VT1 (scsi)
    Disk /dev/sdf: 2000GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    
    Number  Start   End     Size    File system  Name  Flags
     1      1049kB  2000GB  2000GB               zfs
     9      2000GB  2000GB  8389kB



    Build:
    Code:
    # Created a RAID5 array with 3 disks. I don't have the precise commands anymore.
    sudo mdadm --add /dev/md127 /dev/sdc /dev/sdd
    sudo mdadm --grow --level=6 --raid-devices=5 /dev/md127 --backup-file=/home/renec/raid.backup
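    From memory, the original 3-disk RAID5 create was something along these lines; the device names and options here are placeholders rather than the exact command I ran.
    Code:
    # Rough reconstruction only; device names are placeholders, not the real ones
    sudo mdadm --create /dev/md127 --level=5 --raid-devices=3 /dev/sdX /dev/sdY /dev/sdZ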

  2. #2
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,136
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: MD raid trouble.

    Boy, this is a mess, and a fairly convoluted setup. If you get this remounted, you'll want to back up your data, write zeroes over all of these disks to remove all traces of ZFS and mdadm, and then start over. Also, I wouldn't use btrfs at this point because it's still being developed.
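    When you do start over, clearing the old metadata would look roughly like this. This is only a sketch, and it's destructive, so run it only once your data is safely backed up elsewhere; mdadm --zero-superblock plus wipefs (from util-linux) is the usual lightweight alternative to zeroing the whole disks.
    Code:
    # DESTRUCTIVE: run only after the data is backed up elsewhere
    mdadm --zero-superblock /dev/sd[bcdef]                    # clear the md superblocks
    wipefs -a /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf    # clear remaining signatures (GPT, old ZFS labels)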

    What's in your /etc/mdadm/mdadm.conf file?
    Code:
    cat /etc/mdadm/mdadm.conf
    What do these show?

    Code:
    cat /proc/mdstat
    mdadm --detail --scan

  3. #3
    Join Date
    Mar 2009
    Beans
    19

    Re: MD raid trouble.

    Yeah, I plan on doing something like that. I wasn't aware ZFS would still be visible after the RAID initialization.

    I am aware of the status of btrfs, but I wanted something with checksumming, so I was willing to take that risk.

    The mdadm.conf is still in its original state (mostly commented out). Forgot to add the relevant info.

    Code:
    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sdf[3](S) sdb[4](S) sdd[5](S)
          5860540680 blocks super 1.2
    Code:
    sudo mdadm --detail --scan
    mdadm: md device /dev/md/0 does not appear to be active.
    Last edited by rcastberg; April 1st, 2012 at 06:13 PM.

  4. #4
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,136
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: MD raid trouble.

    It looks like you just need to try to force assemble the array and then put a proper mdadm.conf file back in place.

    Code:
    mdadm --assemble --force /dev/md0 /dev/sd[bcdef]
    If that assembles properly, then you'll want to recreate your mdadm.conf file.
    Code:
    echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
    echo "HOMEHOST <system>" >> /etc/mdadm/mdadm.conf
    echo "MAILADDR root" >> /etc/mdadm/mdadm.conf
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf
    I'm assuming that you didn't update your mdadm.conf after changing to RAID6, and that was the cause of this situation. Let me know how this assemble goes.
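    As an aside, mdadm can print the ARRAY line itself, which is the easy way to keep mdadm.conf in sync after a grow like yours (read-only, safe to run any time):
    Code:
    mdadm --examine --scan
    # prints something like: ARRAY /dev/md/0 metadata=1.2 UUID=0db0c336:f56bd888:2f9e92e4:c1d64c09 name=Asgard:0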

  5. #5
    Join Date
    Mar 2009
    Beans
    19

    Re: MD raid trouble.

    Cheers, tried that, no luck.

    Code:
    $ sudo mdadm --stop /dev/md0
    mdadm: stopped /dev/md0
    $ lsof /dev/sde 
    $ lsof /dev/sdd
    $sudo mdadm --assemble --force /dev/md0 /dev/sd[bcdef]
    mdadm: failed to add /dev/sdd to /dev/md0: Device or resource busy
    mdadm: failed to add /dev/sde to /dev/md0: Device or resource busy
    mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
    $mdadm --stop  /dev/md0
    mdadm: stopped /dev/md0
    Gives me:
    Code:
    $ sudo mdadm --detail /dev/md0 
    mdadm: md device /dev/md0 does not appear to be active.
    $ sudo cat /proc/mdstat 
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]                                                                        
    md0 : inactive sdf[3](S) sdc[5](S) sdb[4](S)
          5860540680 blocks super 1.2
           
    unused devices: <none>

    Not sure why I am getting the device or resource busy error, especially as those devices seem to be the clean ones in the mdadm --examine output. Do you have any idea why one of the drives comes up as a spare? Technically I only need 3 of the 5 disks to get at the data.
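    If it helps, I can also run read-only checks like these to see what is holding sdd and sde (just examples; dmsetup comes with the device-mapper tools):
    Code:
    ls /sys/block/sdd/holders /sys/block/sde/holders   # kernel devices stacked on top of these disks
    sudo dmsetup ls                                    # any device-mapper (dmraid/LVM) mappings
    sudo fuser -v /dev/sdd /dev/sde                    # processes holding the devices open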

    Thanks for your help so far.

    Rene

  6. #6
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,136
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: MD raid trouble.

    Those drives aren't the clean ones; they're just the ones that don't have any event counter value. That's not a good sign, as their event counters should be the same as the other disks', or very close.

    It appears that the dmraid driver is controlling these two drives, and that mdadm is confused and assembling incorrectly because of the lack of a proper mdadm.conf file (there's a quick dmraid check at the end of this post). Can you post what you have here, even if it's commented out?
    Code:
    cat /etc/mdadm/mdadm.conf
    Have you verified these disks are healthy via smartmontools?
    Code:
    smartctl -d ata -a /dev/sdd
    smartctl -d ata -a /dev/sde
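    Also, a quick dmraid check would show whether leftover fakeraid metadata is what's claiming those two disks. dmraid -r only lists what it detects and doesn't change anything:
    Code:
    dmraid -r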

  7. #7
    Join Date
    Mar 2009
    Beans
    19

    Re: MD raid trouble.

    Code:
    $ cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #
    
    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    DEVICE partitions
    
    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes
    
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR rene-sysadm@castberg.org
    
    # definitions of existing MD arrays
    #ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7c88ffa3:9ae1da97:856c4418:84634b4a
    For the smartctl self-tests I get
    SMART overall-health self-assessment test result: PASSED
    for both drives.
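    If it's useful I can also dig a bit deeper than the overall-health flag, something like this (smartctl's attribute table plus an extended self-test; the long test takes a few hours):
    Code:
    sudo smartctl -A /dev/sdd            # attribute table (reallocated/pending sectors, etc.)
    sudo smartctl -t long /dev/sdd       # start an extended self-test
    sudo smartctl -l selftest /dev/sdd   # read the result once it finishes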

  8. #8
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,136
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: MD raid trouble.

    Here's what I would do next. The first step is to get a proper mdadm.conf file in place so mdadm knows how to correctly assemble the array on bootup. Yours should be like this, based on what you've posted so far.

    Code:
    echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
    echo "HOMEHOST <system>" >> /etc/mdadm/mdadm.conf
    echo "MAILADDR rene-sysadm@castberg.org" >> /etc/mdadm/mdadm.conf
    echo "ARRAY /dev/md0 metadata=1.2 name=Asgard:0 UUID=0db0c336:f56bd888:2f9e92e4:c1d64c09" >> /etc/mdadm/mdadm.conf
    Then I'd update initramfs.
    Code:
    update-initramfs -u
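    If you want to double-check that the new config actually made it into the initramfs, lsinitramfs (part of initramfs-tools) can list its contents:
    Code:
    lsinitramfs /boot/initrd.img-$(uname -r) | grep mdadm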
    Finally, reboot.
    Code:
    reboot
    Last edited by rubylaser; April 2nd, 2012 at 12:20 PM.

  9. #9
    Join Date
    Mar 2009
    Beans
    19

    Re: MD raid trouble.

    Right, I adjusted the mdadm.conf like you recommended and rebooted. I ended up in the initramfs boot console and tried a couple of the recommendations you made earlier, with no luck. I was still getting the error message about device or resource busy, so I rebooted with the nodmraid kernel option, and it still results in the same error message.

    Code:
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #
    
    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    DEVICE partitions
    
    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes
    
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR rene-sysadm@castberg.org
    
    # definitions of existing MD arrays
    #ARRAY /dev/md0 level=raid1 num-devices=2 UUID=7c88ffa3:9ae1da97:856c4418:84634b4a
    ARRAY /dev/md0 metadata=1.2 name=Asgard:0 UUID=0db0c336:f56bd888:2f9e92e4:c1d64c09
    Code:
    [2012-04-02 21:44:54]  Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.0.0-16-server root=UUID=65041921-457e-4790-acde-79864823fe5f ro crashkernel=384M-2G:64M,2G-:128M quiet splash vt.handoff=7 nodmraid
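    (For anyone following along: making nodmraid permanent is just a matter of adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub; the line below is an example based on the options shown above, not my exact file.)
    Code:
    # /etc/default/grub -- example line only
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nodmraid"
    # then regenerate the grub configuration
    sudo update-grub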

  10. #10
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,136
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: MD raid trouble.

    Have you tried to assemble this from the LiveCD to see if you can get it assembled and mounted? Just download and burn the LiveCD, and boot from it. Once it's running, just apt-get install mdadm, and then try to assemble the array.

    Code:
    mdadm --assemble /dev/md0 /dev/sd[bcdef]
    If you can get this assembled, then you should be able to mount the array and then back up your data.
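    If it does assemble, mounting it read-only for the backup would look something like this (the mount point is arbitrary, and -o ro keeps the btrfs volume untouched while you copy data off):
    Code:
    mkdir -p /mnt/md0
    mount -o ro /dev/md0 /mnt/md0
    ls /mnt/md0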
