I've got four 320 GB Seagate 7200.11 drives and have set up the following RAID arrays (using mdadm) across them:
Code:
/dev/md0 /boot RAID1
/dev/md1 / RAID10
/dev/md2 /home RAID10
I'm monitoring the status of my arrays via conky (screenshot below), and I've noticed something strange: after about a day of running, /dev/sdb fails in both of my RAID10 arrays:
Code:
cyberkost@raidbox:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid10 sdb3[4](F) sdc3[3] sda3[1] sdd3[2]
552779776 blocks 256K chunks 2 far-copies [4/3] [_UUU]
md1 : active raid10 sda2[1] sdc2[3] sdb2[4](F) sdd2[2]
67103232 blocks 64K chunks 2 far-copies [4/3] [_UUU]
md0 : active raid1 sda1[1] sdc1[3] sdb1[0] sdd1[2]
1052160 blocks [4/4] [UUUU]
unused devices: <none>
This is also signalled by the HDD activity LED staying on constantly (though there is no audible disk activity). If I try to add the /dev/sdb[2,3] partitions back to their respective arrays, I get:
Code:
cyberkost@raidbox:~$ sudo mdadm --add /dev/md1 /dev/sdb2
[sudo] password for kost:
mdadm: Cannot open /dev/sdb2: Device or resource busy
The HDD activity LED goes off after a day or two, and after that I can add the partitions back. Alternatively, I can just reboot the machine: it takes a long time (2-3 hours) to come up, and it comes up with /dev/sdb[2,3] missing, but once start-up has completed I can add those partitions back to their arrays.
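A possibly relevant detail: in the /proc/mdstat output above, sdb2 and sdb3 are still listed as array members, only flagged `(F)`. The usual mdadm sequence in that state is to `--remove` the faulty member before `--add`-ing it back. Whether that avoids the "Device or resource busy" error here is an assumption, not something tested in this post (the remove itself can fail while I/O to the disk is still pending). The bash sketch below only prints that sequence for every `(F)` member it finds; `readd_plan` is a hypothetical helper name, and it assumes the mdstat line format shown above.

```shell
#!/usr/bin/env bash
# Sketch: turn the "(F)" members listed in an mdstat snapshot into the mdadm
# commands that would remove and then re-add them. It only PRINTS the
# commands so they can be reviewed before running them with sudo.
# Assumes lines like "md2 : active raid10 sdb3[4](F) sdc3[3] ..." and that
# every member partition lives under /dev/.
readd_plan() {
    local mdstat="${1:-/proc/mdstat}"
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]\(F\)$/) {
                part = $i
                sub(/\[[0-9]+\]\(F\)$/, "", part)   # "sdb3[4](F)" -> "sdb3"
                printf "mdadm --manage /dev/%s --remove /dev/%s\n", $1, part
                printf "mdadm --manage /dev/%s --add /dev/%s\n", $1, part
            }
        }
    }' "$mdstat"
}
```

Usage: `readd_plan` reads /proc/mdstat directly, `readd_plan saved-mdstat.txt` reads a saved copy. Printing rather than executing keeps any root-level change to the arrays a deliberate, separate step.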
Originally I thought that the drive behind /dev/sdb was going bad, or that its SATA cable or motherboard channel had become faulty. But after reading post1 and the post2 it references, I started to suspect that the upgrade to Jaunty 9.04 messed up my mdadm RAID.
My suspicions are further deepened by the following facts:
1. sudo smartctl -a /dev/sdb says the drive is fine (and its output looks VERY similar to that for the other 3 drives).
2. I first noticed the problem shortly after the upgrade to 9.04; I had no problems in the year before that.
3. Before the upgrade, /proc/mdstat listed the members as sdaX sdbX sdcX sddX (ABCD order) for all 3 arrays. It now shows ACBD for /dev/md[0,1] and BCAD for /dev/md2.
4. The UUIDs in mdadm.conf look rather suspicious to me: the last two blocks are the same for all of /dev/md[0,1,2], while the first two differ:
Code:
cyberkost@raidbox:~$ cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=20eef32f:87dc5eba:e368bf24:bd0fce41
ARRAY /dev/md1 level=raid10 num-devices=4 UUID=bacf8d29:0b16d983:e368bf24:bd0fce41
ARRAY /dev/md2 level=raid10 num-devices=4 UUID=67665d86:1cc558d5:e368bf24:bd0fce41
# This file was auto-generated on Thu, 01 Jan 2009 00:37:24 +0000
# by mkconf $Id$
I'd naively expect those UUID fields to be either the same across all 3 arrays (e.g., if each block referred to a particular physical drive) or completely different (e.g., if each block referred to a particular superblock/partition).
5. Lastly, the modification date of mdadm.conf is the date of my upgrade to Ubuntu 9.04 (Jaunty Jackalope):
Code:
cyberkost@raidbox:~$ ls -la /etc/mdadm/mdadm.conf
-rw-r--r-- 1 root root 874 2009-04-25 23:09 /etc/mdadm/mdadm.conf
...and no, there's no backup file left.
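Point 4 can be checked mechanically rather than by eye. The bash/awk sketch below prints every colon-separated UUID field that is identical across all ARRAY lines of a config file. `shared_uuid_fields` is a hypothetical helper, and it assumes one `UUID=a:b:c:d` token per ARRAY line, as in the mdadm.conf above; run against that file it would report fields 3 and 4.

```shell
#!/usr/bin/env bash
# Sketch: report which of the colon-separated UUID fields are identical
# across every ARRAY line of an mdadm.conf-style file.
# Hypothetical helper; assumes one "UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx"
# token per ARRAY line.
shared_uuid_fields() {
    local conf="${1:-/etc/mdadm/mdadm.conf}"
    awk 'BEGIN { arrays = 0; maxn = 0 }
    $1 == "ARRAY" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^UUID=/) {
                n = split(substr($i, 6), f, ":")   # drop "UUID=", split fields
                arrays++
                for (k = 1; k <= n; k++) {
                    if (arrays == 1) first[k] = f[k]          # reference array
                    else if (f[k] != first[k]) differs[k] = 1 # field varies
                }
                if (n > maxn) maxn = n
            }
        }
    }
    END {
        for (k = 1; k <= maxn; k++)
            if (!(k in differs)) print "field " k ": " first[k]
    }' "$conf"
}
```

The check only says which fields coincide, not why. One thing worth confirming in mdadm(8) for the installed version: its description of `--homehost` says that, for version-0.90 superblocks, part of a SHA1 hash of the host name is stored in the second half of the UUID. If that applies to these arrays (the mdadm.conf above does set `HOMEHOST <system>`), a shared second half would be expected rather than a sign of damage.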
I checked whether following the advice offered in post2 would help, but it seems it won't, because the command returns the configuration that is already in my 9.04 mdadm.conf:
Code:
cyberkost@raidbox:~$ sudo mdadm --examine --scan --config=mdadm.conf
[sudo] password for kost:
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=20eef32f:87dc5eba:e368bf24:bd0fce41
ARRAY /dev/md1 level=raid10 num-devices=4 UUID=bacf8d29:0b16d983:e368bf24:bd0fce41
ARRAY /dev/md2 level=raid10 num-devices=4 UUID=67665d86:1cc558d5:e368bf24:bd0fce41
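The comparison above (config file versus `mdadm --examine --scan`) can also be scripted, so that line order or extra keys don't hide or fake a mismatch. A bash sketch with hypothetical helpers `array_uuids` and `same_arrays`; it compares only device name plus UUID and assumes both inputs use the same device names (some mdadm versions may print `/dev/md/0`-style names in `--examine --scan` output, which this sketch would report as a difference).

```shell
#!/usr/bin/env bash
# Extract sorted "device UUID" pairs from ARRAY lines, ignoring other keys
# (level=, num-devices=, ...) and the order in which arrays are listed.
array_uuids() {
    awk '$1 == "ARRAY" {
        uuid = ""
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) uuid = substr($i, 6)
        print $2, uuid
    }' "$1" | sort
}

# Returns 0 with no output when both files name the same arrays with the
# same UUIDs; otherwise prints a diff and returns non-zero.
same_arrays() {
    diff <(array_uuids "$1") <(array_uuids "$2")
}
```

For example: `sudo mdadm --examine --scan > scan.txt` followed by `same_arrays /etc/mdadm/mdadm.conf scan.txt && echo match`.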
PLEASE HE-E-E-E-ELP!!!