
Thread: RAID5 spare does not survive reboot

  1. #1
    Join Date
    Feb 2008
    Beans
    13

    RAID5 spare does not survive reboot

    I'm having the problem that the spare disks on two of my arrays aren't automatically being added upon reboot. I can add them manually with mdadm --add (which causes the spare to be re-added, not just added).

    I'd like this to be done automatically. Obviously I can do so in /etc/rc.local but I'd prefer to solve the underlying problem such that the machine will still boot in the event of a degraded array situation.
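
    For reference, the manual fix amounts to something like the following, based on the array details shown below (md1's spare is /dev/sdf1 and md2's is /dev/sdf2); these are the same two lines that would go into /etc/rc.local if I took that route:

    Code:
    # Manually re-add the spares after a reboot (what I currently do by hand)
    mdadm /dev/md1 --add /dev/sdf1
    mdadm /dev/md2 --add /dev/sdf2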

    A bit of background:

    I started out with four 320GB disks, each identically partitioned as follows:

    Code:
    root@via:~# sfdisk -l /dev/sda
    
    Disk /dev/sda: 38913 cylinders, 255 heads, 63 sectors/track
    Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
    
       Device Boot Start     End   #cyls    #blocks   Id  System
    /dev/sda1          0+    121     122-    979933+  fd  Linux raid autodetect
    /dev/sda2        122   38912   38791  311588707+  fd  Linux raid autodetect
    /dev/sda3          0       -       0          0    0  Empty
    /dev/sda4          0       -       0          0    0  Empty
    The four large partitions were used to construct a RAID5 array without spares (although the info below does show the spare that I've since added):

    Code:
    root@via:~# mdadm --detail /dev/md2
    /dev/md2:
            Version : 00.90.03
      Creation Time : Fri Jun 22 20:56:51 2007
         Raid Level : raid5
         Array Size : 934765824 (891.46 GiB 957.20 GB)
      Used Dev Size : 311588608 (297.15 GiB 319.07 GB)
       Raid Devices : 4
      Total Devices : 5
    Preferred Minor : 2
        Persistence : Superblock is persistent
    
        Update Time : Sat Jul 12 10:06:41 2008
              State : clean
     Active Devices : 4
    Working Devices : 5
     Failed Devices : 0
      Spare Devices : 1
    
             Layout : left-symmetric
         Chunk Size : 64K
    
               UUID : 714e46f1:479268a7:895e209c:936fa570
             Events : 0.13758
    
        Number   Major   Minor   RaidDevice State
           0       8        2        0      active sync   /dev/sda2
           1       8       18        1      active sync   /dev/sdb2
           2       8       34        2      active sync   /dev/sdc2
           3       8       50        3      active sync   /dev/sdd2
    
           4       8       82        -      spare   /dev/sdf2
    The smaller partitions were turned into two RAID1 arrays, used as two swap partitions with equal priority:

    Code:
    root@via:~# mdadm --detail /dev/md[01]
    /dev/md0:
            Version : 00.90.03
      Creation Time : Fri Jun 22 20:56:08 2007
         Raid Level : raid1
         Array Size : 979840 (957.04 MiB 1003.36 MB)
      Used Dev Size : 979840 (957.04 MiB 1003.36 MB)
       Raid Devices : 2
      Total Devices : 2
    Preferred Minor : 0
        Persistence : Superblock is persistent
    
        Update Time : Sat Jul 12 08:47:50 2008
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
    
               UUID : 19e69537:f7a6aec8:5a5f7576:3fc29e0d
             Events : 0.50
    
        Number   Major   Minor   RaidDevice State
           0       8        1        0      active sync   /dev/sda1
           1       8       33        1      active sync   /dev/sdc1
    /dev/md1:
            Version : 00.90.03
      Creation Time : Fri Jun 22 20:56:31 2007
         Raid Level : raid1
         Array Size : 979840 (957.04 MiB 1003.36 MB)
      Used Dev Size : 979840 (957.04 MiB 1003.36 MB)
       Raid Devices : 2
      Total Devices : 3
    Preferred Minor : 1
        Persistence : Superblock is persistent
    
        Update Time : Sat Jul 12 08:07:54 2008
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 1
    
               UUID : 5e728621:c8b356a8:01f8e270:f0e280cb
             Events : 0.54
    
        Number   Major   Minor   RaidDevice State
           0       8       17        0      active sync   /dev/sdb1
           1       8       49        1      active sync   /dev/sdd1
    
           2       8       81        -      spare   /dev/sdf1
    So: how do I get the spares to be automatically re-enabled upon reboot?

    I also want the /dev/sdf1 spare to migrate between /dev/md0 and /dev/md1 if the other array is degraded. To this end I've added spare-group definitions to /etc/mdadm/mdadm.conf, but I'm not sure that file is actually being used for anything.
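
    (As far as I understand, spare-group migration is only performed by mdadm in monitor mode, so the monitor daemon has to be running for that to work; something along these lines, though I haven't verified how Ubuntu starts it:)

    Code:
    # Spares are moved between arrays in the same spare-group by the monitor
    # daemon, not at assembly time; it can also be started by hand like this
    sudo mdadm --monitor --scan --daemonise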

    Just for reference I include /etc/mdadm/mdadm.conf below.

    Code:
    root@via:~# cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #
    
    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    DEVICE partitions
    
    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes
    
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    
    # instruct the monitoring daemon where to send mail alerts
    MAILADDR root
    
    # definitions of existing MD arrays
    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=19e69537:f7a6aec8:5a5f7576:3fc29e0d spare-group=swapgroup
    ARRAY /dev/md1 level=raid1 num-devices=2 spares=1 UUID=5e728621:c8b356a8:01f8e270:f0e280cb spare-group=swapgroup
    ARRAY /dev/md2 level=raid5 num-devices=4 spares=1 UUID=714e46f1:479268a7:895e209c:936fa570
    
    # This file was auto-generated on Fri, 22 Jun 2007 19:12:10 +0000
    # by mkconf $Id: mkconf 261 2006-11-09 13:32:35Z madduck $
    MAILFROM root
    Thanks!

  2. #2
    Join Date
    Feb 2007
    Location
    Cameron Park CA USA
    Beans
    4,571
    Distro
    Ubuntu Development Release

    Re: RAID5 spare does not survive reboot

    The only thing I've noticed in your data is that the UUIDs (as recorded in the mdadm.conf file) don't seem right. Are they?

    I don't have any suggestions other than to study the docs at:

    /usr/share/doc/mdadm/FAQ.gz and md.txt.gz

    You might try a different arrangement for the spare-group... the problem likely lies there.

    Good luck!
    Last edited by fjgaude; July 21st, 2008 at 03:45 PM.
    Regards, frank, at http://yantrayoga.typepad.com/noname/
    Homebuilt Lian-Li PC-Q33WB, Intel i7-4790K 4.6GHz, SSDs,32G RAM | Dell Laptop 13.3".
    Oracle VBox w/ WinXP/Win10 running Xara Designer, PaintShopPro, and InDesign CS.

  3. #3
    Join Date
    Feb 2008
    Beans
    13

    Re: RAID5 spare does not survive reboot

    Well spotted! Thanks very much; I'll report back here on whether this is the problem when I next reboot the server.

    Cheers, Jan

  4. #4
    Join Date
    Feb 2008
    Beans
    13

    Re: RAID5 spare does not survive reboot

    I have now rebooted the machine following the correction of the UUID mismatch in /etc/mdadm/mdadm.conf.

    The problem is NOT solved: spare disks are still not being automatically added to arrays on boot.

    I'd be grateful for any other hints.

  5. #5
    Join Date
    Feb 2007
    Location
    Cameron Park CA USA
    Beans
    4,571
    Distro
    Ubuntu Development Release

    Re: RAID5 spare does not survive reboot

    Have you read and studied this:

    http://shsc.info/LinuxSoftwareRAID

    I can't see anything wrong with what you have shown... you might try a different arrangement for the spare-group; the problem likely lies there. Good luck!
    Last edited by fjgaude; July 21st, 2008 at 03:46 PM.

  6. #6
    Join Date
    Feb 2008
    Beans
    13

    SOLVED: RAID5 spare does not survive reboot

    It appears that this problem was caused by the "DEVICE partitions" line in /etc/mdadm/mdadm.conf.

    I have replaced it with the following:

    Code:
    DEVICE /dev/sd[abcdef][12]
    and now the spares are automatically added to the arrays on boot.
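
    (One caveat I'm not certain applies here: Ubuntu keeps a copy of mdadm.conf in the initramfs, so after editing the file it may also be necessary to regenerate it so the boot-time copy sees the new DEVICE line:)

    Code:
    # Regenerate the initramfs so its copy of mdadm.conf matches the edited one
    # (may or may not be required on this particular setup)
    sudo update-initramfs -u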

    Thanks to all who helped.

  7. #7
    Join Date
    Feb 2007
    Location
    Cameron Park CA USA
    Beans
    4,571
    Distro
    Ubuntu Development Release

    Re: RAID5 spare does not survive reboot

    That's good to know... we learn something new daily! Thanks for fixing your own problem.

  8. #8
    Join Date
    Aug 2006
    Location
    home
    Beans
    Hidden!
    Distro
    Ubuntu Studio 9.10 Karmic Koala

    Re: RAID5 spare does not survive reboot

    Hi,
    By coincidence I stumbled upon this thread and thought I might give you some undocumented info and advice in case you ever have to rebuild your array(s); the two threads linked below are a must-read for keeping them in good health.

    MDADM UUID / superblock, superblock on /dev/nnn doesn't match

    MDADM: .... has same UUID but different superblock to



    Some questions in regard to your install.

    Is there a good reason why you are using two swap partitions instead of one large one? I fail to see any advantage in that.
    Because you seem concerned about keeping your data at all costs: is there any specific reason why you went for a 4+1 RAID5 configuration instead of a 5-disk RAID6? Both give you the same amount of usable space.



    The difference between RAID5 and RAID6 is that RAID6 gives higher redundancy (two disks may fail simultaneously).
    What I'm trying to say is this:
    In the case of a RAID5 with one spare, as soon as one disk fails the spare will start synchronizing, but as this takes some time (especially with large disks),
    your array has no redundancy until the synchronization is finished; if for any reason another disk fails at that point, your data is gone.

    Not so for RAID6: as soon as one disk fails, your array continues working as a RAID5, without losing redundancy at any given moment.
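
    (For illustration only: the 5-disk RAID6 alternative would be created roughly like this, reusing the partition names from your post; it would of course wipe the existing array, so treat it as a sketch rather than a recipe:)

    Code:
    # 5-disk RAID6: same usable capacity as 4-disk RAID5 plus a hot spare
    # (3 disks' worth), but it survives any two simultaneous disk failures
    sudo mdadm --create /dev/md2 --level=6 --raid-devices=5 \
        /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sdf2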



    It's worth noting that you can partition an MD device (right now you use it as one large unpartitioned "superfloppy" / USB-style disk), instead of partitioning your disks prior to creating MD devices.
    In other words, you can create one large array (e.g. /dev/md0) which you partition afterwards (the opposite of what you did now). The advantage of this is that it saves you the trouble of having to configure a spare with matching partitions: you (or someone else with no specific knowledge of your setup) can then just put in another disk in case of failure...
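
    (A minimal sketch of what that could look like; the --auto=part option is from the mdadm man page and the device names are made up, so double-check before using anything like this:)

    Code:
    # Build one partitionable array from whole (hypothetical) disks, then
    # partition the array itself rather than the member disks
    sudo mdadm --create /dev/md_d0 --auto=part --level=5 --raid-devices=4 \
        --spare-devices=1 /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdv
    sudo cfdisk /dev/md_d0     # partitions appear as /dev/md_d0p1, /dev/md_d0p2, ...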



    Please note the difference in syntax between disk and array devices.
    While
    /dev/sda1 is the 1st partition of /dev/sda,
    /dev/md1 is NOT the first partition of /dev/md (and neither is /dev/md0).

    The syntax for a partitioned MD array is
    /dev/md1p1 as the 1st partition of /dev/md1.



    Use cfdisk for this (not fdisk); cfdisk is much easier.
    As always, keep in mind that partitioning any device will kill the data on it,
    so back up the array before partitioning it.
    Usage:
    Code:
    sudo cfdisk /dev/md0


    Instead of partitioning an array you can also put EVMS or LVM on it, which gives you the advantage of being able to create rotating snapshots and decouples your physical disks from your storage layer (probably a little far-fetched for your setup).
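
    (A rough sketch of the LVM variant; the volume group and logical volume names are made up for the example, and doing this on a live array would wipe its contents:)

    Code:
    # Use the whole array as an LVM physical volume and carve logical volumes from it
    sudo pvcreate /dev/md2
    sudo vgcreate raidvg /dev/md2            # "raidvg" is just an example name
    sudo lvcreate -L 100G -n data raidvg     # so is "data"
    sudo mkfs.ext3 /dev/raidvg/data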



    Just my 5 cents of good advice.



    Last edited by djamu; July 27th, 2008 at 07:17 PM. Reason: spel
    democracy : 2 wolves and a sheep voting on "what's for dinner"
    i am self-employed, and my views reflect the electrical charges held between many simultaneously firing synapses... or is that synapsi?
    http://3d.uk.to

  9. #9
    Join Date
    Feb 2008
    Beans
    13

    Re: RAID5 spare does not survive reboot

    First of all, I opened a bug about this issue: https://bugs.launchpad.net/ubuntu/+s...dm/+bug/252365

    Djamu,

    Thanks for your suggestions.

    The reason for my set-up is that I started out with 4 disks and only added the other two later. This determined my choice of RAID5 over RAID6.

    The two small RAID1 arrays are mounted as swap partitions with equal priority. This causes the kernel to stripe swap space between these two arrays. I chose this for the following reasons:

    - identical layout of the four disks
    - I didn't want to put swap on the RAID5 array, for performance reasons
    - better resilience: if two disks fail the system still has a 50% chance of surviving (but the RAID5 array will be dead)
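
    (For reference, the equal-priority setup is just two fstab entries like these; the exact options are from memory:)

    Code:
    # /etc/fstab -- equal pri= values make the kernel stripe swap across both arrays
    /dev/md0  none  swap  sw,pri=1  0  0
    /dev/md1  none  swap  sw,pri=1  0  0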

    As I bought sde and sdf later, they are less likely to fail at the same time as the first four. Furthermore, they are normally spun down. That's another reason why RAID5 is preferable: lower power consumption (only 4 drives are spinning instead of all 6).

    Cheers, Jan

  10. #10
    Join Date
    Aug 2006
    Location
    home
    Beans
    Hidden!
    Distro
    Ubuntu Studio 9.10 Karmic Koala

    Re: RAID5 spare does not survive reboot

    OK, let me clear up some widespread misconceptions about RAID.
    Unlike what most "pros" say, there's nearly no performance difference between a 4-disk RAID0 stripe and a 5-disk RAID5.
    The very simple reason is that a RAID5 is also a stripe, just with redundancy added, so read performance will be almost equal (the redundancy check is a simple XOR (eXclusive OR) operation). Writing might be a little slower as there are two write cycles (the data plus the parity).
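
    (A toy illustration of the XOR parity idea, nothing more; the byte values are arbitrary:)

    Code:
    # Toy RAID5 parity demo: parity = XOR of the data blocks, and any one missing
    # block can be rebuilt by XOR-ing the surviving blocks with the parity
    d1=0x5a; d2=0x3c; d3=0x81
    parity=$(( d1 ^ d2 ^ d3 ))
    echo "rebuilt d2 = $(( d1 ^ d3 ^ parity ))"   # prints 60, i.e. 0x3c again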

    Quote Originally Posted by JanCeuleers View Post
    ...This causes the kernel to stripe swap space between these two arrays. I chose this for the following reasons...
    So, IMHO, a waste of resources.

    - identical layout of the four disks
    - I didn't want to put swap on the RAID5 array, for performance reasons
    - better resilience: if two disks fail the system still has a 50% chance of surviving (but the RAID5 array will be dead)
    I just posted a topic about another misconception in RAID partitioning.
    People tend to partition their disks in order to create arrays. This is just plain wrong, because not all data will be redundant: your data will be, but your partition scheme won't...
    See my post here.
    Better is to partition your array (the very opposite; a failing of those howtos).

    They are normally spun down. That's another reason why RAID5 is preferable: lower power consumption (only 4 drives are spinning instead of all 6).
    Are you sure about that? That very likely depends on the controller type and on what power management is installed/enabled.
    The onboard electronics will always be active. The motor spindle doesn't consume much; most energy is used by head seeking (which it does very violently, if you ever have the chance to watch it).
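
    (If you want to verify or control the spin-down behaviour, hdparm is one way to do it; the timeout value below is just an example:)

    Code:
    # Spin a drive down immediately, and set it to enter standby after ~20 min idle
    sudo hdparm -y /dev/sdf          # immediate standby
    sudo hdparm -S 240 /dev/sdf      # values 1-240 are multiples of 5 seconds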


    Last edited by djamu; July 27th, 2008 at 07:19 PM.

