[ubuntu] Problem with Grub2 not recognizing a RAID 5 array for root



Blue Dude
November 30th, 2011, 04:43 PM
I'm having problems getting Grub to recognize a RAID 5 array that's intended to be the root drive for an Ubuntu 10.04 LTS install.

The really weird part is that it worked perfectly, right up until I had problems with one of the drives.

I have 4 drives in the system. One is an IDE drive that I use to boot Windows and not much else. Three SATA drives make up the Ubuntu system, and they are partitioned identically. The first partition is a small RAID 1 for /boot (md0). The second is swap space. The third is a 17 GB RAID 5 for the system (md1), 34 GB total. The last is a 1.5 TB RAID 5 for /home (md2), 3 TB total.
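For reference, each SATA drive looks roughly like this (the md memberships match what mdstat shows later in the thread; the swap slot on partition 5 is illustrative):

sdX1 ~1 GB -> md0 (RAID 1, /boot)
sdX5 swap
sdX6 17 GB -> md1 (RAID 5, /)
sdX7 1.5 TB -> md2 (RAID 5, /home)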

I had a problem with one of the SATA drives, so I pulled it, which degraded the two RAID 5 arrays. I decided to use part of the IDE drive to rebuild md1, but there was a problem with one of the remaining drives and the rebuild failed. So I scrapped the md1 array, zeroed the appropriate partitions and rebuilt it from scratch. I lost all the data on the system drive, but I had a recent tar backup.
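(The rebuild attempt itself was just a matter of adding the new partition to the degraded array, something like

mdadm /dev/md1 --add /dev/sda6

after which md starts resyncing on its own; the device name here is what the IDE partition ended up as in my setup.)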

I used the live CD to mount the new md1, with md0 and md2 mounted in the appropriate folders. Then I restored the backup tar to md1. Since md1 has a new UUID, I updated /etc/mdadm/mdadm.conf and /etc/fstab on md1.
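For anyone doing the same: mdadm.conf wants the array UUIDs, while fstab wants the filesystem UUID, so the two updates come from different commands, e.g.:

mdadm --detail --scan
blkid /dev/md1

The first prints the ARRAY lines for /etc/mdadm/mdadm.conf, the second the filesystem UUID for /etc/fstab.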

Then I used the procedure at https://help.ubuntu.com/community/Grub2 to chroot into md1 and rebuild Grub. It was apparently successful.
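The RAID-specific part is getting the arrays up before mounting anything, roughly:

sudo apt-get install mdadm
sudo mdadm --assemble --scan

then mounting md1 at /mnt and md0 at /mnt/boot, bind-mounting /dev, /proc and /sys, and chrooting in as the doc describes.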

But when booting, Grub indicates that the root doesn't exist. In the busybox command line (ls -l /dev/disk/by-uuid), both md0 (boot) and md2 (degraded but clean /home) show up, but not md1.
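The initramfs busybox shell should have mdadm available, so the arrays can be poked at by hand from there:

ls -l /dev/disk/by-uuid
cat /proc/mdstat
mdadm --assemble --scan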

My question is this: how can I get Grub to wake up md1?

Blue Dude
December 1st, 2011, 09:36 PM
Wow, nobody wants to take a stab at it?

drs305
December 1st, 2011, 09:45 PM
I've seen the thread but don't use RAID and have never really learned much about it. However, I did write the chroot instructions you referred to. RAID requires additional modules to be loaded, which I am not sure would happen automatically during the chroot procedure. This might be part of the problem.

There are Grub gurus such as darkod on the forum who are familiar with Grub and RAID issues. You might try doing a search for his posts or hopefully someone familiar with RAID will read this thread.

Blue Dude
December 1st, 2011, 09:56 PM
Hi, thanks. The chroot procedure did refer to loading mdadm first, so I thought the RAID issue was covered. Perhaps in a chroot off the Live CD, update-grub doesn't know that the new root will be a RAID array. Certainly Grub wakes up the other two RAID arrays, just not the root.

Blue Dude
December 2nd, 2011, 06:10 AM
I took a close look at the grub.cfg file from the backup (the install that booted to a RAID 5 system disk) and the one generated by the chroot after redoing the RAID, and they are identical except for the UUIDs of the RAID disk, as you would expect. I'm beginning to suspect that there's nothing wrong with Grub or the way it's configured, but with the way the RAID disk itself was created. What would make it wake up with an mdadm --assemble --scan but not with the Grub RAID module?
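For illustration, the differing lines are the ones carrying the root filesystem UUID, something like (kernel version and UUID placeholder illustrative):

linux /vmlinuz-2.6.32-35-generic root=UUID=<new md1 filesystem UUID> ro quiet splash

That UUID is resolved at boot by the initramfs, not by Grub itself.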

darkod
December 2nd, 2011, 10:54 PM
Unfortunately, I'm not the guru people seem to think I am (thank you drs305). :)
Your problem seems really tangled up.
I have some questions, and that might help other people joining in too.

1. Your RAID 1 for md0: was it a standard RAID 1 of two disks? If so, was the disk that failed first (the one you removed) part of md0, or was it the third disk?

Further on, you say another disk failed before the rebuild was done. Was it by any chance the other disk from md0? Am I making sense: could both md0 disks have failed by now, leaving you without /boot?

2. The rebuild that failed: you seem to be under the impression that it failed only for md1 and that md2 runs fine degraded. Did you try to rebuild only md1 and not md2 as well? Otherwise the failed rebuild would apply to both of them.

3. When you removed the disk, did you actually remove it from the arrays first? From threads I have read here, it seems that even with a failed disk you need to remove it from the array before physically removing it.

4. In any case, it would be nice to start off with the output of:

cat /proc/mdstat

to see which arrays are running, where and how. Meanwhile I'll ask for more help if possible. :)

rubylaser
December 2nd, 2011, 11:11 PM
One more simple question. Did you update your /etc/mdadm/mdadm.conf file to reflect the new UUID for md1? Otherwise it will not auto-assemble at boot.

Edit: I see that you did that in the first post. Did you run this after updating your mdadm.conf file?

update-initramfs -k all -u

Blue Dude
December 2nd, 2011, 11:16 PM
Hi darkod, thanks for the look.

1. md0 is a standard RAID 1. The failed drive was the third drive in that array, a spare, so md0 wasn't affected. It's clean, so /boot is OK.

2. I didn't bother to rebuild md2 since I didn't have a disk large enough to rebuild it to. It continues apparently unaffected, but as a degraded RAID 5.

I'll likely migrate all three drives to a new system with a beefed-up power supply and newer SATA chipset to see if the third drive is truly dead or just starved. Sometimes it is seen during boot, sometimes not, but either way the live CD no longer recognizes it as a valid drive.

md1 originally consisted of sdb6, sdc6 and sdd6. sdd failed entirely, so I removed it from the system. I added a partition on my Windows HDD, which became sda6. The rebuild failed because there was an additional problem with sdc6. I couldn't figure out how to correct it, so I zeroed out sdb6, sdc6, and sda6 for good measure with dd (yes, it was a nuke, but it seemed to get rid of the superblocks). I created a new RAID 5 on sdb6, sdc6 and sda6, which built successfully as md1. That's the drive I restored my backup to.
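(For the record, the tidier equivalent of the dd nuke and rebuild would be something like

mdadm --stop /dev/md1
mdadm --zero-superblock /dev/sdb6 /dev/sdc6 /dev/sda6
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdb6 /dev/sdc6 /dev/sda6

since --zero-superblock only wipes the md metadata instead of the whole partition.)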

3. I don't remember anymore exactly what I did and in which order. I've been messing with this thing for most of the last week. I don't think I removed sdd from the various arrays before removing it from the system. But as I mentioned above, it was no longer being recognized by Ubuntu before I tried to remove it, so it may not matter.

4. I'll come up with a cat /proc/mdstat as soon as I can and report back here. IIRC, md0 and the new md1 are clean, and md2 is clean/degraded. If only Grub agreed. :)

Blue Dude
December 2nd, 2011, 11:27 PM
Edit: I see that you did that in the first post. Did you run this after updating your mdadm.conf file?

update-initramfs -k all -u

I did not. That looks very promising. Should this be done while chrooted on the live CD?

rubylaser
December 3rd, 2011, 12:59 AM
Yes, you should give it a try while chrooted from the LiveCD.
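A sketch of that step (assuming the chroot is set up as before; md0 needs to be mounted at /mnt/boot beforehand, since that's where the new initrd files get written):

sudo mount /dev/md0 /mnt/boot
sudo chroot /mnt
update-initramfs -k all -u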

Blue Dude
December 3rd, 2011, 02:45 AM
4. In any case, it would be nice to start off with the output of:

cat /proc/mdstat

to see which arrays are running, where and how. Meanwhile I'll ask for more help if possible. :)

root@ubuntu:/home/ubuntu# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sdb7[0] sdc7[1]
2891442688 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]

md1 : active raid5 sdb6[0] sda6[2] sdc6[1]
32965120 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sdb1[0] sdc1[1]
975808 blocks [2/2] [UU]

unused devices: <none>

Blue Dude
December 3rd, 2011, 03:11 AM
Did you run this after updating your mdadm.conf file?

update-initramfs -k all -u

This worked. To make absolutely certain I didn't miss anything, I reran update-grub after doing this, and then grub-install. Was this overkill, or necessary?

Just to sum up for future sufferers: when changing a system RAID device, use the chroot procedure in https://help.ubuntu.com/community/Grub2 to update Grub, but after chrooting, ensure /etc/fstab and /etc/mdadm/mdadm.conf are correct and run update-initramfs. Then continue with update-grub, etc. (full sequence sketched below). Does this look correct?
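In command form, roughly (from the live CD; device names from my setup, so adjust to taste):

sudo apt-get install mdadm
sudo mdadm --assemble --scan
sudo mount /dev/md1 /mnt
sudo mount /dev/md0 /mnt/boot
sudo mount --bind /dev /mnt/dev
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
sudo chroot /mnt
# fix /etc/fstab and /etc/mdadm/mdadm.conf here, then:
update-initramfs -k all -u
update-grub
grub-install /dev/sdb

(Running grub-install against /dev/sdc as well probably wouldn't hurt, since /boot is mirrored across both disks.)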

Thanks very much to all for getting my system back up! :mrgreen: Any further trouble is officially unrelated, and probably deserves its own thread.

drs305
December 3rd, 2011, 03:16 AM
I wrote the original community doc you linked in the previous post. I really wish someone would complement it with information regarding RAID. It's been a couple of years and there still isn't any RAID help available in that document...

Glad you have your issue sorted out.

Blue Dude
December 3rd, 2011, 03:31 AM
But it was indeed helpful for RAID! You specifically mentioned making sure the mdadm.conf file was up to date, and when and how to start up all the arrays. In my case, the only piece that was missing involved updating the ramdisk image, and I don't think that was specific to RAID - the same issue would likely apply if changing the UUID of any system disk. And is it specifically a Grub issue at all?

drs305
December 3rd, 2011, 03:39 AM
And is it specifically a Grub issue at all?

Well, you have a point there. Updating the image is something that should happen outside Grub's control.

When you are ready, you can mark the thread SOLVED via the 'Thread Tools' link near the top right of the first post.

rubylaser
December 3rd, 2011, 04:44 AM
I'm glad that you got it working :)