[ubuntu] 11.10 Can't Boot without BOOT_DEGRADED=true



annagel
October 21st, 2011, 03:37 PM
Last night I attempted an upgrade to 11.10. The upgrade seemed to go well, but afterwards I was unable to get the system to boot. After hooking the machine up to a monitor I noticed two issues: half the time I can't read the text output on the screen because it comes out oddly scrambled, and the other half of the time I get a message saying my RAID array is degraded and asking whether I want to mount it degraded.

If I do nothing it drops to a prompt; if I say yes, the system boots fine and the array is not degraded. I ended up going back and doing a full fresh install of 11.10, both desktop and server (server is what is there now), and I get the same issues.

I seem to be able to work around the problem by setting the BOOT_DEGRADED option to true, which with a headless system like this is probably not a bad idea anyway, but I have no idea why it thinks the array is degraded during boot. My root partition is not on the RAID, just data.

Any thoughts?

Thanks,
Andrew

drs305
October 21st, 2011, 03:45 PM
I don't know much about RAID, but if only data is on your RAID setup and Grub doesn't need it, I'm wondering what would happen if you put 'noauto' in the fstab options for that partition and then mount it later.

Users get the "Skip" mounting prompt during boot when there is an irregularity in an fstab entry, and removing that entry from fstab eliminates the message (as long as it isn't a system partition).
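
A minimal sketch of the 'noauto' suggestion above. The device /dev/md0 and the mount point /srv/data are hypothetical (the thread never names them), and the helper add_noauto is only an illustration. It works on a throwaway file and prints the result, so nothing on the real system is changed.

```shell
#!/bin/sh
# Sketch: add "noauto" to one entry of an fstab-format file.
# /dev/md0 and /srv/data are hypothetical names, not taken from the thread.

add_noauto() {
  # $1 = fstab-format file, $2 = mount point to change.
  # Prints the modified content to stdout; the input file is left untouched.
  awk -v mp="$2" '
    $1 !~ /^#/ && $2 == mp && $4 !~ /(^|,)noauto(,|$)/ { $4 = $4 ",noauto" }
    { print }
  ' "$1"
}

# Demo on a temporary file rather than the real /etc/fstab.
demo_fstab=$(mktemp)
cat > "$demo_fstab" <<'EOF'
# <device> <mount point> <type> <options> <dump> <pass>
/dev/sda1 / ext4 errors=remount-ro 0 1
/dev/md0 /srv/data ext4 defaults 0 2
EOF

noauto_result=$(add_noauto "$demo_fstab" /srv/data)
printf '%s\n' "$noauto_result"
rm -f "$demo_fstab"
```

With 'noauto' in place the partition is skipped at boot and can be mounted by hand later, for example with "sudo mount /srv/data" once the array is known to be assembled.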

annagel
October 21st, 2011, 04:01 PM
Good advice, and actually something I now do routinely: I drop all my arrays from fstab before a system upgrade so I can confirm the md devices themselves are all getting set up right before I start trying to mount them. No help here though, as they are not in fstab yet; the error seems to be coming from some attempt to set up the array, not from mounting it.

Some additional info too: it appears I may have spoken too soon about booting with BOOT_DEGRADED. It seems to solve the problem when I only have one array, but when I hook both of them up it is a no go; I get an error saying it can't boot degraded.

More info on the nonsensical display during boot: it looks like this is only a problem if I don't manually select the default boot option. If I hit enter at the selector screen, triggering the default option, the display works perfectly. If I let the selector time out, the display goes all to hell until it gets through to the login prompt, at which point I get the login and the tail end of the boot message output.

A second query: does anyone know where the mdadm messages get logged? All the messages about successfully mounting the array end up in dmesg and the kernel log, but I can't find the error messages anywhere except on the actual display. On the display they are all formatted as mdadm: <message>, not with the time-since-start prefix that most of the boot output has.

drs305
October 21st, 2011, 04:50 PM
As far as the manual vs default booting, I looked at the grub.cfg file. It appears to me that if you manually select an entry before the timeout there are certain functions that are bypassed.

One of these is setting the linux_gfx_mode variable.

If I wanted to play with this to see if that might be what is causing the text readability issue, here is what I would do if I felt like investing some potentially unproductive time:

Copy /boot/grub/grub.cfg to /boot/grub/grub.bak.cfg
The reason: if everything goes bad on booting, you can go to the grub prompt and type "configfile (hdX,Y)/boot/grub/grub.bak.cfg" to get your original menu back.
Edit grub.cfg. The section below appears immediately before the first menuentry in the 10_linux section. Add the two lines set apart by blank lines (the lines around them are existing context):


else
set linux_gfx_mode=text
fi

set linux_gfx_mode=text
# set linux_gfx_mode=keep

export linux_gfx_mode
if [ "$linux_gfx_mode" != "text" ]; then load_video; fi
menuentry 'Ubuntu, with Linux 3.0.0-12-generic' --class ubuntu --class gnu-linux --class gnu --class os {

Save the file. Do NOT run update-grub.
Reboot and see what happens to an autoselection boot.

If that doesn't fix it, uncomment the second added line, comment the first, and repeat.

You can revert to the original grub.cfg by running "sudo update-grub".
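
The backup step above can be sketched as a small helper. This is only an illustration: backup_grub_cfg is a made-up name, and the demo runs against a temporary directory rather than /boot/grub so it is safe to try.

```shell
#!/bin/sh
# Sketch: copy grub.cfg to grub.bak.cfg in the same directory before editing.

backup_grub_cfg() {
  # $1 = path to grub.cfg. Prints the path of the backup on success.
  cfg=$1
  bak="${cfg%.cfg}.bak.cfg"
  cp -p "$cfg" "$bak" && printf '%s\n' "$bak"
}

# Demo in a temporary directory standing in for /boot/grub.
demo_dir=$(mktemp -d)
printf 'set timeout=10\n' > "$demo_dir/grub.cfg"

backup_path=$(backup_grub_cfg "$demo_dir/grub.cfg")

# Confirm the backup is byte-identical to the original before cleaning up.
backup_ok=no
cmp -s "$demo_dir/grub.cfg" "$backup_path" && backup_ok=yes
printf 'backup at %s, identical: %s\n' "$backup_path" "$backup_ok"
rm -rf "$demo_dir"
```

On a real system the same copy would be run as root against /boot/grub/grub.cfg, giving the grub.bak.cfg file that the "configfile" command at the grub prompt can load.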

annagel
October 21st, 2011, 06:04 PM
That did solve my funky display problem. Do you happen to know how I would make that change in a config file so it would survive an update-grub?

It also, I believe, got me a bit closer to figuring out what is causing my RAID issues. The RAID had actually mounted correctly the last few times: I had dropped the homehost and create statements from the mdadm.conf file and that seemed to do the trick. After this change, though, the issue came back. When I reversed the change, the arrays started failing to mount even degraded again. Obviously the two have no direct relationship (I hope, anyway), but I am guessing there is some kind of timing issue at work here, and the non-text mode takes just enough extra time to give the disks the chance to respond and join the array correctly.

Now the question: if that is the case, does anyone have any idea how I fix it?

drs305
October 21st, 2011, 07:48 PM
That did solve my funky display problem, do you happen to know how I would make that change in a config file so it would survive an update-grub.


I do, with a caveat. There very well could be a setting in /etc/default/grub that could set this, but I don't have the time at the moment to research this. That being said...

Open /etc/grub.d/10_linux. The section that needs editing follows. Since I don't know which branch of the conditional your system is triggering, I've taken the simple step of just inserting an override at the end of it.

It's also possible that it's just the 'load_video' line at the end that is causing the problem. If it is, then just commenting out that line and leaving everything else untouched may accomplish the same thing. But I'll leave that test to you.

I also don't know which of the two lines I added you selected. I'll assume it was text, so that is what is in the example below. If it was 'keep', you will have to change it.

These lines occur in Grub 1.99 around lines 160-185.



cat << EOF
  if [ \${recordfail} != 1 ]; then
    if [ -e \${prefix}/gfxblacklist.txt ]; then
      if hwmatch \${prefix}/gfxblacklist.txt 3; then
        if [ \${match} = 0 ]; then
          set linux_gfx_mode=keep
        else
          set linux_gfx_mode=text
        fi
      else
        set linux_gfx_mode=text
      fi
    else
      set linux_gfx_mode=keep
    fi
  else
    set linux_gfx_mode=text
  fi
EOF
fi
cat << EOF
  set linux_gfx_mode=text
  export linux_gfx_mode
  if [ "\$linux_gfx_mode" != "text" ]; then load_video; fi
EOF

Save the file and run "sudo update-grub". The line should then be hard-coded just before the first menuentry in the 10_linux section of grub.cfg.

annagel
October 21st, 2011, 08:01 PM
Thanks for the info. I will do some research and see if I can find a grub option as well, since I need to wait for my other issues to be solved before I can put this fix in place. I found a second thread that seems to deal with a similar issue; I believe both are tied to this bug (https://bugs.launchpad.net/ubuntu/+source/dmraid/+bug/872220). I may just roll back to 11.04 for the time being and come back when that bug is resolved, as the only working solution right now is to have the computer hooked up to the TV in my living room, which is not ideal.

drs305
October 21st, 2011, 08:11 PM
I read your bug report:


I have been able to tie both the check failure and the failure to mount degraded to whether the linux_gfx_mode parameter is set to "keep" or "text". It would seem like this has nothing to do with the root issue but I am guessing that the graphics mode is slowing the check down just enough for the drives to report in and pass the test.

Two comments. What you surmise might touch on what I mentioned in my last post. Even with linux_gfx_mode=text, perhaps it is the 'load_video' in the next line that is really delaying things (depending on the linux_gfx_mode setting).

If that is the case, would a 'rootdelay=X' in the kernel options help? That would be added in the GRUB_CMDLINE_LINUX_DEFAULT= line of /etc/default/grub. (rootdelay actually delays mounting the root filesystem, which you said is not on RAID, but perhaps it would slow the rest down as well.)
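
A rough sketch of the rootdelay idea, assuming a delay of 10 seconds (the post only says 'X', so the number is a placeholder). The helper add_rootdelay is hypothetical; it prints the edited content instead of changing the file, and the demo uses a temporary file rather than the real /etc/default/grub.

```shell
#!/bin/sh
# Sketch: append rootdelay=N to GRUB_CMDLINE_LINUX_DEFAULT in a file
# that uses the /etc/default/grub format.

add_rootdelay() {
  # $1 = file, $2 = delay in seconds (10 below is only a placeholder).
  # Prints the updated content; leaves the file alone if rootdelay is already set.
  if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*rootdelay=' "$1"; then
    cat "$1"
  else
    sed -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 rootdelay=$2\"/" "$1"
  fi
}

# Demo on a temporary stand-in for /etc/default/grub.
demo_cmdline=$(mktemp)
printf '%s\n' 'GRUB_DEFAULT=0' 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > "$demo_cmdline"

with_delay=$(add_rootdelay "$demo_cmdline" 10)
printf '%s\n' "$with_delay"

# Running it again on its own output should change nothing.
printf '%s\n' "$with_delay" > "$demo_cmdline"
again=$(add_rootdelay "$demo_cmdline" 10)
rm -f "$demo_cmdline"
```

On a real system the edited line would go into /etc/default/grub, followed by "sudo update-grub" so the new kernel option reaches grub.cfg.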

annagel
October 22nd, 2011, 12:51 AM
If it is that particular function taking a while, could I add a sleep 5 to the script right there for the same effect? That way I could target it at exactly the spot the old function was executing from.

drs305
October 22nd, 2011, 01:27 AM
I don't know. I believe the text readability is just a matter of forcing the text mode of the linux_gfx_mode.

As for timing, I've tested it and inserting 'sleep' does work in grub.cfg. Everywhere I inserted it prior to the first menuentry delayed the appearance of the entire menu being displayed.

I don't know enough about RAID to know when drives are mounted and what a delay within the grub.cfg prior to selecting an entry would do, but I suspect not much. I think you'll have more success discussing the RAID issue in the bug reports or with someone else familiar with RAID.

MAFoElffen
October 22nd, 2011, 02:01 AM
Just a question-
Does adding a line to /etc/default/grub saying


GRUB_GFXPAYLOAD_LINUX=text
affect that? If it does, then that file doesn't change automatically in "system" updates...

Just thinking...

drs305
October 22nd, 2011, 02:49 AM
:-)

I think I mentioned that possibility but there is a long conditional just before the menuentry in the 10_linux section that looks at gfxblacklist.txt and I have not really studied that section nor have knowledge about the file.

The 'export linux_gfx_mode' occurs immediately before the menuentry, so with a lack of time to go through each line of the scripts to find out how it interacts with the GRUB_GFXPAYLOAD_LINUX setting (and the gfxblacklist.txt) this was the quickest solution I could come up with.

One of my notes from the thread was to research if, where, and when the GRUB_GFXPAYLOAD_LINUX setting interacts with (or becomes) the linux_gfx_mode, since the former isn't mentioned anywhere in grub.cfg

annagel
October 22nd, 2011, 03:44 AM
Thanks for the information, I will give it a try first thing in the morning. Unfortunately, in my haste to get a system back up (once I am in a bad cycle this requires pulling the drives from one array or the other; thankfully they are in hot-swappable bays) I booted with one drive still disconnected, and now I have a real degraded array that needs to be rebuilt... so that obviously gets top priority.
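
For telling a genuinely degraded array from a false alarm, one option is to look for an underscore in the member status field of /proc/mdstat (for example [U_] instead of [UU]). The sketch below is an illustration only: list_degraded is a made-up helper, and the sample text with md0, md1 and the sdX names is invented, not taken from this system.

```shell
#!/bin/sh
# Sketch: list md arrays whose status field shows a missing member.

list_degraded() {
  # $1 = file in /proc/mdstat format. Prints one array name per line.
  awk '
    /^md[0-9]+ :/ { name = $1 }                          # remember the current array
    /\[[U_]*_[U_]*\]/ { if (name != "") print name }     # "_" marks a missing member
  ' "$1"
}

# Invented sample standing in for /proc/mdstat.
demo_mdstat=$(mktemp)
cat > "$demo_mdstat" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks [2/2] [UU]

md1 : active raid1 sdc1[0]
      976630336 blocks [2/1] [U_]

unused devices: <none>
EOF

degraded=$(list_degraded "$demo_mdstat")
printf 'degraded arrays: %s\n' "$degraded"
rm -f "$demo_mdstat"
```

On a live system the call would be "list_degraded /proc/mdstat". A pulled member is then usually re-added with something like "sudo mdadm --manage /dev/mdX --add /dev/sdXN", where both names are placeholders for the actual devices.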

annagel
October 22nd, 2011, 01:51 PM
The

GRUB_GFXPAYLOAD_LINUX=text
option did the trick: I get a boot where I can easily see everything on the screen, and I now have a way to fairly reliably trigger the RAID issue.

Thanks to both of you for all your help!
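
For anyone wanting to script the fix that worked here, this is a minimal sketch of setting GRUB_GFXPAYLOAD_LINUX=text in a file that follows the /etc/default/grub format. The helper set_grub_default is hypothetical, and the demo edits a temporary file and prints the result instead of touching the real configuration.

```shell
#!/bin/sh
# Sketch: set KEY=value in a file that uses the /etc/default/grub format.

set_grub_default() {
  # $1 = file, $2 = key, $3 = value.
  # Prints the updated content to stdout; the input file is not modified.
  if grep -q "^$2=" "$1"; then
    sed "s|^$2=.*|$2=$3|" "$1"      # key already present: replace its value
  else
    cat "$1"                        # key absent: append a new line
    printf '%s=%s\n' "$2" "$3"
  fi
}

# Demo on a temporary stand-in for /etc/default/grub.
demo_defaults=$(mktemp)
cat > "$demo_defaults" <<'EOF'
GRUB_DEFAULT=0
GRUB_TIMEOUT=10
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
EOF

# Case 1: the key is missing, so the line is appended.
appended=$(set_grub_default "$demo_defaults" GRUB_GFXPAYLOAD_LINUX text)

# Case 2: the key exists with another value, so the line is replaced.
printf 'GRUB_GFXPAYLOAD_LINUX=keep\n' >> "$demo_defaults"
replaced=$(set_grub_default "$demo_defaults" GRUB_GFXPAYLOAD_LINUX text)

printf '%s\n' "$replaced"
rm -f "$demo_defaults"
```

On a real system the result would be written back to /etc/default/grub as root and followed by "sudo update-grub". Because that file is not regenerated by update-grub, the setting survives later regenerations of grub.cfg.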

gadnex
October 25th, 2011, 09:36 AM
I believe a solution to this issue may have been found in this (http://ubuntuforums.org/showpost.php?p=11388915&postcount=18) thread.

It seems there is a race condition that occurs while booting.

Hopefully this is something that can be resolved in a future Ubuntu patch or version.