
View Full Version : [ubuntu] [64bit 12.04] Grub hangs at purple screen on boot



v4169sgr
June 5th, 2012, 09:09 AM
Hi All,

This problem is a hang on purple screen on boot.

The system is a fresh 64 bit alternate install: here is the /etc/fstab


# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sdc1 during installation
UUID=604156ff-c419-4062-93ba-0e7cf892316a / ext4 errors=remount-ro 0 1
# /home was on /dev/md1 during installation
UUID=5df23755-2590-4d0e-8b94-9a9253dbb723 /home ext4 defaults 0 2
# /oldboot was on /dev/sdb1 during installation
UUID=1d4c5caf-eb4c-4f6e-98b3-52c7654e8fe5 /oldboot ext4 defaults 0 2
# /oldroot was on /dev/md0 during installation
UUID=bb9fe603-4522-448c-822d-93a881acbc8f /oldroot ext4 defaults 0 2
# swap was on /dev/sda1 during installation
UUID=b8ee76ff-a1df-45ce-961e-bf53c9a37992 none swap sw 0 0
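As a sanity check on an fstab like the one above, the mount point and fsck pass order live in fields 2 and 6. A minimal awk sketch, run against a scratch copy containing two of the entries above rather than the real /etc/fstab:

```shell
# Print mount point (field 2) and fsck pass (field 6) for each real entry,
# skipping comments and blank lines; /tmp/fstab.sample stands in for /etc/fstab.
cat > /tmp/fstab.sample <<'EOF'
UUID=604156ff-c419-4062-93ba-0e7cf892316a / ext4 errors=remount-ro 0 1
UUID=5df23755-2590-4d0e-8b94-9a9253dbb723 /home ext4 defaults 0 2
EOF
awk '!/^#/ && NF { print $2, "pass=" $6 }' /tmp/fstab.sample
# prints:
#   / pass=1
#   /home pass=2
```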


The weird thing is that during the install I followed the installer's default and put grub on /dev/sda. This of course did not work, so I had to reboot onto the live CD in rescue mode and put grub on /dev/sdc, which is an SSD. There are two RAID0 arrays across /dev/sda and /dev/sdb: these are not bootable and contain the previous 10.04 system [for reference] and /home.

The problem only seems to occur after a cold start [not a reboot from the menu], and then only sometimes. The system is only recoverable by holding down the power button to force a power-off, then rebooting, which lands me in the grub menu; on selecting the first [default] option, boot proceeds normally, though sometimes with a very 'scary' warning about a degraded RAID, for which the user is required to take responsibility by pressing 'y' then Enter. I think the latter is a red herring, though, as everything is perfectly normal after that.

I've followed some of the suggestions under:

http://ubuntuforums.org/showthread.php?t=1743535

including:
- setting noacpi or nomodeset [didn't work]
- setting GRUB_CMDLINE_LINUX_DEFAULT="text" [no effect :(]
- uncommenting GRUB_TERMINAL=console [booted but dumped me in the text console - lucky I knew how to recover!]
- setting GRUB_GFXMODE=1024x768 [now tries to give me a nicer splash with the dots - have checked to see that this is a supported video mode using grub vbeinfo - but ultimately does nothing for the problem]
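For reference, the mode check can also be done interactively from the GRUB console (press 'c' at the menu). These are GRUB-shell commands, not bash, so they only run inside GRUB itself:

```
grub> vbeinfo
grub> set gfxmode=1024x768
```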

Here is my current grub config:



~$ cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
#GRUB_CMDLINE_LINUX_DEFAULT="text"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
GRUB_GFXMODE=1024x768

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"


One possibility is that the system is sometimes handing off to the grub install on /dev/sda, but I don't think this can be true, because then surely I would just be seeing the blinking cursor I saw before rescuing the newly-installed system.
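One way to probe that hand-off theory: GRUB's MBR boot code embeds the literal string "GRUB", so reading sector 0 of each disk shows where it is installed. A hedged sketch; the dd step needs root and real devices, so a simulated sector is used below:

```shell
# On real hardware you would read the MBR with something like:
#   sudo dd if=/dev/sda of=/tmp/mbr.img bs=512 count=1
# Here we simulate a sector containing GRUB's signature strings instead.
printf 'GRUB Geom Hard Disk Read Error' > /tmp/mbr.img
if grep -aq GRUB /tmp/mbr.img; then   # -a: treat binary data as text
    echo "GRUB boot code present"
fi
```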

This is the output from sudo hwinfo:



~$ sudo hwinfo --framebuffer
process 8800: arguments to dbus_move_error() were incorrect, assertion "(dest) == NULL || !dbus_error_is_set ((dest))" failed in file ../../dbus/dbus-errors.c line 282.
This is normally a bug in some application using the D-Bus library.
libhal.c 3483 : Error unsubscribing to signals, error=The name org.freedesktop.Hal was not provided by any .service files
02: None 00.0: 11001 VESA Framebuffer
[Created at bios.464]
Unique ID: rdCR.efxRnBcYXs5
Hardware Class: framebuffer
Model: "NVIDIA GF108 Board - 1071v0p1"
Vendor: "NVIDIA Corporation"
Device: "GF108 Board - 1071v0p1"
SubVendor: "NVIDIA"
SubDevice:
Revision: "Chip Rev"
Memory Size: 14 MB
Memory Range: 0xf1000000-0xf1dfffff (rw)
Mode 0x0300: 640x400 (+640), 8 bits
Mode 0x0301: 640x480 (+640), 8 bits
Mode 0x0303: 800x600 (+800), 8 bits
Mode 0x0305: 1024x768 (+1024), 8 bits
Mode 0x0307: 1280x1024 (+1280), 8 bits
Mode 0x030e: 320x200 (+640), 16 bits
Mode 0x030f: 320x200 (+1280), 24 bits
Mode 0x0311: 640x480 (+1280), 16 bits
Mode 0x0312: 640x480 (+2560), 24 bits
Mode 0x0314: 800x600 (+1600), 16 bits
Mode 0x0315: 800x600 (+3200), 24 bits
Mode 0x0317: 1024x768 (+2048), 16 bits
Mode 0x0318: 1024x768 (+4096), 24 bits
Mode 0x031a: 1280x1024 (+2560), 16 bits
Mode 0x031b: 1280x1024 (+5120), 24 bits
Mode 0x0330: 320x200 (+320), 8 bits
Mode 0x0331: 320x400 (+320), 8 bits
Mode 0x0332: 320x400 (+640), 16 bits
Mode 0x0333: 320x400 (+1280), 24 bits
Mode 0x0334: 320x240 (+320), 8 bits
Mode 0x0335: 320x240 (+640), 16 bits
Mode 0x0336: 320x240 (+1280), 24 bits
Mode 0x033d: 640x400 (+1280), 16 bits
Mode 0x033e: 640x400 (+2560), 24 bits
Mode 0x0345: 1600x1200 (+1600), 8 bits
Mode 0x0346: 1600x1200 (+3200), 16 bits
Mode 0x034a: 1600x1200 (+6400), 24 bits
Mode 0x0360: 1280x800 (+1280), 8 bits
Mode 0x0361: 1280x800 (+5120), 24 bits
Config Status: cfg=new, avail=yes, need=no, active=unknown

I've noticed this and similar topics come up quite frequently on the board, so I hope someone has some pointers that might be relevant to this case.

Your thoughts much appreciated!!!

v4169sgr
June 5th, 2012, 09:24 AM
EDIT: happens on reboots too - I cursed myself by saying 'now I know this will not happen when ....' :(

v4169sgr
June 5th, 2012, 06:24 PM
Any input appreciated - thanks :)

bogan
June 5th, 2012, 07:07 PM
Hi!, v4169sgr,
What CPU do you have? Is it with integrated graphics??

Please Post:
lspci -nnk | grep -iA2 VGA

You Posted:
though as everything is perfectly normal after that.
Does that mean you get a normal login screen and can log in to ubuntu 3d & 2d as well as Gnome and Gnome Classic, and have full GUI screen functions?

A search on nvidia drivers does not show the GF108, but a general search shows the GF108 as the GPU used in the GT430 video cards, for which the 295.53 driver is the recommended latest version.

What video driver are you using??

If you do not know:

hwinfo --gfxcard

should tell you both what is available and which is active.
If it is an nvidia driver the version will show from:

cat /sys/module/nvidia/version

Alternatively you can get it from Synaptic Package Manager, or if nvidia-current from:

sudo apt-cache policy nvidia-current

[ Edit: last bit added.]

Chao!, bogan.

v4169sgr
June 5th, 2012, 09:09 PM
Thanks a lot for responding, Bogan! :)

Yes, after the grub menu and the scary warning, everything is perfectly normal and I have full Unity goodness :)

Now, your other questions: you are really well-informed ...


$ lspci -nnk | grep -iA2 VGA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 430] [10de:0de1] (rev a1)
Subsystem: ASUSTeK Computer Inc. Device [1043:83b8]
Kernel driver in use: nvidia



$ cat /sys/module/nvidia/version
295.40

I believe I have posted the output from 'hwinfo' above.

OK, so I am running v 295.40 instead of 295.53. Allow me to be a complete idiot, please, and ask how I am going to change that? Are you saying that moving from 295.40 to 295.53 is likely to solve these problems?

The 'Additional Drivers' app in 'System Settings' tells me I am running 'version current'. I see there's another option for 'post-release updates'. Is this the one I should be running?

Thanks again!

oldfred
June 5th, 2012, 09:51 PM
I do not like to directly download nVidia drivers. Years ago I had huge issues, so I just use the Ubuntu ones. But since they now offer to download a somewhat newer version I do install that.

My version:
fred@fred-Precise:~$ cat /sys/module/nvidia/version
295.49

Understand, if not using the standard version you may be testing something.

v4169sgr
June 5th, 2012, 10:53 PM
Hi oldfred,

I've just upgraded to the recommended post-install update: 295.49, just like you. And, like you, I am averse to straying off the recommended path.

Still have the same issue, though :(

oldfred
June 6th, 2012, 02:04 AM
I think I am booting in minimal mode: I am not running Unity, just gnome-panel. But I cannot really tell what the difference is.

v4169sgr
June 6th, 2012, 07:19 AM
I've just removed nvidia drivers completely, as an experiment, but still have the same issue - every boot :(

So I believe video drivers can be discounted.

Any suggestions gratefully appreciated!

v4169sgr
June 6th, 2012, 07:30 AM
I've also just tried removing my custom desktop image that was being set on the login screen, so that I didn't depend on this file being available, but that makes no difference either.

I think it is something far more fundamental.

Suggestions welcome! :)

Megaptera
June 6th, 2012, 07:38 AM
I did this twice on fresh installs of 12.04 'cos I too was getting that hanging purple screen & both times I got the proper dotted-lines loader - no idea why it worked but it did.

In terminal I entered gksudo gedit /etc/default/grub
That opened the editor, and I changed GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"

saved and exited the editor, then in terminal ran sudo update-grub

Then re-booted twice, then
in terminal entered gksudo gedit /etc/default/grub
That opened the editor, and I changed GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset" back to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
saved and exited the editor, then in terminal ran sudo update-grub

Then re-booted twice.

As I said, no idea why, but it worked twice, 'cos I did a re-install to see if it was a fluke or not.
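The same toggle can be scripted instead of done in gedit; a minimal sketch using sed on a scratch copy (the real file is /etc/default/grub, and sudo update-grub must follow any edit there):

```shell
# Demonstrated on a throwaway copy so nothing system-level is touched;
# against the real /etc/default/grub you would run sed via sudo, then update-grub.
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > /tmp/grub.demo
sed -i 's/"quiet splash"/"quiet splash nomodeset"/' /tmp/grub.demo
cat /tmp/grub.demo
# prints: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
```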

v4169sgr
June 6th, 2012, 08:03 AM
Thanks Megaptera for your reply :)

Do you think your suggested method would work as well if I rebooted more than twice in between [only, as you know, my first reboot always fails]?

Megaptera
June 6th, 2012, 08:20 AM
Thanks Megaptera for your reply :)

Do you think your suggested method would work as well if I rebooted more than twice in between [only, as you know, my first reboot always fails]?

Why not give it a try? The changes I made are easily un-do-able :p

v4169sgr
June 6th, 2012, 08:45 AM
Looked promising on the first restart, but did not make it past the first cold boot [from power-off].

Sorry, but that did not work for me :(

Suggestions always welcome! :)

v4169sgr
June 6th, 2012, 09:33 AM
In further attempts to understand the problem, I entered the grub menu on boot, and temporarily set in edit mode "nosplash --verbose text" instead of the default "quiet splash $vthandoff".

The following syslog excerpt is from a successful boot [from the menu and after passing the scary warning]. For some reason I don't see syslog entries from failed boots.


Jun 6 09:12:00 ammscott kernel: [ 1.920039] md: linear personality registered for level -1
Jun 6 09:12:00 ammscott kernel: [ 1.922935] md: multipath personality registered for level -4
Jun 6 09:12:00 ammscott kernel: [ 1.926942] md: raid0 personality registered for level 0
Jun 6 09:12:00 ammscott kernel: [ 1.930729] md: raid1 personality registered for level 1
Jun 6 09:12:00 ammscott kernel: [ 1.933379] async_tx: api initialized (async)
Jun 6 09:12:00 ammscott kernel: [ 1.933838] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Jun 6 09:12:00 ammscott kernel: [ 1.933854] r8169 0000:04:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jun 6 09:12:00 ammscott kernel: [ 1.933888] r8169 0000:04:00.0: setting latency timer to 64
Jun 6 09:12:00 ammscott kernel: [ 1.934066] r8169 0000:04:00.0: irq 42 for MSI/MSI-X
Jun 6 09:12:00 ammscott kernel: [ 1.934291] r8169 0000:04:00.0: eth0: RTL8168f/8111f at 0xffffc90000c76000, c8:60:00:ce:32:df, XID 08000800 IRQ 42
Jun 6 09:12:00 ammscott kernel: [ 1.934293] r8169 0000:04:00.0: eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Jun 6 09:12:00 ammscott kernel: [ 1.997670] raid6: int64x1 3483 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.049638] usb 3-4: new low-speed USB device number 2 using xhci_hcd
Jun 6 09:12:00 ammscott kernel: [ 2.065619] raid6: int64x2 4348 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.070120] usb 3-4: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
Jun 6 09:12:00 ammscott kernel: [ 2.073396] input: Logitech Optical USB Mouse as /devices/pci0000:00/0000:00:14.0/usb3/3-4/3-4:1.0/input/input3
Jun 6 09:12:00 ammscott kernel: [ 2.073452] generic-usb 0003:046D:C016.0001: input,hidraw0: USB HID v1.10 Mouse [Logitech Optical USB Mouse] on usb-0000:00:14.0-4/input0
Jun 6 09:12:00 ammscott kernel: [ 2.073460] usbcore: registered new interface driver usbhid
Jun 6 09:12:00 ammscott kernel: [ 2.073461] usbhid: USB HID core driver
Jun 6 09:12:00 ammscott kernel: [ 2.091139] md: bind<sdb6>
Jun 6 09:12:00 ammscott kernel: [ 2.092910] md: bind<sdb5>
Jun 6 09:12:00 ammscott kernel: [ 2.133537] raid6: int64x4 3535 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.157661] usb 2-1.7: new high-speed USB device number 3 using ehci_hcd
Jun 6 09:12:00 ammscott kernel: [ 2.201505] raid6: int64x8 2971 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.250105] hub 2-1.7:1.0: USB hub found
Jun 6 09:12:00 ammscott kernel: [ 2.250208] hub 2-1.7:1.0: 4 ports detected
Jun 6 09:12:00 ammscott kernel: [ 2.269403] raid6: sse2x1 9434 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.337334] raid6: sse2x2 11726 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.405271] raid6: sse2x4 13594 MB/s
Jun 6 09:12:00 ammscott kernel: [ 2.405272] raid6: using algorithm sse2x4 (13594 MB/s)
Jun 6 09:12:00 ammscott kernel: [ 2.405794] xor: automatically using best checksumming function: generic_sse
Jun 6 09:12:00 ammscott kernel: [ 2.425249] generic_sse: 15378.000 MB/sec
Jun 6 09:12:00 ammscott kernel: [ 2.425250] xor: using function: generic_sse (15378.000 MB/sec)
Jun 6 09:12:00 ammscott kernel: [ 2.425868] md: raid6 personality registered for level 6
Jun 6 09:12:00 ammscott kernel: [ 2.425869] md: raid5 personality registered for level 5
Jun 6 09:12:00 ammscott kernel: [ 2.425870] md: raid4 personality registered for level 4
Jun 6 09:12:00 ammscott kernel: [ 2.428917] md: raid10 personality registered for level 10
Jun 6 09:12:00 ammscott kernel: [ 2.479372] md: bind<sda5>
Jun 6 09:12:00 ammscott kernel: [ 2.480479] bio: create slab <bio-1> at 1
Jun 6 09:12:00 ammscott kernel: [ 2.480485] md/raid0:md0: md_size is 39059200 sectors.
Jun 6 09:12:00 ammscott kernel: [ 2.480486] md: RAID0 configuration for md0 - 1 zone
Jun 6 09:12:00 ammscott kernel: [ 2.480487] md: zone0=[sdb5/sda5]
Jun 6 09:12:00 ammscott kernel: [ 2.480489] zone-offset= 0KB, device-offset= 0KB, size= 19529600KB
Jun 6 09:12:00 ammscott kernel: [ 2.480490]
Jun 6 09:12:00 ammscott kernel: [ 2.480495] md0: detected capacity change from 0 to 19998310400
Jun 6 09:12:00 ammscott kernel: [ 2.481505] md0: unknown partition table
Jun 6 09:12:00 ammscott kernel: [ 2.566730] md: bind<sda6>
Jun 6 09:12:00 ammscott kernel: [ 2.567915] md/raid0:md1: md_size is 3860168448 sectors.
Jun 6 09:12:00 ammscott kernel: [ 2.567916] md: RAID0 configuration for md1 - 1 zone
Jun 6 09:12:00 ammscott kernel: [ 2.567917] md: zone0=[sdb6/sda6]
Jun 6 09:12:00 ammscott kernel: [ 2.567919] zone-offset= 0KB, device-offset= 0KB, size=1930084224KB
Jun 6 09:12:00 ammscott kernel: [ 2.567920]
Jun 6 09:12:00 ammscott kernel: [ 2.567926] md1: detected capacity change from 0 to 1976406245376
Jun 6 09:12:00 ammscott kernel: [ 2.569249] md1: unknown partition table
Jun 6 09:12:00 ammscott kernel: [ 5.992324] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)
Jun 6 09:12:00 ammscott kernel: [ 8.707576] ADDRCONF(NETDEV_UP): eth0: link is not ready
Jun 6 09:12:00 ammscott kernel: [ 8.711125] Adding 1951740k swap on /dev/sda1. Priority:-1 extents:1 across:1951740k
Jun 6 09:12:00 ammscott kernel: [ 8.722690] lp: driver loaded but no devices found
Jun 6 09:12:00 ammscott kernel: [ 8.725950] usbcore: registered new interface driver usblp
Jun 6 09:12:00 ammscott kernel: [ 8.730970] mei: module is from the staging directory, the quality is unknown, you have been warned.
Jun 6 09:12:00 ammscott kernel: [ 8.732627] type=1400 audit(1338970315.965:2): apparmor="STATUS" operation="profile_load" name="/sbin/dhclient" pid=480 comm="apparmor_parser"
Jun 6 09:12:00 ammscott kernel: [ 8.732861] type=1400 audit(1338970315.965:3): apparmor="STATUS" operation="profile_load" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=480 comm="apparmor_parser"
Jun 6 09:12:00 ammscott kernel: [ 8.732993] type=1400 audit(1338970315.965:4): apparmor="STATUS" operation="profile_load" name="/usr/lib/connman/scripts/dhclient-script" pid=480 comm="apparmor_parser"
Jun 6 09:12:00 ammscott kernel: [ 8.734037] mei 0000:00:16.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
Jun 6 09:12:00 ammscott kernel: [ 8.734042] mei 0000:00:16.0: setting latency timer to 64
Jun 6 09:12:00 ammscott kernel: [ 8.734086] mei 0000:00:16.0: irq 43 for MSI/MSI-X
Jun 6 09:12:00 ammscott kernel: [ 8.738983] wmi: Mapper loaded
Jun 6 09:12:00 ammscott kernel: [ 8.749614] [drm] Initialized drm 1.1.0 20060810
Jun 6 09:12:00 ammscott kernel: [ 8.754486] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Jun 6 09:12:00 ammscott kernel: [ 8.767433] VGA switcheroo: detected Optimus DSM method \ handle
Jun 6 09:12:00 ammscott kernel: [ 8.767454] nouveau 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jun 6 09:12:00 ammscott kernel: [ 8.767457] nouveau 0000:01:00.0: setting latency timer to 64
Jun 6 09:12:00 ammscott kernel: [ 8.769221] [drm] nouveau 0000:01:00.0: Detected an NVc0 generation card (0x0c1080a1)
Jun 6 09:12:00 ammscott kernel: [ 8.775573] [drm] nouveau 0000:01:00.0: Attempting to load BIOS image from PRAMIN
Jun 6 09:12:00 ammscott kernel: [ 8.830617] asus_wmi: ASUS WMI generic driver loaded
Jun 6 09:12:00 ammscott kernel: [ 8.831289] asus_wmi: Initialization: 0x0
Jun 6 09:12:00 ammscott kernel: [ 8.831303] asus_wmi: BIOS WMI version: 0.9
Jun 6 09:12:00 ammscott kernel: [ 8.831326] asus_wmi: SFUN value: 0x0

On the failed boot I monitored, I saw the following [approximately, from memory]


Jun 6 09:12:00 ammscott kernel: [ 2.480486] md: RAID0 configuration for md0 - 1 zone
Jun 6 09:12:00 ammscott kernel: [ 2.480487] md: zone0=[sdb5/sda5]
Jun 6 09:12:00 ammscott kernel: [ 2.480489] zone-offset= 0KB, device-offset= 0KB, size= 19529600KB
Jun 6 09:12:00 ammscott kernel: [ 2.480490]
Jun 6 09:12:00 ammscott kernel: [ 2.480495] md0: detected capacity change from 0 to 19998310400
Jun 6 09:12:00 ammscott kernel: [ 2.481505] md0: unknown partition table
Jun 6 09:12:00 ammscott kernel: [ 2.566730] md: bind<sda6>
Jun 6 09:12:00 ammscott kernel: [ 2.567915] md/raid0:md1: md_size is 3860168448 sectors.
Jun 6 09:12:00 ammscott kernel: [ 2.567916] md: RAID0 configuration for md1 - 1 zone
Jun 6 09:12:00 ammscott kernel: [ 2.567917] md: zone0=[sdb6/sda6]
Jun 6 09:12:00 ammscott kernel: [ 2.567919] zone-offset= 0KB, device-offset= 0KB, size=1930084224KB
Jun 6 09:12:00 ammscott kernel: [ 2.567920]
Jun 6 09:12:00 ammscott kernel: [ 2.567926] md1: detected capacity change from 0 to 1976406245376
Jun 6 09:12:00 ammscott kernel: [ 2.569249] md1: unknown partition table

followed by



Jun 6 09:12:00 ammscott kernel: [ 2.157661] usb 2-1.7: new high-speed USB device number 3 using ehci_hcd
Jun 6 09:12:00 ammscott kernel: [ 2.250105] hub 2-1.7:1.0: USB hub found
Jun 6 09:12:00 ammscott kernel: [ 2.250208] hub 2-1.7:1.0: 4 ports detected

Those were the last lines seen; the reboot process then timed out.

I tried


sudo dpkg-reconfigure mdadm


to enable booting with a degraded RAID, but this did not solve the problem.
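For context, a hedged sketch of the setting that reconfigure step effectively toggles. The path and variable name (/etc/initramfs-tools/conf.d/mdadm, BOOT_DEGRADED) are from memory, so verify them on your system; the demo writes to a scratch path rather than the real file:

```shell
# Assumption: Ubuntu's mdadm initramfs hook reads BOOT_DEGRADED from
# /etc/initramfs-tools/conf.d/mdadm. We write the setting to a demo path here;
# after editing the real file you would run: sudo update-initramfs -u
conf=/tmp/mdadm.conf.demo
echo 'BOOT_DEGRADED=true' > "$conf"
cat "$conf"
# prints: BOOT_DEGRADED=true
```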

Here are the raid details


/dev/md0:
Version : 0.90
Creation Time : Thu Jun 3 10:37:07 2010
Raid Level : raid0
Array Size : 19529600 (18.62 GiB 20.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Thu Jun 3 10:37:07 2010
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Chunk Size : 64K

UUID : fda88cf5:7dbf771e:84aa465c:f0e60ba7
Events : 0.1

Number Major Minor RaidDevice State
0 8 21 0 active sync /dev/sdb5
1 8 5 1 active sync /dev/sda5


/dev/md1:
Version : 0.90
Creation Time : Thu Jun 3 10:37:27 2010
Raid Level : raid0
Array Size : 1930084224 (1840.67 GiB 1976.41 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Thu Jun 3 10:37:27 2010
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Chunk Size : 64K

UUID : 3e44c0c4:ce2cd4af:66d5c71f:8b1bacd3
Events : 0.1

Number Major Minor RaidDevice State
0 8 22 0 active sync /dev/sdb6
1 8 6 1 active sync /dev/sda6



Everything looks fine to me - certainly there are no failed devices!

At this stage I really don't know what I am looking at, but three things strike me as being a bit odd [maybe from ignorance, I don't know]
* Why should the raid devices show up as having 'unknown partition tables'?
* The last lines shown were from the USB hub - any significance?
* sdc1 [root] takes three seconds to mount, then there is another three seconds before eth0 comes up and swap gets added. Is this unusual?

Please please help me out here - I've spent HOURS on this problem and really need to be sorting out the rest of the system :(

dino99
June 6th, 2012, 10:31 AM
If this issue concerns the raid installation, then you should find warnings/errors logged either inside .xsession-errors or /var/log/.

You can also purge plymouth via synaptic to eliminate some cases, and reinstall it later once the problem has been identified.

v4169sgr
June 6th, 2012, 11:24 AM
Thank you dino99,

I don't see anything relevant - as above, I see no logs written from failed boots.

Regarding purging plymouth, would that cause problems for booting? What happens if I purge plymouth from my system then find I am unable to boot at all?

oldfred
June 6th, 2012, 04:47 PM
I see mention of Optimus; do you have that?

nVidia Optimus and Ubuntu explained
http://ubuntuforums.org/showthread.php?t=1657660

Could it be that the RAID drives just have not loaded in time, so sometimes they work and sometimes not? Some have had slow-loading hard drives and have had to add startup scripts so they mount well after everything else. Or is the SSD too fast?

v4169sgr
June 6th, 2012, 05:32 PM
Hi oldfred,

Thanks again for your reply - I appreciate people's efforts to help :)

I'm not sure at all that there is a problem with the display during boot.

And indeed I am working on the theory that the disks take their own sweet time to become ready - not sure if the issue is with the SSD or the RAIDs. Currently I'd really appreciate help getting a Plymouth script ready that would delay boot while waiting on user input [nothing scary like grub menus, dire RAID warnings etc, the users won't wear that!].

Please see http://ubuntuforums.org/showthread.php?t=1998168.

Thanks all in advance, especially if you believe I am headed up the wrong path / missing a short cut etc :)

bogan
June 6th, 2012, 06:05 PM
Hi!, v4169sgr,

The 295.40 driver you were using is the buggy version that all the nvidia furore was - and still is - about. The latest version, 295.53, seems to have cured those problems for most people.

But I see that other people have responded, and you have moved on to other possible causes, and have removed the nvidia driver.

It would seem your problems are rather more complex than your first Posts suggested, so the best of luck - possibly the 295.40 driver was producing errors that obscured the real faults, now revealed.

I do not know anything about Raid set-ups, but at least my Post activated more knowledgeable assistance.

Chao!, bogan.

v4169sgr
June 6th, 2012, 06:53 PM
Thank you Bogan,

At the moment I am on v 295.49 of the NVidia driver, which is the latest one that I understand Ubuntu currently recommends. As and when Ubuntu releases the 295.53 version of the driver, I will be taking that up.

In the meantime, I am hoping to see if I can work around the problem on the hypothesis that it is some kind of race condition [and I think it is, else how could booting from the grub menu work every time, and straight booting not?], but I need a bit of help with a Plymouth script to delay the boot process. Not ideal, but better than unreliable booting :)

Hint, hint, peeps !!! :)

v4169sgr
June 6th, 2012, 11:15 PM
OK folks, this is it: SOLVED!

It was not the video drivers. Not plymouth either. Not even grub.

It was initramfs -- in particular the way it deals with RAID arrays.

The post by dangriffin on this thread from Oct 2011

http://ubuntuforums.org/showthread.php?t=1861516&page=2

has the answer which worked for me:


I fixed this problem on my hardware by adding a 'udevadm settle' in the following file:

/usr/share/initramfs-tools/scripts/mdadm-functions

In there, look for the following function:


degraded_arrays()
{
mdadm --misc --scan --detail --test >/dev/null 2>&1
return $((! $?))
}


and change it to:


degraded_arrays()
{
udevadm settle
mdadm --misc --scan --detail --test >/dev/null 2>&1
return $((! $?))
}

then do a:


sudo update-initramfs -u


and reboot.

This line waits for all udev rules currently being processed to complete before the health check is made on the array. By then the array should have had a chance to assemble correctly.
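The `return $((! $?))` idiom in the quoted function can be seen in isolation: mdadm's --detail --test exits non-zero when an array is degraded, so inverting the exit status makes the function return 0 (shell "true") exactly in the degraded case. A self-contained demonstration with `false` standing in for a degraded mdadm check:

```shell
# `false` stands in for: mdadm --misc --scan --detail --test >/dev/null 2>&1
# false exits 1; $((! 1)) is 0, so the function reports "true" (degraded).
degraded_arrays_demo() {
    false
    return $((! $?))
}
degraded_arrays_demo && echo "degraded branch taken"
# prints: degraded branch taken
```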


Honestly, this guy deserves AT LEAST a medal, more like a raise :P