February 2018 kernel update borked systems running nVidia graphics, how to recover?



Mythological
February 22nd, 2018, 10:04 PM
Several people whose systems have nVidia graphics are reporting that their systems were rendered unusable by yesterday's kernel update - there are several bug reports at https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/1750937

My reason for opening this thread is to ask if anyone has had success in recovering from this, and could explain how to do so in such a way that a non-Linux-geek could understand? I can run apt-get, etc. at the command line if I know what to type. I'm afraid that my previous attempts to repair this problem may have really screwed up the system, but right now I can't even get into it except via ssh. So if anyone knows how to get out of this hole, please share, it would be very much appreciated.

wbmilleriii
February 22nd, 2018, 10:34 PM
Hi Mythological, continuing our discussion from the bug tracker. If you can reboot your machine and hold down the Left Shift key while it's rebooting, you should get the grub menu, which lets you boot into prior kernels. However, your efforts to fix the system sound like they may have left you with no working video drivers. The first thing to try is booting into the old kernel (the -112 kernel). If that works and your display is OK, you are done.
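
After picking a kernel from the grub menu, it can help to confirm which kernel actually came up and which others are available. The following is only a sketch: the function name list_boot_kernels is made up for illustration, and it assumes kernel images sit in /boot as vmlinuz-<release>, which is the usual Ubuntu layout.

```shell
#!/bin/sh
# Show the kernel that is currently running.
echo "running kernel: $(uname -r)"

# List kernel releases that have an image in the boot directory.
# $1 = boot directory (assumed default: /boot).
# list_boot_kernels is a hypothetical helper name, not a standard tool.
list_boot_kernels() {
    for f in "${1:-/boot}"/vmlinuz-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && echo "${f##*/vmlinuz-}"
    done
    return 0
}

list_boot_kernels
```

If the -112 release shows up in the list but is not the one reported as running, the grub selection did not take effect.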

Mythological
February 22nd, 2018, 10:45 PM
Oh, okay, thanks. Much appreciated. I'm not in a position to try this right now, but I hope this is all I need to do to recover.

One thing I did discover, just by accident really, is that when I attempted to install the nVidia driver from the nVidia site it left a file in /etc/modprobe.d called nvidia-installer-disable-nouveau.conf, which contains this:

# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0

Maybe THAT is why nouveau isn't working. Who knows at this point. I also found this page, "How to fix NVIDIA driver failure on Ubuntu (https://codeyarns.com/2013/02/07/how-to-fix-nvidia-driver-failure-on-ubuntu/)", but it was written back in 2013, so I'm not sure if it is still relevant.
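
For anyone wanting to check whether a leftover blacklist file like the one above is present, here is a minimal sketch. The MODPROBE_DIR variable and the find_nouveau_blacklist function are names introduced purely for illustration; the directory is the one mentioned in the post.

```shell
#!/bin/sh
# Directory to search; MODPROBE_DIR is an illustrative variable, defaulting
# to the location named in the post.
MODPROBE_DIR="${MODPROBE_DIR:-/etc/modprobe.d}"

# Print the config files under $1 that contain a "blacklist nouveau" line.
# find_nouveau_blacklist is a hypothetical helper name.
find_nouveau_blacklist() {
    # -r: recurse, -l: print file names only, -E: extended regex
    grep -rlE '^[[:space:]]*blacklist[[:space:]]+nouveau' "$1" 2>/dev/null
}

find_nouveau_blacklist "$MODPROBE_DIR" ||
    echo "no nouveau blacklist found in $MODPROBE_DIR"
```

This only reports where the blacklist lives; whether it matters depends on which driver you intend to run.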

wbmilleriii
February 22nd, 2018, 11:05 PM
If you can log in, but not to a graphical desktop, this is the best way I've found of installing nvidia drivers (the pattern is quoted so the shell does not expand it against files in the current directory):

sudo apt-get purge 'nvidia*'
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-375 nvidia-settings

(note 375 is a transitional package for 384, so nvidia-384 is also installed)

But I don't know anything about the blacklisting part, I hope that doesn't cause problems.

Mythological
February 23rd, 2018, 12:04 AM
Actually it turned out that all I had to do was this:
Boot and hold the shift key and select the old kernel (the -112 kernel)
I still could not log in at the desktop, but I could ssh in, so at that point I ran:
sudo apt-get install nvidia-current
Rebooted the system and it came up, but it was running a fairly old version of the nVidia driver, so updated that (from the now-working desktop) and rebooted again. It came back up fine.
The blacklist thing didn't seem to matter.
The only thing is I REALLY hope they fix this soon as I have no desire to keep booting into an old kernel. And since this apparently is affecting both Ubuntu 14.04 and Ubuntu 16.04 users (and in all likelihood users of several other recent versions) I would sure hope there is some priority placed on fixing this. I read somewhere that nVidia graphics hardware is not the only thing affected by this, that basically anything that relies on dkms (if I am saying that right) has been broken by this update.

I did all that before I saw your previous post. Thank you for your help!

wbmilleriii
February 23rd, 2018, 12:18 AM
I'm glad you are up and running.

I just had a similar problem with nvidia drivers on my test 18.04 system, which got a kernel update today. Looks like there are some serious problems with nvidia drivers and the kernel fixes for the spectre/meltdown vulnerabilities. The good news is that there are lots of nvidia users, so this will get ironed out before too long.

wbmilleriii
February 23rd, 2018, 11:45 PM
I can't answer your dkms question, but I fixed the problem by uninstalling and reinstalling the -116 kernel. This forced a recompile of the drivers with the new gcc version.

To do this, I used synaptic to remove the 4 -116 kernel files. synaptic then said it had to remove linux-generic-lts-xenial and 2 other metapackages, which sounds scary but it's OK. Remove the kernel with synaptic, then reinstall linux-generic-lts-xenial. This will pull in all the other kernel files.

I also had to reinstall virtualbox, since it also uses a kernel driver.

LinuxGuy39
February 24th, 2018, 07:55 AM
When you reinstalled linux-generic-lts-xenial, did you do it from within synaptic too, and did you reboot between removing the 4 -116 kernel files and reinstalling linux-generic-lts-xenial?

wbmilleriii
February 24th, 2018, 03:01 PM
Yes, I reinstalled it in synaptic immediately after I removed it. I did not reboot. The point of this whole exercise is to force the nvidia driver kernel modules to get recompiled, and you can watch that happen as the new kernel gets installed if you hit "details" on the box that pops up while synaptic is working.
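
If the goal is only to get the kernel modules recompiled, asking DKMS to rebuild the module for that one kernel may achieve the same thing. This is a different route from the synaptic steps described in this thread and was not tried by anyone here, so treat it as a sketch. The module name, version and kernel release are the ones that appear in the DKMS log posted later in the thread. The run and rebuild_module helpers and the APPLY variable are invented for illustration, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Values as they appear in the DKMS log in this thread; adjust them to
# whatever "dkms status" reports on your own system.
MODULE="${MODULE:-nvidia-384}"
VERSION="${VERSION:-384.111}"
KVER="${KVER:-4.4.0-116-generic}"

# Print each command by default; set APPLY=1 to really run it.
# run and APPLY are hypothetical names used only in this sketch.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

rebuild_module() {
    # Drop the existing build for this kernel only, then build and install again.
    run sudo dkms remove -m "$MODULE" -v "$VERSION" -k "$KVER"
    run sudo dkms install -m "$MODULE" -v "$VERSION" -k "$KVER"
}

rebuild_module
```

Run it once as-is to review the two commands, then again with APPLY=1 if they look right for your system.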

Mythological
February 24th, 2018, 07:57 PM
I tried this in synaptic but never saw anything in the details; however, when I looked at the log file I saw this:

dkms: removing: bbswitch 0.7 (4.4.0-116-generic) (x86_64)

-------- Uninstall Beginning --------
Module: bbswitch
Version: 0.7
Kernel: 4.4.0-116-generic (x86_64)
-------------------------------------

Status: Before uninstall, this module version was ACTIVE on this kernel.

bbswitch.ko:
- Uninstallation
- Deleting from: /lib/modules/4.4.0-116-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.

depmod....

DKMS: uninstall completed.
dkms: removing: nvidia-384 384.111 (4.4.0-116-generic) (x86_64)

-------- Uninstall Beginning --------
Module: nvidia-384
Version: 384.111
Kernel: 4.4.0-116-generic (x86_64)
-------------------------------------

Status: Before uninstall, this module version was ACTIVE on this kernel.

nvidia_384.ko:
- Uninstallation
- Deleting from: /lib/modules/4.4.0-116-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.


nvidia_384_modeset.ko:
- Uninstallation
- Deleting from: /lib/modules/4.4.0-116-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.

nvidia_384_modeset.ko:
- Uninstallation
- Deleting from: /lib/modules/4.4.0-116-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.


nvidia_384_drm.ko:
- Uninstallation
- Deleting from: /lib/modules/4.4.0-116-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.


nvidia_384_uvm.ko:
- Uninstallation
- Deleting from: /lib/modules/4.4.0-116-generic/updates/dkms/
- Original module
- No original module was found for this module on this kernel.
- Use the dkms install command to reinstall any previous module version.

depmod....

DKMS: uninstall completed.

So it looks to me like all it did was uninstall the modules, not reinstall them. Is there any command I can run to check whether the driver is there before I reboot and try it? This is a headless machine, so it is a real pain to reboot into the other kernel if it does not work.
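
One way to check without rebooting is to look in the directory the log says the modules were deleted from, and to ask dkms what it has built. A minimal sketch follows. The list_dkms_modules helper and the KVER variable are names made up for illustration, and the path layout is taken from the "Deleting from:" lines in the log above.

```shell
#!/bin/sh
# Kernel release to inspect; this default is the one shown in the log above.
KVER="${KVER:-4.4.0-116-generic}"

# Print the DKMS-built module files installed for a kernel release.
# $1 = kernel release, $2 = modules root (assumed default: /lib/modules).
# list_dkms_modules is a hypothetical helper name.
list_dkms_modules() {
    dir="${2:-/lib/modules}/$1/updates/dkms"
    [ -d "$dir" ] || return 1
    # grep . passes the names through and fails when the directory is empty.
    ls "$dir" | grep .
}

if list_dkms_modules "$KVER"; then
    echo "DKMS modules present for $KVER"
else
    echo "no DKMS modules found for $KVER"
fi

# If the dkms tool is installed, ask it directly what it has built.
if command -v dkms >/dev/null 2>&1; then
    dkms status || echo "dkms status failed"
fi
```

Seeing the nvidia module files listed for the -116 kernel suggests the rebuild happened, though it does not prove the driver will load.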

Mythological
February 25th, 2018, 09:20 AM
Well, whatever I did, it worked: I am now running -116 with the nVidia driver. Basically I followed the advice in #7, but after uninstalling I had to quit and restart Synaptic before it would let me reinstall.