Thread: NVidia drivers crash on Ubuntu 20.04

  1. #11

    Re: NVidia drivers crash on Ubuntu 20.04

    Unfortunately I'm still experiencing freezes with the new GM107GL under Nouveau. There's no reason it should work any better with the Nvidia driver, but I'll give it a try.

    Here is the requested info:
    https://paste.ubuntu.com/p/hKxhnwqHHB/

  2. #12

    Re: NVidia drivers crash on Ubuntu 20.04

    I attempted to install the Nvidia proprietary driver again, after the graphics card exchange.

    Same problems:

    • reboot after installing the Nvidia driver with software-properties-gtk
    • I do see a graphical login screen after reboot
    • black screen from the moment I enter my password and hit Enter in the login screen
    • still able to log into the workstation with SSH
    • able to uninstall the proprietary Nvidia driver from command line
    • unable to reboot with `sudo reboot`, had to power down using the power button
    • works fine after reboot with the Nouveau driver (except for the occasional freezes as already explained)


    The Dell ePSA pre-boot diagnostics don't report any problem.

    Finally, please note that this Nvidia issue (and the Nouveau issue) has been going on for 3+ years, with Ubuntu 16.04, Ubuntu 18.04, and Ubuntu 20.04.
    Last edited by dimitri-papadopoulos; October 21st, 2021 at 02:03 PM.

  3. #13

    Re: NVidia drivers crash on Ubuntu 20.04

    Here is the kernel log:
    https://pastebin.ubuntu.com/p/RG76fdsppH/

    The problems start here, right after I enter the password and hit Enter in the login screen:
    Code:
    [  106.236115] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
    [  106.236981] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
    [  110.240735] rfkill: input handler enabled
    [  110.518675] BUG: kernel NULL pointer dereference, address: 0000000000000070
    [  110.518681] #PF: supervisor read access in kernel mode
    [  110.518684] #PF: error_code(0x0000) - not-present page
    [  110.518686] PGD 0 P4D 0 
    [  110.518692] Oops: 0000 [#1] SMP PTI
    [  110.518697] CPU: 7 PID: 2440 Comm: Xorg Tainted: P           O      5.4.0-89-generic #100-Ubuntu
    [  110.518700] Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.18.1 07/09/2021
    [  110.518741] RIP: 0010:_nv002523kms+0x18/0x70 [nvidia_modeset]
    [  110.518745] Code: 24 1f 01 eb b2 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d 0f 73 0c 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 cf
    [  110.518748] RSP: 0018:ffffb10a41313c50 EFLAGS: 00010286
    [  110.518752] RAX: 0000000000000000 RBX: ffff9a4993d50008 RCX: 00000000000000d4
    [  110.518755] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff9a4993d50008
    [  110.518757] RBP: 0000000000010009 R08: 0000000000000004 R09: 00000000fffffffe
    [  110.518760] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9a4993d50008
    [  110.518762] R13: ffff9a4993d500a0 R14: 0000000000000fff R15: 0000000000010008
    [  110.518766] FS:  00007f501e1f3a40(0000) GS:ffff9a499dbc0000(0000) knlGS:0000000000000000
    [  110.518769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  110.518771] CR2: 0000000000000070 CR3: 0000000407e94004 CR4: 00000000003606e0
    [  110.518774] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  110.518776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  110.518778] Call Trace:
    [  110.518816]  ? _nv002522kms+0xb1/0x150 [nvidia_modeset]
    [  110.518851]  ? _nv002301kms+0x489/0x670 [nvidia_modeset]
    [  110.518859]  ? __check_object_size+0x13f/0x150
    [  110.518866]  ? _copy_from_user+0x3e/0x60
    [  110.518889]  ? _nv000451kms+0xa0/0xa0 [nvidia_modeset]
    [  110.518911]  ? _nv000663kms+0x34/0x50 [nvidia_modeset]
    [  110.518933]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
    [  110.518957]  ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
    [  110.518980]  ? nvkms_ioctl+0xc4/0x100 [nvidia_modeset]
    [  110.519298]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
    [  110.519306]  ? do_vfs_ioctl+0x407/0x670
    [  110.519310]  ? do_fcntl+0x22f/0x560
    [  110.519315]  ? putname+0x4a/0x50
    [  110.519320]  ? ksys_ioctl+0x67/0x90
    [  110.519325]  ? __x64_sys_ioctl+0x1a/0x20
    [  110.519331]  ? do_syscall_64+0x57/0x190
    [  110.519336]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [  110.519340] Modules linked in: rfcomm wireguard ip6_udp_tunnel udp_tunnel vboxnetadp(O) vboxnetflt(O) vboxdrv(O) cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi intel_rapl_msr mei_hdcp snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_usb_audio snd_usbmidi_lib snd_hda_core snd_hwdep dell_smm_hwmon snd_seq_midi intel_rapl_common snd_seq_midi_event btusb x86_pkg_temp_thermal snd_rawmidi snd_seq btrtl intel_powerclamp btbcm coretemp btintel mc input_leds kvm_intel bluetooth dell_wmi kvm snd_pcm mei_me ecdh_generic ecc snd_seq_device rapl dell_smbios snd_timer mei intel_pch_thermal intel_cstate dcdbas dell_wmi_descriptor sparse_keymap snd intel_wmi_thunderbolt wmi_bmof ie31200_edac soundcore mac_hid acpi_pad nvidia_uvm(O) sch_fq_codel msr parport_pc ppdev lp parport ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd
    [  110.519390]  nvme cryptd glue_helper syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e i2c_i801 drm ahci nvme_core libahci wmi video
    [  110.519406] CR2: 0000000000000070
    [  110.519410] ---[ end trace aa620f0211b859d1 ]---
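
For anyone reading along, the oops above boils down to a handful of fields. Here is a minimal Python sketch (not part of the original post; the function name `summarize_oops` and the shortened sample are illustrative assumptions) that pulls out the fault address, the faulting symbol and module, and the registers that were zero. As far as I can tell, the marked bytes `<8b> 46 70` in the Code: line decode to a 32-bit load from RSI+0x70, and RSI is 0 in the register dump, which would be consistent with the reported address 0x70.

```python
import re

# Shortened copy of the oops lines posted above (sample data only).
SAMPLE = """\
[  110.518675] BUG: kernel NULL pointer dereference, address: 0000000000000070
[  110.518741] RIP: 0010:_nv002523kms+0x18/0x70 [nvidia_modeset]
[  110.518752] RAX: 0000000000000000 RBX: ffff9a4993d50008 RCX: 00000000000000d4
[  110.518755] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff9a4993d50008
"""

def summarize_oops(text):
    """Return the fault address, faulting symbol/module and zero-valued registers."""
    addr = re.search(r"NULL pointer dereference, address: ([0-9a-f]+)", text)
    rip = re.search(r"RIP: \w+:(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]", text)
    # General-purpose registers are printed as "NAME: <16 hex digits>".
    regs = dict(re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})", text))
    return {
        "address": int(addr.group(1), 16) if addr else None,
        "symbol": rip.group(1) if rip else None,
        "module": rip.group(2) if rip else None,
        "zero_regs": sorted(n for n, v in regs.items() if int(v, 16) == 0),
    }

summary = summarize_oops(SAMPLE)
print(summary)
```

Running the same function over the full dmesg paste should work too, since it only looks at the lines it recognises.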

  4. #14
    Join Date
    Mar 2010
    Location
    USA
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: NVidia drivers crash on Ubuntu 20.04

    I don't see anything glaring in the system-info script report.

    I see the system SMBIOS settings, which look all okay. I see that SecureBoot is off, like you said.

    I see that you are updated to the latest BIOS available for that machine.

    I see that you are running Gnome on XServer/X11 and that should be the best for that GPU.

    I see what the specific GPU is... and that it is supported not only by NVidia driver 460 but also by 470.x: https://www.nvidia.com/Download/driv...x/180475/en-us

    I see that it is an Ubuntu Certified Hardware platform, that is supported and safe to run HWE (the Hardware Enablement Stack), but that is not installed. That may help with your graphics...

    I don't see anything blatantly hitting me in the face...

    I see the nvidia_modeset errors in the kernel log, but I don't really understand why they are occurring. When I look into NULL pointer dereference errors at that address (0000000000000070), the only thing I find mentioned is a possible conflict with a network device. Maybe post the lshw output for wg0.

    You are using XServer, but I do not see a posted xserver log. That may show something (hopefully).

    Concurrent coexistence of Windows, Linux and UNIX...
    Ubuntu user # 33563, Linux user # 533637
    Sticky: [all variants] Graphics Resolution- Upgrade /Blank Screen after reboot
    UbuntuForums system-info Script

  5. #15

    Re: NVidia drivers crash on Ubuntu 20.04

    I have just added the HWE:
    Code:
    sudo apt install --install-recommends linux-generic-hwe-20.04
    I'll try again with 460 and 470 and send the X server logs - I seem to recall there's nothing interesting in them, but you never know.

    As far as I know, the two SSDs are the only "exotic" thing on this workstation.
    • PM961 NVMe SAMSUNG 512GB (I think that was the default one on this workstation)
    • SK hynix SC300B SATA 512GB (I think that's the optional one but validated by Dell for this workstation)


    Here is the output of lshw:
    lshw: https://pastebin.ubuntu.com/p/9TJ5jpQBCD/
    Last edited by dimitri-papadopoulos; October 25th, 2021 at 06:37 PM.

  6. #16

    Re: NVidia drivers crash on Ubuntu 20.04

    Doesn't work any better with the HWE 5.11 kernel. Again, I get a black screen right after entering the password and hitting Enter in the Gnome login screen, with both Nvidia drivers 460 and 470.

    Again, here is the kernel log, as obtained with dmesg, right after the crash with driver 470:
    Code:
    [  191.636119] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
    [  191.636769] nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
    [  195.641869] rfkill: input handler enabled
    [  195.920430] BUG: kernel NULL pointer dereference, address: 0000000000000070
    [  195.920440] #PF: supervisor read access in kernel mode
    [  195.920444] #PF: error_code(0x0000) - not-present page
    [  195.920448] PGD 0 P4D 0 
    [  195.920454] Oops: 0000 [#1] SMP PTI
    [  195.920461] CPU: 0 PID: 2622 Comm: Xorg Tainted: P           O      5.11.0-38-generic #42~20.04.1-Ubuntu
    [  195.920468] Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.18.1 07/09/2021
    [  195.920471] RIP: 0010:_nv002523kms+0x18/0x70 [nvidia_modeset]
    [  195.920532] Code: 24 1f 01 eb b2 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d ff 72 0c 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 cf
    [  195.920538] RSP: 0018:ffffad7ac095bcf0 EFLAGS: 00010286
    [  195.920544] RAX: 0000000000000000 RBX: ffff8fc9cf203008 RCX: 00000000000007ac
    [  195.920548] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff8fc9cf203008
    [  195.920552] RBP: 0000000000010009 R08: 0000000000000004 R09: ffffffffc05d8c00
    [  195.920556] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8fc9cf203008
    [  195.920560] R13: ffff8fc9cf2030a0 R14: 0000000000000fff R15: 0000000000010008
    [  195.920564] FS:  00007f2ea73d8a40(0000) GS:ffff8fccddc00000(0000) knlGS:0000000000000000
    [  195.920569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  195.920574] CR2: 0000000000000070 CR3: 000000010bf66005 CR4: 00000000003706f0
    [  195.920578] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  195.920582] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  195.920585] Call Trace:
    [  195.920591]  ? _nv002522kms+0xb1/0x150 [nvidia_modeset]
    [  195.920646]  ? _nv002301kms+0x489/0x670 [nvidia_modeset]
    [  195.920700]  ? __kmalloc+0x430/0x470
    [  195.920710]  ? __check_object_size+0x13f/0x150
    [  195.920717]  ? _copy_from_user+0x3f/0x80
    [  195.920725]  ? _nv000451kms+0xa0/0xa0 [nvidia_modeset]
    [  195.920761]  ? _nv000663kms+0x34/0x50 [nvidia_modeset]
    [  195.920795]  ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
    [  195.920830]  ? nvkms_ioctl_common+0x42/0x80 [nvidia_modeset]
    [  195.920866]  ? nvkms_ioctl+0xbf/0x110 [nvidia_modeset]
    [  195.920900]  ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
    [  195.921412]  ? __x64_sys_ioctl+0x91/0xc0
    [  195.921420]  ? do_syscall_64+0x38/0x90
    [  195.921426]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [  195.921437] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha libblake2s_generic vboxnetadp(O) vboxnetflt(O) vboxdrv(O) binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg soundwire_intel intel_rapl_msr soundwire_generic_allocation soundwire_cadence mei_hdcp snd_hda_codec snd_hda_core intel_rapl_common soundwire_bus x86_pkg_temp_thermal intel_powerclamp snd_soc_core coretemp snd_compress kvm_intel ac97_bus dell_smm_hwmon snd_pcm_dmaengine kvm snd_usb_audio dell_wmi dell_smbios snd_usbmidi_lib dcdbas snd_hwdep rapl mc intel_cstate snd_seq_midi snd_seq_midi_event input_leds snd_pcm sparse_keymap intel_wmi_thunderbolt snd_rawmidi wmi_bmof dell_wmi_descriptor efi_pstore snd_seq snd_seq_device snd_timer ee1004 snd mei_me soundcore mei intel_pch_thermal ie31200_edac mac_hid acpi_pad nvidia_uvm(PO) sch_fq_codel msr parport_pc ppdev lp
    [  195.921548]  parport ip_tables x_tables autofs4 dm_crypt hid_generic usbhid hid nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd syscopyarea sysfillrect sysimgblt cryptd fb_sys_fops glue_helper cec rc_core e1000e nvme drm i2c_i801 nvme_core ahci i2c_smbus xhci_pci libahci xhci_pci_renesas wmi video
    [  195.921607] CR2: 0000000000000070
    [  195.921612] ---[ end trace 520eb3b7391bf81b ]---
    [  195.948213] RIP: 0010:_nv002523kms+0x18/0x70 [nvidia_modeset]
    [  195.948273] Code: 24 1f 01 eb b2 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 54 55 49 89 fc 53 89 d5 41 b8 04 00 00 00 ba 02 01 02 00 48 83 ec 10 <8b> 46 70 8b 3d ff 72 0c 00 48 8d 4c 24 0c 89 ee 89 44 24 0c e8 cf
    [  195.948276] RSP: 0018:ffffad7ac095bcf0 EFLAGS: 00010286
    [  195.948279] RAX: 0000000000000000 RBX: ffff8fc9cf203008 RCX: 00000000000007ac
    [  195.948281] RDX: 0000000000020102 RSI: 0000000000000000 RDI: ffff8fc9cf203008
    [  195.948282] RBP: 0000000000010009 R08: 0000000000000004 R09: ffffffffc05d8c00
    [  195.948284] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8fc9cf203008
    [  195.948285] R13: ffff8fc9cf2030a0 R14: 0000000000000fff R15: 0000000000010008
    [  195.948287] FS:  00007f2ea73d8a40(0000) GS:ffff8fccddc00000(0000) knlGS:0000000000000000
    [  195.948289] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  195.948291] CR2: 0000000000000070 CR3: 000000010bf66005 CR4: 00000000003706f0
    [  195.948293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  195.948294] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  204.058940] NVRM: GPU at PCI:0000:01:00: GPU-213658c7-a8f6-8ff8-0e64-bb1c57e0b47e
    [  204.058985] NVRM: Xid (PCI:0000:01:00): 16, pid=0, Head 00000000 Count 00001465
    Here is the whole kernel log:
    dmesg: https://pastebin.ubuntu.com/p/CVfNcBFSRw/

    Here are the X server logs, Xorg.0.log which predates the login attempt and Xorg.1.log which is created right after the login attempt:
    Xorg.0.log: https://pastebin.ubuntu.com/p/MbnrMrFVGN/
    Xorg.1.log: https://pastebin.ubuntu.com/p/PFyKXgSFcM/
    Last edited by dimitri-papadopoulos; October 25th, 2021 at 06:20 PM.
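
The NVIDIA-specific lines in the dmesg excerpt above can be filtered out of the full log with a few lines of Python. This is only a sketch: the function name `nvidia_events` and the three-line sample are illustrative assumptions, not something from the pastes. If I read NVIDIA's Xid documentation correctly, Xid 16 is listed as a display engine hang, which would fit the display engine push buffer timeout logged a few seconds earlier.

```python
import re

# Three lines copied from the dmesg excerpt above (sample data only).
SAMPLE = """\
[  191.636119] nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
[  195.920430] BUG: kernel NULL pointer dereference, address: 0000000000000070
[  204.058985] NVRM: Xid (PCI:0000:01:00): 16, pid=0, Head 00000000 Count 00001465
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
XID = re.compile(r"NVRM: Xid \((PCI:[0-9a-f:.]+)\): (\d+),")

def nvidia_events(text):
    """Yield (timestamp, kind, detail) for NVIDIA-related dmesg lines."""
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        stamp, msg = float(m.group(1)), m.group(2)
        xid = XID.search(msg)
        if xid:
            # detail is the numeric Xid code
            yield stamp, "xid", int(xid.group(2))
        elif msg.startswith("nvidia-modeset: ERROR:"):
            # detail is the message text after "ERROR:"
            yield stamp, "modeset-error", msg.split("ERROR:", 1)[1].strip()

events = list(nvidia_events(SAMPLE))
for event in events:
    print(event)
```

To run it on a saved log, replace `SAMPLE` with the contents of the file, for example `open("dmesg.txt").read()` (the file name is hypothetical).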

  7. #17

    Re: NVidia drivers crash on Ubuntu 20.04

    I looked through your Xorg logs and I see something curious that might be where it is wigging out.
    Code:
    [    26.988] (==) NVIDIA(0): No modes were requested; the default mode "nvidia-auto-select"
    [    26.988] (==) NVIDIA(0):     will be used as the requested mode. 
    [    26.988] (==) NVIDIA(0):
    [    26.988] (II) NVIDIA(0): Validated MetaModes: 
    [    26.988] (II) NVIDIA(0):     "DFP-6:nvidia-auto-select"
    [    26.988] (II) NVIDIA(0): Virtual screen size determined to be 3840 x 2160 
    [    26.992] (--) NVIDIA(0): DPI set to (139, 137); computed from "UseEdidDpi" X config
    I'm thinking your display does not support those modes... LOL. That would fit with the nvidia_modeset error you saw in dmesg.

    What happens if you create an xorg.conf file and set a default mode of 1920x1080? Or any mode that your display does support?

    Or set it via KMS by adding a helper to the GRUB defaults file, so it takes effect early in the Linux kernel boot?
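
Before forcing a mode, it may help to confirm what the driver actually picked. A rough Python sketch (the function name `selected_modes` is a hypothetical helper, and the sample is just the three relevant lines from the excerpt above) that extracts the validated MetaModes and the virtual screen size from an Xorg log:

```python
import re

# Lines copied from the Xorg log excerpt above (sample data only).
SAMPLE = """\
[    26.988] (II) NVIDIA(0): Validated MetaModes:
[    26.988] (II) NVIDIA(0):     "DFP-6:nvidia-auto-select"
[    26.988] (II) NVIDIA(0): Virtual screen size determined to be 3840 x 2160
"""

def selected_modes(log_text):
    """Return (metamodes, virtual_size) as reported by the NVIDIA X driver."""
    # MetaMode entries are logged as a quoted string right after the NVIDIA(n): prefix.
    metamodes = re.findall(r'NVIDIA\(\d+\):\s+"([^"]+)"', log_text)
    size = re.search(r"Virtual screen size determined to be (\d+) x (\d+)", log_text)
    virtual = (int(size.group(1)), int(size.group(2))) if size else None
    return metamodes, virtual

modes, virtual = selected_modes(SAMPLE)
print(modes, virtual)
```

Comparing this result between Xorg.0.log and Xorg.1.log would show whether the driver selects the same mode before and after the login attempt.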

  8. #18

    Re: NVidia drivers crash on Ubuntu 20.04

    After changing the graphics card, Dell changed the motherboard last week. Since then, I haven't experienced X server freezes with the Nouveau driver, and I am able to use the proprietary Nvidia driver.

    That doesn't necessarily mean it was a hardware issue; it could have been different BIOS settings. Changing the motherboard obviously reset the BIOS settings. For comparison, here is the output of system-info after the replacement:
    https://pastebin.ubuntu.com/p/2Yg3QFsnd4/

    The display itself is a Dell UltraSharp 32 UP3216Q monitor; it is supposed to support its native 3840 × 2160 resolution.
    Last edited by dimitri-papadopoulos; 4 Weeks Ago at 12:46 PM.

  9. #19

    Re: NVidia drivers crash on Ubuntu 20.04

    The main difference I see in the system-info output:

    • Dell replaced the previous A05 motherboard (Size: 3615MHz) with an A01 motherboard (Size: 2489MHz).


    Here is the new lshw output:
    https://pastebin.ubuntu.com/p/Dy2NTghGvZ/

    Again, the main difference I see:
    • The new A01 motherboard reports size: 900MHz, whereas the previous A05 motherboard reported size: 2232MHz.


    Here is a new dmesg output:
    https://pastebin.ubuntu.com/p/TGM3XNTpYM/

    The main difference I see:
    • I now have 4 logical CPUs (setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1) instead of 8 previously (setup_percpu: NR_CPUS:8192 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1). I guess hyper-threading might be disabled in the BIOS or something like that.


    Should I first look into the hyper-threading BIOS setting, or whatever causes the difference in the number of CPUs?
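
The two setup_percpu lines quoted above can be compared mechanically. Below is a small Python sketch; the helper names `percpu_fields` and `changed_fields` are illustrative assumptions, and the two strings are copied from the dmesg outputs before and after the motherboard swap.

```python
import re

# setup_percpu lines from the old (A05) and new (A01) motherboard dmesg output.
OLD = "setup_percpu: NR_CPUS:8192 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1"
NEW = "setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1"

def percpu_fields(line):
    """Parse the key:value pairs of a setup_percpu line into a dict of ints."""
    return {key: int(val) for key, val in re.findall(r"(\w+):(\d+)", line)}

def changed_fields(old_line, new_line):
    """Return {field: (old, new)} for every field whose value differs."""
    old, new = percpu_fields(old_line), percpu_fields(new_line)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}

diff = changed_fields(OLD, NEW)
print(diff)
```

On recent kernels the SMT state can also be read directly from /sys/devices/system/cpu/smt/control, which may be quicker than rebooting into the BIOS setup to check.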

  10. #20

    Re: NVidia drivers crash on Ubuntu 20.04

    Actually, this difference in size seems to be related to the number of cores:
    https://en.wikipedia.org/wiki/Front-side_bus#CPU

    I will re-enable hyper-threading and check 1) how size changes and 2) whether I can reproduce the driver issue with hyper-threading.
