Nvidia

Fixing Random Freezes with Ubuntu 16.04 LTS, Intel Skylake and an Nvidia GPU

My Lenovo ThinkCentre m900 (10FHCTO1WW) with an Intel i7-6700 showed weird and random freezes from day 1 when trying to install Mint 18 / Ubuntu 16 with any kernel newer than 3x. After investigating for quite some hours, I gave up and installed an Ubuntu 14.04 LTS on it. The device is certified to it, but the old version did not support all features and even some basic things such as audio did not work. At lest the random freezes were gone and I could work with that machine. Now that the system will not receive updates soon, I gave it another try and setup Mint 18.2 (Sonya). Unfortunately, the Lenovo machine froze again after a few minutes, filling up the log again with the following error messages.

Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025547] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [chrome:13814]
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025549] Modules linked in: bnep ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink ...
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025614] CPU: 6 PID: 13814 Comm: chrome Not tainted 4.8.0-53-generic #56~16.04.1-Ubuntu
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025614] Hardware name: LENOVO 10FHCTO1WW/30BC, BIOS FWKT5FA   11/08/2016
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025615] task: ffff8fd736e12dc0 task.stack: ffff8fd71781c000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025615] RIP: 0010:[<ffffffff90d0b339>]  [&lt;ffffffff90d0b339&gt;] smp_call_function_many+0x1f9/0x250
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025619] RSP: 0018:ffff8fd71781fc00  EFLAGS: 00000202
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025619] RAX: 0000000000000003 RBX: 0000000000000200 RCX: 0000000000000007
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025620] RDX: ffff8fd85dddd920 RSI: 0000000000000200 RDI: ffff8fd85dd9a288
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025620] RBP: ffff8fd71781fc38 R08: 0000000000000000 R09: 00000000000000bf
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025620] R10: 0000000000000008 R11: ffff8fd85dd9a288 R12: ffff8fd85dd9a288
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025621] R13: ffff8fd85dd9a280 R14: ffffffff90c723c0 R15: 0000000000000000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025622] FS:  00007f6120196a80(0000) GS:ffff8fd85dd80000(0000) knlGS:0000000000000000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025622] CR2: 00003a66b8d61000 CR3: 0000000449abd000 CR4: 00000000003406e0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025623] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025624] Stack:
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025624]  000000000001a240 0100000000000001 ffff8fd6f7434d80 ffffffff90c723c0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025626]  0000000000000000 ffff8fd71781fd10 ffff8fd71781fc68 ffff8fd71781fc60
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025627]  ffffffff90d0b46d ffff8fd6f7434d80 ffff8fd85ddd4508 ffff8fd71781fd08
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025628] Call Trace:
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025630]  [<ffffffff90c723c0>] ? leave_mm+0xd0/0xd0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025631]  [<ffffffff90d0b46d>] on_each_cpu+0x2d/0x60
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025632]  [<ffffffff90c72c2b>] flush_tlb_kernel_range+0x4b/0x80
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025634]  [<ffffffff90de9f56>] __purge_vmap_area_lazy+0x2d6/0x320
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025635]  [<ffffffff90dea0b7>] vm_unmap_aliases+0x117/0x140
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025636]  [<ffffffff90c6e1ae>] change_page_attr_set_clr+0xee/0x4f0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025638]  [<ffffffff90c6f21f>] set_memory_ro+0x2f/0x40
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025639]  [<ffffffff90d7f11a>] bpf_prog_select_runtime+0x2a/0xd0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025641]  [<ffffffff9139a2af>] bpf_prepare_filter+0x37f/0x3f0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025642]  [<ffffffff9139a47c>] bpf_prog_create_from_user+0xbc/0x120
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025643]  [<ffffffff90d43b30>] ? proc_watchdog_cpumask+0xe0/0xe0
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025644]  [<ffffffff90d4410e>] do_seccomp+0x12e/0x610
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025645]  [<ffffffff90c991c6>] ? SyS_prctl+0x46/0x490
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025646]  [<ffffffff90d446fe>] SyS_seccomp+0xe/0x10
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025648]  [<ffffffff9149a876>] entry_SYSCALL_64_fastpath+0x1e/0xa8
Jul 12 18:49:01 FreezeCentre kernel: [ 4888.025648] Code: 94 33 00 3b 05 ed 3a e5 00 89 c1 0f 8d 99 fe ff ff 48 98 49 8b ...```


I started the investigation again and found a different trace, which pointed to the graphics card. The important hint and solution came from [SO][1]. Following a few other forum posts, it became clear that the Nvidia drivers do not play nicely with recent kernels for some specific Nvidia cards ind combination with newer kernels. So I followed the proposed steps and disabled the card complete. Just removing the card in the BIOS and uninstalling the drivers was not enough. I also had to blacklist the modules for thenouveau kernel driver.

  1. Disable the Nvidia card in the BIOS and use the Intel onchip GPU
  2. Remove all Nvidia packages:  
    `sudo apt-get remove nvidia* && sudo apt autoremove`
  3. Blacklist the module:   
    `sudo vim /etc/modprobe.d/blacklist.conf`</p> <pre class=""><code>blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off</code>```

    
    <pre class=""><code>echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf</code>```

    
    <pre class=""><code>sudo update-initramfs -u</code>```


  4. Reboot

The card is not used any more and the freezes stopped.

*-display UNGEFORDERT Beschreibung: VGA compatible controller Produkt: GK208 [GeForce GT 720] Hersteller: NVIDIA Corporation Physische ID: 0 Bus-Informationen: pci@0000:01:00.0 Version: a1 Breite: 64 bits Takt: 33MHz Fähigkeiten: pm msi pciexpress vga_controller cap_list Konfiguration: latency=0```

I hope I do not have to remove this article again and the system remains as stable as it is now for six hours.