I have an issue with my Ubuntu 12.10 setup (Kubuntu actually, but not relevant) where I run into kernel panics at certain points.
I have more or less reliably been able to reproduce the problem. It seems related to excessive memory usage.
I have a Java application that analyzes large amounts of text (using UIMA) and stores the results in memory and in a PostgreSQL database. When I leave it running for a long time with a lot of memory allocated, it always crashes at some point.
To rule out other factors, I ran it from a text terminal after shutting down lightdm:
java -Xmx4096M -jar UIMARunner.jar
I managed to capture an image of the kernel panic at one point; the error is more or less like this:
[ 8703.160542] BUG: unable to handle kernel paging request at ffff8801afc4dc50
[ 8703.163166] IP: [<ffffffff8108f8c9>] select_nohz_load_balancer+0x9/0x70
[ 8703.165790] PGD 1c0c063 PUD dfeda067 PMD 0
[ 8703.168417] Oops: 0000 [#1] SMP
[ 8703.171040] CPU 2
[ 8703.171053] Modules linked in: [ 8703.173642] snd_emu10k1_synth snd_emux_synth snd_seq_virmidi gpio_ich snd_seq_midi_emul snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc snd_util_mem snd_hwdep
coretemp kvm_intel snd_seq_midi kvm snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device microcode snd serio_raw lpc_ich soundcore emu10k1_gp gameport usblp fglrx(PO) i7core_edac edac_core
mac_hid parport_pc rfcomm bnep bluetooth ppdev lp parport nfsd binfmt_misc nfs lockd fscache auth_rpcgss nfs_acl sunrpc vesafb uas usb_storage firewire_ohci hid_generic firewire_core mxm_wmi crc_itu_t
r8169 usbhid hid pata_jmicron wmi
[ 8703.187509] Pid: 0, comm: swapper/2 Tainted: P O 3.5.0-18-generic #29-Ubuntu Gigabyte Technology Co., Ltd. X58A-UD3R/X58A-UD3R
[ 8703.190174] RIP: 0010:[<ffffffff8108f8c9>] [<ffffffff8108f8c9>] select_nohz_load_balancer+0x9/0x70
[ 8703.192835] RSP: 0018:ffff880196a25eb8 EFLAGS: 00010046
[ 8703.265376] Kernel panic - not syncing: Attempted to kill the idle task!
My hardware:
- Gigabyte X58A-UD3R mainboard
- 3 x GeIL GV36GB1333C9TC (6GB RAM total)
- Intel Core i7 930
- Sapphire HD5770 1GB GDDR5 PCIE (AMD GFX card)
Since I have an SSD in this machine, I don't have any swap space set up. While it might be the case that it's running out of memory (although I should be able to allocate 4GB of the 6GB to Java, right?), this should not cause a kernel panic.
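To put numbers on that question, here is a rough sketch that compares a heap limit with the RAM and swap the kernel reports. The helper name `heap_headroom_mb` is made up for illustration, and 4096 simply mirrors the `-Xmx4096M` used above. Note that `-Xmx` only caps the Java heap; the JVM itself and PostgreSQL need memory on top of it, so the real headroom is smaller than the figure printed.

```shell
#!/bin/sh
# Sketch: compare a JVM heap limit (in MB) against physical RAM plus swap.
# heap_headroom_mb is a hypothetical helper name, not an existing tool.
# Usage: heap_headroom_mb <heap_mb> [meminfo_file]
heap_headroom_mb() {
    heap_mb=$1
    meminfo=${2:-/proc/meminfo}
    # MemTotal and SwapTotal are reported in kB in /proc/meminfo.
    awk -v heap="$heap_mb" '
        /^MemTotal:/  { mem = $2 }
        /^SwapTotal:/ { swap = $2 }
        END {
            total_mb = int((mem + swap) / 1024)
            printf "total=%dMB heap=%dMB headroom=%dMB\n", total_mb, heap, total_mb - heap
        }' "$meminfo"
}

# Only query the live system where /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    heap_headroom_mb 4096
fi
```

With no swap configured, whatever is left in `headroom` has to cover everything else running on the machine.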
Since it seems to be an issue related to memory, I ran memtest86 from a USB stick (the version supplied with Ubuntu is buggy and always fails at test #7), but the memory tested fine.
Since the kernel panic mentions that the kernel is tainted, I tried removing the fglrx driver using jockey-kde, but that resulted in a similar problem.
The main problem is that I usually turn my monitor off when I start the Java program. When I come back and turn the monitor on after a crash, I get no picture at all: there is no signal coming from the video card. That makes it hard to see the exact message after every crash.
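Since the screen is blank after a crash, one thing worth checking on the next boot is whether any part of the oops reached the on-disk log before the machine died. This is a minimal sketch, assuming the default Ubuntu log location /var/log/kern.log; `find_oops` is a hypothetical helper name. A hard panic often never gets flushed to disk, so an empty result does not rule anything out.

```shell
#!/bin/sh
# Sketch: list kernel log lines that look like an oops or panic.
# find_oops is a hypothetical helper name; the log path is an assumption
# based on the default rsyslog setup on Ubuntu.
# Usage: find_oops [logfile]
find_oops() {
    log=${1:-/var/log/kern.log}
    # -n prints line numbers so the surrounding context is easy to locate.
    grep -nE 'BUG:|Oops:|Kernel panic' "$log"
}

# Only scan the system log if it exists and is readable by this user.
if [ -r /var/log/kern.log ]; then
    find_oops /var/log/kern.log || echo "no oops or panic lines found"
fi
```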
Before, this also happened while the same process was running in a graphical environment. In that case the entire screen freezes completely: I cannot move the mouse, I cannot type, nothing happens. However, I also do not get an error message or kernel panic, most likely because that cannot be displayed in a graphical environment?
Anyway, I would really like to debug this problem and find the cause, but I'm unsure how to go about pinpointing it.
Or should I file a bug report about this?