
[ubuntu] 8.04 server crashes (general protection fault) [x86_64]



the kaz
May 22nd, 2009, 08:04 AM
Hello all,
All of a sudden, my 8.04.2 LTS server (64-bit version), which had been up 24/7 for about four months, is experiencing random kernel crashes. The crashes started after a kernel update, but I have since gone back to the old kernel and get the same errors, so it doesn't look kernel-specific to me. I have saved a few of the messages from just today:


[ 4086.715838] general protection fault: 0000 [1] SMP
[ 4086.716401] CPU 1
[ 4086.716909] Modules linked in: ipt_MASQUERADE xt_state ipt_LOG xt_limit ipt_REJECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ppdev xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle pppoe pppox af_packet ppp_generic iptable_filter ip_tables x_tables reiserfs nvidiafb fb_ddc i2c_algo_bit vgastate parport_pc lp parport loop ipv6 hisax crc_ccitt serio_raw usblp psmouse isdn snd_hda_intel slhc snd_pcm snd_timer snd_page_alloc snd_hwdep snd k8temp button i2c_piix4 i2c_core soundcore pcspkr shpchp pci_hotplug evdev ext3 jbd mbcache pata_acpi ata_generic pata_atiixp sg sr_mod cdrom sd_mod atiixp ide_core ahci ehci_hcd ohci_hcd libata usbcore r8169 scsi_mod thermal processor fan fuse fbcon tileblit font bitblit softcursor
[ 4086.720273] Pid: 3855, comm: du Not tainted 2.6.24-24-server #1
[ 4086.721479] RIP: 0010:[<ffffffff802c8d85>] [<ffffffff802c8d85>] __d_lookup+0x95/0x140
[ 4086.722489] RSP: 0018:ffff81011e977bc8 EFLAGS: 00010206
[ 4086.723069] RAX: 0008000000000000 RBX: 0007ffffffffffe8 RCX: 0000000000000013
[ 4086.724159] RDX: 0008000000000000 RSI: ffff81011e977ca8 RDI: ffff81002546c8f0
[ 4086.725353] RBP: 0000000000c916f8 R08: 0000000000000001 R09: 0000000000000001
[ 4086.726743] R10: 0000000000000d8a R11: ffffffff803375e0 R12: ffff81002546c8f0
[ 4086.728081] R13: ffff81011e977be8 R14: ffff81011e977ca8 R15: 0000000000000004
[ 4086.729487] FS: 00007f02de67e6e0(0000) GS:ffff81012b801800(0000) knlGS:0000000000000000
[ 4086.730897] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4086.732248] CR2: 0000000000841038 CR3: 000000011c927000 CR4: 00000000000006e0
[ 4086.733636] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4086.735025] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4086.736431] Process du (pid: 3855, threadinfo ffff81011e976000, task ffff81011ddb4000)
[ 4086.737997] Stack: ffff8100257e00c8 00000004257dcdd0 ffff8100670e0000 ffffffff8819e00f
[ 4086.739684] 0008000000000000 ffff8100670e0004 ffff81011e977ca8 ffff81011e977e48
[ 4086.741302] ffff81011e977ca8 ffff81011e977cb8 0000000000000005 ffffffff802bd81c
[ 4086.741546] Call Trace:
[ 4086.743934] [<ffffffff8819e00f>] :ext3:ext3_lookup+0x10f/0x150
[ 4086.744993] [<ffffffff802bd81c>] do_lookup+0x3c/0x250
[ 4086.746123] [<ffffffff802bfe2c>] __link_path_walk+0x74c/0xe90
[ 4086.747381] [<ffffffff802c05cb>] link_path_walk+0x5b/0x100
[ 4086.748733] [<ffffffff802c088a>] do_path_lookup+0x8a/0x250
[ 4086.750053] [<ffffffff802c14eb>] __user_walk_fd+0x4b/0x80
[ 4086.751412] [<ffffffff802b8fcc>] vfs_lstat_fd+0x2c/0x70
[ 4086.752721] [<ffffffff802b91a8>] sys_newfstatat+0x68/0x70
[ 4086.754051] [<ffffffff8020c39e>] system_call+0x7e/0x83
[ 4086.755358]
[ 4086.756594]
[ 4086.756594] Code: 48 8b 02 39 6b 30 0f 18 08 75 e0 4c 39 63 28 75 da 48 8d 7b
[ 4086.759655] RIP [<ffffffff802c8d85>] __d_lookup+0x95/0x140
[ 4086.761005] RSP <ffff81011e977bc8>
[ 4086.769905] ---[ end trace 42a392e4120052e1 ]---


[ 558.428481] general protection fault: 0000 [1] SMP
[ 558.429056] CPU 1
[ 558.429578] Modules linked in: ipt_MASQUERADE xt_state ipt_LOG xt_limit ipt_REJECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ppdev xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle pppoe pppox af_packet ppp_generic iptable_filter ip_tables x_tables reiserfs nvidiafb fb_ddc i2c_algo_bit vgastate parport_pc lp parport loop ipv6 hisax crc_ccitt usblp isdn serio_raw snd_hda_intel slhc psmouse i2c_piix4 snd_pcm snd_timer snd_page_alloc snd_hwdep snd button k8temp i2c_core pcspkr shpchp pci_hotplug soundcore evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod pata_acpi ata_generic pata_atiixp atiixp ide_core ahci ehci_hcd ohci_hcd libata scsi_mod usbcore r8169 thermal processor fan fuse fbcon tileblit font bitblit softcursor
[ 558.432961] Pid: 7733, comm: du Not tainted 2.6.24-22-server #1
[ 558.433506] RIP: 0010:[<ffffffff802c8b05>] [<ffffffff802c8b05>] __d_lookup+0x95/0x140
[ 558.434288] RSP: 0018:ffff8101298b9bc8 EFLAGS: 00010206
[ 558.434854] RAX: 000a73e340000000 RBX: 000a73e33fffffe8 RCX: 0000000000000013
[ 558.435608] RDX: 000a73e340000000 RSI: ffff8101298b9ca8 RDI: ffff810023e8eea0
[ 558.436380] RBP: 0000000000c8c898 R08: 0000000000000001 R09: 0000000000000001
[ 558.437635] R10: 00000000000006b4 R11: ffffffff80337080 R12: ffff810023e8eea0
[ 558.438711] R13: ffff8101298b9be8 R14: ffff8101298b9ca8 R15: 0000000000000004
[ 558.440032] FS: 00007f513510f6e0(0000) GS:ffff81012b801800(0000) knlGS:0000000000000000
[ 558.441381] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 558.442551] CR2: 0000000002a249b0 CR3: 000000012801b000 CR4: 00000000000006e0
[ 558.443838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 558.445080] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 558.446333] Process du (pid: 7733, threadinfo ffff8101298b8000, task ffff8101294917d0)
[ 558.447471] Stack: ffff8100238680c8 00000004238665b0 ffff8100ca10b000 ffffffff8819b00f
[ 558.448955] 000a73e340000000 ffff8100ca10b004 ffff8101298b9ca8 ffff8101298b9e48
[ 558.450571] ffff8101298b9ca8 ffff8101298b9cb8 0000000000000009 ffffffff802bd5bc
[ 558.451131] Call Trace:
[ 558.453279] [<ffffffff8819b00f>] :ext3:ext3_lookup+0x10f/0x150
[ 558.454313] [<ffffffff802bd5bc>] do_lookup+0x3c/0x250
[ 558.455457] [<ffffffff802bfbcc>] __link_path_walk+0x74c/0xe90
[ 558.456657] [<ffffffff802c036b>] link_path_walk+0x5b/0x100
[ 558.457869] [<ffffffff802c062a>] do_path_lookup+0x8a/0x250
[ 558.459278] [<ffffffff802c128b>] __user_walk_fd+0x4b/0x80
[ 558.460604] [<ffffffff802b8d6c>] vfs_lstat_fd+0x2c/0x70
[ 558.461946] [<ffffffff802b8f48>] sys_newfstatat+0x68/0x70
[ 558.463279] [<ffffffff8020c37e>] system_call+0x7e/0x83
[ 558.464462]
[ 558.465581]
[ 558.465581] Code: 48 8b 02 39 6b 30 0f 18 08 75 e0 4c 39 63 28 75 da 48 8d 7b
[ 558.468823] RIP [<ffffffff802c8b05>] __d_lookup+0x95/0x140
[ 558.470184] RSP <ffff8101298b9bc8>
[ 558.471660] ---[ end trace 1e15b0a46c90d8df ]---


[ 642.843425] Unable to handle kernel paging request at 0000010000000000 RIP:
[ 642.843451] [<ffffffff802c8b05>] __d_lookup+0x95/0x140
[ 642.844814] PGD 0
[ 642.845485] Oops: 0000 [2] SMP
[ 642.846147] CPU 1
[ 642.846830] Modules linked in: ipt_MASQUERADE xt_state ipt_LOG xt_limit ipt_REJECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ppdev xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle pppoe pppox af_packet ppp_generic iptable_filter ip_tables x_tables reiserfs nvidiafb fb_ddc i2c_algo_bit vgastate parport_pc lp parport loop ipv6 hisax crc_ccitt usblp isdn serio_raw snd_hda_intel slhc psmouse i2c_piix4 snd_pcm snd_timer snd_page_alloc snd_hwdep snd button k8temp i2c_core pcspkr shpchp pci_hotplug soundcore evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod pata_acpi ata_generic pata_atiixp atiixp ide_core ahci ehci_hcd ohci_hcd libata scsi_mod usbcore r8169 thermal processor fan fuse fbcon tileblit font bitblit softcursor
[ 642.854001] Pid: 14216, comm: procmail Tainted: G D 2.6.24-22-server #1
[ 642.854834] RIP: 0010:[<ffffffff802c8b05>] [<ffffffff802c8b05>] __d_lookup+0x95/0x140
[ 642.855706] RSP: 0018:ffff810111e05d68 EFLAGS: 00010206
[ 642.856648] RAX: 0000010000000000 RBX: 000000ffffffffe8 RCX: 0000000000000013
[ 642.857604] RDX: 0000010000000000 RSI: ffff810111e05df8 RDI: ffff81012b001270
[ 642.859115] RBP: 00000000085a7b39 R08: 0000000000000005 R09: 0000000000000005
[ 642.860579] R10: ffffffff804989a0 R11: 0000000000000000 R12: ffff81012b001270
[ 642.862274] R13: ffff810111e05d88 R14: ffff810111e05df8 R15: 0000000000000005
[ 642.863891] FS: 00007f1622b206e0(0000) GS:ffff81012b801800(0000) knlGS:0000000000000000
[ 642.865577] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 642.867127] CR2: 0000010000000000 CR3: 000000010ef59000 CR4: 00000000000006e0
[ 642.868695] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 642.870280] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 642.871822] Process procmail (pid: 14216, threadinfo ffff810111e04000, task ffff81010c9ae7f0)
[ 642.873352] Stack: ffff810111f5afe0 0000000580239228 ffff810111e05e08 0000000000000001
[ 642.875091] 0000010000000000 000000000000104a ffff810111e05df8 ffff81012b001270
[ 642.876792] ffff81012b617300 0000000000000000 0000000000003790 ffffffff802c8bcf
[ 642.877257] Call Trace:
[ 642.879877] [<ffffffff802c8bcf>] d_lookup+0x1f/0x40
[ 642.881228] [<ffffffff802f8492>] proc_flush_task+0xc2/0x290
[ 642.882645] [<ffffffff802408f7>] release_task+0x27/0x360
[ 642.883882] [<ffffffff80241472>] do_wait+0x842/0xcd0
[ 642.885415] [<ffffffff802364b0>] default_wake_function+0x0/0x10
[ 642.886906] [<ffffffff802b3024>] filp_close+0x54/0x90
[ 642.888304] [<ffffffff8020c37e>] system_call+0x7e/0x83
[ 642.889629]
[ 642.890869]
[ 642.890870] Code: 48 8b 02 39 6b 30 0f 18 08 75 e0 4c 39 63 28 75 da 48 8d 7b
[ 642.894365] RIP [<ffffffff802c8b05>] __d_lookup+0x95/0x140
[ 642.895903] RSP <ffff810111e05d68>
[ 642.897379] CR2: 0000010000000000
[ 642.898729] ---[ end trace 1e15b0a46c90d8df ]---


And so on. The system ran fine until two days ago, which was a week after I upgraded the kernel. Could this be hardware-related and if so, where should I start looking?
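
For comparing reports like the ones above, a small script can pull out the fields that matter. The following is only a sketch in Python, not part of the original setup: the names `summarize_oops` and `is_canonical` are made up for illustration, and the regular expressions assume the oops layout shown in this thread. It extracts the process, kernel version, faulting symbol, and RAX from each report, and checks whether RAX is a canonical x86_64 address (bits 63 to 47 all equal). On x86_64, dereferencing a non-canonical address raises a general protection fault, while a canonical but unmapped one raises a page fault. This assumes RAX holds the pointer being followed, which the third report's CR2 value suggests.

```python
import re

# Patterns for the oops layout pasted in this thread (an assumption;
# other kernel versions print these lines differently).
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+) .*? (\S+) #\d+\s*$")
RIP_RE = re.compile(r"RIP: .*\s(\S+\+0x[0-9a-f]+)/0x[0-9a-f]+\s*$")
RAX_RE = re.compile(r"RAX: ([0-9a-f]{16})")


def is_canonical(addr):
    """True if bits 63..47 of a 64-bit address are all 0 or all 1."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def summarize_oops(text):
    """Return one dict per oops report found in a kernel log excerpt."""
    reports = []
    for line in text.splitlines():
        m = PID_RE.search(line)
        if m:
            # A "Pid:" line starts the part of a report we care about.
            reports.append({"pid": int(m.group(1)),
                            "comm": m.group(2),
                            "kernel": m.group(3)})
            continue
        if not reports:
            continue
        m = RIP_RE.search(line)
        if m:
            reports[-1]["fault"] = m.group(1)
            continue
        m = RAX_RE.search(line)
        if m:
            rax = int(m.group(1), 16)
            reports[-1]["rax"] = rax
            reports[-1]["canonical"] = is_canonical(rax)
    return reports
```

Run over the three reports above, this should list `__d_lookup+0x95` as the faulting location each time, on both the 2.6.24-24 and the 2.6.24-22 kernel, with a non-canonical RAX in the two general protection faults and a canonical one in the paging request.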

Greetings, the kaz.

the kaz
May 22nd, 2009, 03:15 PM
Hello all,
Just to add a little information: the 4GB of RAM passed memtest twice without problems, as I just tested. Additionally, since my first post, I have removed one ISDN card from the system that was somehow unable to dial out, just to be on the safe side.

The mainboard is an MSI K9A2VM (AMD 780V chipset), the CPU is an Athlon 64 X2 4850e.

Greetings, the kaz.

the kaz
May 23rd, 2009, 06:10 PM
Hello once again,
Even though no one seems to have any insight into my problem yet, I have run some further tests. The server now seems to run stably (for the time being) when I boot my self-compiled 2.6.29.3 kernel (the one I was testing with earlier) with the option "noapic". Unfortunately, my ISDN card won't work with "noapic", as it seems to get the wrong interrupt:


[ 10.104985] ISDN subsystem Rev: 1.1.2.3/1.1.2.3/1.1.2.2/1.1.2.3/1.1.2.2/1.1.2.2
[ 10.105308] PPP BSD Compression module registered
[ 10.105389] CAPI Subsystem Rev 1.1.2.8
[ 10.108873] capi20: Rev 1.1.2.7: started up with major 68 (middleware+capifs)
[ 10.108941] capidrv: Rev 1.1.2.2: loaded
[ 10.109026] capifs: Rev 1.1.2.3
[ 10.109089] dss1_divert module successfully installed
[ 10.109147] HiSax: Linux Driver for passive ISDN cards
[ 10.109204] HiSax: Version 3.5 (kernel)
[ 10.109261] HiSax: Layer1 Revision 2.46.2.5
[ 10.109317] HiSax: Layer2 Revision 2.30.2.4
[ 10.109374] HiSax: TeiMgr Revision 2.20.2.3
[ 10.109430] HiSax: Layer3 Revision 2.22.2.3
[ 10.109486] HiSax: LinkLayer Revision 2.59.2.4
[ 10.109553] HiSax: Total 1 card defined
[ 10.109566] HiSax: Card 1 Protocol EDSS1 Id=hfc0 (0)
[ 10.109624] HiSax: HFC-PCI driver Rev. 1.48.2.4
[ 10.109934] ACPI: PCI Interrupt Link [LNKE] BIOS reported IRQ 15, using IRQ 10
[ 10.110042] ACPI: PCI Interrupt Link [LNKE] enabled at IRQ 10
[ 10.110103] pci 0000:04:05.0: PCI INT A -> Link[LNKE] -> GSI 10 (level, low) -> IRQ 10
[ 10.110180] HiSax: HFC-PCI card manufacturer: CCD/Billion/Asuscom card name: 2BD0
[ 10.110294] HFC-PCI: defined at mem ffffc200110a8c00 fifo ffff88012bd48000(0x2bd48000) IRQ 10 HZ 100
[ 10.110371] HFC 2BDS0 PCI: IRQ 10 count 30
[ 10.110440] HFC_PCI: resetting card
[ 10.240019] HFC 2BDS0 PCI: IRQ 10 count 30
[ 10.240077] HFC 2BDS0 PCI: IRQ(10) getting no interrupts during init 1
[ 10.240137] HFC_PCI: resetting card
[ 10.260242] HFC_PCI: resetting card
[ 10.390023] HFC 2BDS0 PCI: IRQ 10 count 30
[ 10.390079] HFC 2BDS0 PCI: IRQ(10) getting no interrupts during init 2
[ 10.390141] HFC_PCI: resetting card
[ 10.410250] HFC_PCI: resetting card
[ 10.540015] HFC 2BDS0 PCI: IRQ 10 count 30
[ 10.540071] HFC 2BDS0 PCI: IRQ(10) getting no interrupts during init 3
[ 10.540135] HiSax: release hfcpci at ffffc200110a8c00
[ 10.560204] HiSax: Card HFC 2BDS0 PCI not installed !


What I don't understand is why this wasn't a problem over the last four months, during which the same system ran fine without "noapic". But I would settle for a way to get the ISDN card working with "noapic". If anybody has any idea what to try, please don't hesitate to answer.
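
To make the symptom in the log above easier to spot, here is a minimal Python sketch, with hypothetical helper names `irq_counts` and `counter_stuck`, that collects the "IRQ n count m" values the HFC-PCI driver prints between resets. It assumes the exact message wording shown above. If the count never changes across init attempts, no interrupt reached the driver on that IRQ line.

```python
import re

# Matches driver lines such as "HFC 2BDS0 PCI: IRQ 10 count 30".
# The wording is taken from the log in this post and may differ
# for other cards or driver revisions.
COUNT_RE = re.compile(r"HFC 2BDS0 PCI: IRQ (\d+) count (\d+)")


def irq_counts(log_text):
    """Return (irq, count) pairs in the order the driver printed them."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in COUNT_RE.finditer(log_text)]


def counter_stuck(log_text):
    """True if more than one count was printed and all are identical."""
    counts = [count for _, count in irq_counts(log_text)]
    return len(counts) > 1 and len(set(counts)) == 1
```

For the log above, the count stays at 30 on IRQ 10 through all three init attempts, so `counter_stuck` would report it as stuck.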

Greetings, the kaz.