Hello.

We recently began switching our database (Firebird) servers from openSUSE to Ubuntu Server 10.04, and with one exception they are running like a charm.

On the one problematic server, roughly every three months the system becomes unresponsive. That means:

  • the server responds to ping
  • the local shell doesn't work (the console is blocked)
  • remote login by ssh doesn't work
  • FTP doesn't work either
  • BUT the Firebird database service keeps working and still accepts new connections.

In this case, since there's no way to log in to the system, I do a hard reset.

After this I can see in the syslog that logging stopped at the time the system became unresponsive. But there are no clues as to what could have caused the hang.
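To pin down the exact window in which nothing was logged, something like the following could be run over the syslog after the reboot. It is only a rough sketch: the function names are made up for illustration, the year has to be passed in because classic syslog lines do not carry one, and a log that crosses New Year is not handled.

```python
from datetime import datetime

def parse_timestamp(line, year):
    """Return the datetime at the start of a classic syslog line, or None.

    The year is an assumption supplied by the caller, since lines such as
    'Jan 17 18:47:52 host ...' do not contain it.
    """
    parts = line.split()  # split() also tolerates doubled spaces
    if len(parts) < 3:
        return None
    stamp = "%d %s %s %s" % (year, parts[0], parts[1], parts[2])
    try:
        return datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # not a timestamped line, e.g. "[...]"

def largest_gap(lines, year):
    """Return (gap, last_line_before, first_line_after) for the longest
    silence between two consecutive timestamped lines, or None."""
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        ts = parse_timestamp(line, year)
        if ts is None:
            continue
        if prev_time is not None:
            gap = ts - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = ts, line
    return best
```

Calling `largest_gap(open("/var/log/syslog"), 2012)` would return the length of the longest silence together with the last line before it and the first line after it. Those two lines are the ones worth reading closely.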

It seems as if some of the system daemons stop working. I am sure about the syslog daemon, the cron daemon (some custom jobs did not run after the hang), and a custom daemon running a self-written program.
The Firebird daemon, on the other hand, keeps working...

The only idea I have is that the file system filled up with temporary files, which are deleted on reboot. But wouldn't some message appear in the syslog when the file system gets close to its limit?

After a reboot there are 80 GB of free space.
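One way to test the full-disk theory before the next hang would be to record the free space at regular intervals, for example from a cron job every few minutes. The following is a minimal sketch under some assumptions: the function names and the log file location are hypothetical, and it uses `os.statvfs` so that it should also run on the older Python shipped with 10.04. It records free inodes as well, since a file system can run out of those while still showing free space.

```python
import os
import time

def free_space_report(path="/"):
    """Return (free_bytes, total_bytes, free_inodes) for the file system
    containing path, as seen by a non-root user."""
    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize
    total_bytes = st.f_blocks * st.f_frsize
    return free_bytes, total_bytes, st.f_favail

def format_report(path, free_bytes, total_bytes, free_inodes):
    """Build one timestamped log line from the values above."""
    percent_free = 100.0 * free_bytes / total_bytes if total_bytes else 0.0
    return "%s %s free=%d MiB (%.1f%%) free_inodes=%d" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        path,
        free_bytes // (1024 * 1024),
        percent_free,
        free_inodes,
    )

def append_report(log_path, path="/"):
    """Append one report line to log_path and force it to disk, so the
    last entry survives a hard reset."""
    line = format_report(path, *free_space_report(path))
    with open(log_path, "a") as log:
        log.write(line + "\n")
        log.flush()
        os.fsync(log.fileno())

# Example call; the log file name is only a placeholder:
# append_report("/var/log/diskfree.log", "/")
```

If the last entries before the next hang still show plenty of free space and inodes, the temporary-files theory can be ruled out.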

Can anyone please give me a hint on how to solve this issue?

Thanks in advance.

Syslog from today:
Code:
Jan 17 06:25:03 agronux rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="498" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Jan 17 06:25:04 agronux CRON[22112]: (CRON) error (grandchild #22114 failed with exit status 1)
Jan 17 06:25:04 agronux postfix/pickup[22107]: 82CA45E0985: uid=0 from=<root>
Jan 17 06:25:04 agronux postfix/cleanup[22208]: 82CA45E0985: message-id=<20120117052504.82CA45E0985@agronux>
Jan 17 06:25:04 agronux postfix/qmgr[822]: 82CA45E0985: from=<root@agronux>, size=739, nrcpt=1 (queue active)
Jan 17 06:25:04 agronux postfix/local[22210]: 82CA45E0985: to=<root@agronux>, orig_to=<root>, relay=local, delay=0.04, delays=0.02/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to mailbox)
Jan 17 06:25:04 agronux postfix/qmgr[822]: 82CA45E0985: removed
Jan 17 07:17:01 agronux CRON[22213]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 08:17:01 agronux CRON[22218]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 09:17:01 agronux CRON[22285]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 10:17:01 agronux CRON[22314]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 11:17:01 agronux CRON[22356]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 12:17:01 agronux CRON[22389]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 13:17:01 agronux CRON[22419]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 14:17:01 agronux CRON[22437]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 15:17:01 agronux CRON[22477]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 16:17:01 agronux CRON[22530]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 17:17:01 agronux CRON[22559]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 18:17:01 agronux CRON[22591]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 17 18:47:52 agronux kernel: Kernel logging (proc) stopped.
Jan 17 18:47:52 agronux rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="498" x-info="http://www.rsyslog.com"] exiting on signal 15.
Jan 18 13:41:45 agronux kernel: imklog 4.6.4, log source = /proc/kmsg started.
Jan 18 13:41:45 agronux rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="502" x-info="http://www.rsyslog.com"] (re)start
Jan 18 13:41:45 agronux rsyslogd: rsyslogd's groupid changed to 103
Jan 18 13:41:45 agronux rsyslogd: rsyslogd's userid changed to 101
[...]