Thread: server crashing once a day

  1. #11
    Join Date
    Feb 2007
    Location
    West Hills CA
    Beans
    10,044
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: server crashing once a day

    Rather than waiting for the server to crash, monitor the swap file. When it passes some predetermined value (say 500 MB) then start pruning processes and examining log files.

    Write a script that examines your swap file every 10 minutes. If it exceeds 500 MB then dump some vmstat data and send an email or wall message to take action.

    cat /proc/swaps

    man gawk

    gawk '/sda3/ {print $4}' /proc/swaps

    This prints the Used column, which is in kilobytes. If that value is greater than 500 MB (512000 kB), then do something. Your swap device will be different from mine (sda3).

    echo "My server is about to crash!!" | wall
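
    The steps above can be put together into a small script. This is only a minimal sketch: the names swap_used_kb and check_swap, the environment variables, the log path and the --run switch are made up for illustration, and it uses plain awk rather than gawk so it runs on more systems.

```shell
#!/bin/sh
# Sketch of a swap watchdog. All names and paths below are hypothetical.
SWAP_DEV="${SWAP_DEV:-sda3}"              # your swap device will differ
LIMIT_KB="${LIMIT_KB:-512000}"            # 500 MB expressed in kB (500 * 1024)
SWAPS_FILE="${SWAPS_FILE:-/proc/swaps}"
ALERT_LOG="${ALERT_LOG:-/var/tmp/swap-alert.log}"

# Print the Used column (4th field, in kB) for the first matching swap device.
# $1 = device name fragment, $2 = file in /proc/swaps format.
swap_used_kb() {
    awk -v dev="$1" 'NR > 1 && index($1, dev) { print $4; exit }' "$2"
}

# Compare current usage with the limit; on excess, save vmstat data and warn.
check_swap() {
    used=$(swap_used_kb "$SWAP_DEV" "$SWAPS_FILE")
    [ -n "$used" ] || return 0            # device not found: nothing to check
    if [ "$used" -gt "$LIMIT_KB" ]; then
        vmstat 1 5 >> "$ALERT_LOG" 2>&1
        echo "Swap usage on $SWAP_DEV is $used kB (limit $LIMIT_KB kB)" | wall
    fi
}

# Only act when asked to, so the functions can also be sourced and tested.
if [ "${1:-}" = "--run" ]; then
    check_swap
fi
```

    To run it every 10 minutes, a crontab line such as "*/10 * * * * /usr/local/bin/swapwatch.sh --run" would do (the script path is an example).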
    Last edited by tgalati4; March 17th, 2010 at 09:21 PM.
    -------------------------------------
    Oooh Shiny: PopularPages

    Unumquodque potest reparantur. Patientia sit virtus.

  2. #12
    Join Date
    Mar 2007
    Beans
    17

    Re: server crashing once a day

    First, thanks for all your tips/suggestions.

    It keeps crashing, no matter what I do. Once it has crashed it is pretty much unusable, since I can't log in to it (neither remotely nor locally).

    Looking at the logs the only thing I see for all the sites we're hosting is a bunch of "file does not exist", but that shouldn't hurt, right?

    However, looking at the php configuration I've found that the variable memory_limit was set to 128M. I've never used such a high value before, so it must have been a suggestion from a client. I've decided to change it to a safer value (32M).
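
    A rough way to see why memory_limit matters on a prefork Apache with mod_php: in the worst case every worker can grow to the limit at the same time. The sketch below just does that multiplication. The function name worst_case_mb is made up, and the worker count of 150 is an assumed MaxClients value for illustration, not something taken from this server.

```shell
#!/bin/sh
# Worst-case PHP memory in MB: number of Apache workers times memory_limit.
# $1 = MaxClients (assumed value), $2 = memory_limit in MB.
worst_case_mb() {
    echo $(( $1 * $2 ))
}

# Illustration only; 150 workers is an assumption, check your own config.
echo "128M limit: $(worst_case_mb 150 128) MB"
echo " 32M limit: $(worst_case_mb 150 32) MB"
```

    Real usage is normally far below this ceiling because few scripts reach the limit, but it shows how much headroom each setting leaves relative to the RAM and swap actually installed.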

    Since this is pretty critical, I'll try to deploy a HA Cluster with 2 servers.
    But, if the error is caused by some code, it will be useless.

    I'll keep monitoring this week, checking swap and log files (I'd love to find the problem, but it seems that is not going to happen...).

    Thanks!
    Last edited by flipybcn; March 18th, 2010 at 11:30 AM.

  3. #13
    Join Date
    Dec 2004
    Location
    Belgium
    Beans
    115
    Distro
    Dapper Drake Testing/

    Re: server crashing once a day

    I have exactly the same problem. I monitor swap and memory and nothing special happens beforehand. The failure is very sudden, as I can see on the graphs.

    So it is not a leak but more like a fork bomb.

    It's completely random, not once a day. More like 3 times a week, then nothing for a month.

    No idea of what it could be.

    My log:


    Mar 23 16:56:41 localhost kernel: Out of memory: kill process 16678 (apache2) score 17006 or a child
    Mar 23 16:56:41 localhost kernel: Killed process 8674 (apache2)
    Mar 23 16:56:41 localhost kernel: fail2ban-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
    Mar 23 16:56:41 localhost kernel: fail2ban-server cpuset=/ mems_allowed=0
    Mar 23 16:56:41 localhost kernel: Pid: 22547, comm: fail2ban-server Not tainted 2.6.32.2-xxxx-std-ipv4-32 #1
    Mar 23 16:56:41 localhost kernel: Call Trace:
    Mar 23 16:56:41 localhost kernel: [] oom_kill_process+0xa0/0x2b0
    Mar 23 16:56:41 localhost kernel: [] ? select_bad_process+0xab/0xe0
    Mar 23 16:56:41 localhost kernel: [] __out_of_memory+0x4e/0xb0
    Mar 23 16:56:41 localhost kernel: [] out_of_memory+0x52/0xa0
    Mar 23 16:56:41 localhost kernel: [] __alloc_pages_nodemask+0x527/0x540
    Mar 23 16:56:41 localhost kernel: [] __do_page_cache_readahead+0xd2/0x1d0
    Mar 23 16:56:41 localhost kernel: [] ra_submit+0x28/0x40
    Mar 23 16:56:41 localhost kernel: [] filemap_fault+0x3b0/0x3c0
    Mar 23 16:56:41 localhost kernel: [] __do_fault+0x4c/0x460
    Mar 23 16:56:41 localhost kernel: [] ? filemap_fault+0x0/0x3c0
    Mar 23 16:56:41 localhost kernel: [] handle_mm_fault+0x13c/0x7f0
    Mar 23 16:56:41 localhost kernel: [] ? finish_task_switch+0x3a/0xb0
    Mar 23 16:56:41 localhost kernel: [] ? ktime_get_ts+0xed/0x110
    Mar 23 16:56:41 localhost kernel: [] ? poll_select_copy_remaining+0xc7/0x110
    Mar 23 16:56:41 localhost kernel: [] do_page_fault+0x121/0x300
    Mar 23 16:56:41 localhost kernel: [] ? sys_select+0x3d/0xb0
    Mar 23 16:56:41 localhost kernel: [] ? do_page_fault+0x0/0x300
    Mar 23 16:56:41 localhost kernel: [] error_code+0x66/0x6c
    Mar 23 16:56:41 localhost kernel: [] ? do_page_fault+0x0/0x300
    Mar 23 16:56:41 localhost kernel: Mem-Info:
    Mar 23 16:56:41 localhost kernel: DMA per-cpu:
    Mar 23 16:56:41 localhost kernel: CPU 0: hi: 0, btch: 1 usd: 0
    Mar 23 16:56:41 localhost kernel: CPU 1: hi: 0, btch: 1 usd: 0
    Mar 23 16:56:41 localhost kernel: Normal per-cpu:
    Mar 23 16:56:41 localhost kernel: CPU 0: hi: 186, btch: 31 usd: 61
    Mar 23 16:56:41 localhost kernel: CPU 1: hi: 186, btch: 31 usd: 87
    Mar 23 16:56:41 localhost kernel: HighMem per-cpu:
    Mar 23 16:56:41 localhost kernel: CPU 0: hi: 42, btch: 7 usd: 27
    Mar 23 16:56:41 localhost kernel: CPU 1: hi: 42, btch: 7 usd: 13
    Mar 23 16:56:41 localhost kernel: active_anon:113730 inactive_anon:113864 isolated_anon:0
    Mar 23 16:56:41 localhost kernel: active_file:555 inactive_file:749 isolated_file:0
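
    For anyone wanting to summarise a log like the one above, here is a small sketch. The function names are made up, the patterns only match the message format shown in this log (other kernel versions word these lines differently), and the page conversion assumes 4 kB pages, which holds for a 32-bit x86 kernel like this one.

```shell
#!/bin/sh
# Count how often the OOM killer was invoked. $1 = syslog-style file.
oom_invocations() {
    grep -c 'invoked oom-killer' "$1"
}

# Count "Killed process N (name)" lines per process name, most frequent first.
oom_kills_by_name() {
    sed -n 's/.*kernel: Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' "$1" |
        sort | uniq -c | sort -rn
}

# Convert a page count from the Mem-Info dump to MB, assuming 4 kB pages.
pages_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}
```

    Usage would be something like "oom_kills_by_name /var/log/kern.log" (the log path is an assumption and depends on your syslog setup). For the dump above, active_anon plus inactive_anon is 113730 + 113864 = 227594 pages, and pages_to_mb 227594 gives about 889 MB of anonymous memory, against roughly 5 MB of file pages.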

  4. #14
    Join Date
    Dec 2004
    Location
    Belgium
    Beans
    115
    Distro
    Dapper Drake Testing/

    Re: server crashing once a day

    It looks like I was simply targeted by some slowloris bots.

    Since installing libapache2-mod-antiloris I have not needed to reboot anymore.
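
    One way to check for this kind of attack is to count open connections to port 80 per remote address. The sketch below is an assumption about how one might do it, not what mod-antiloris itself does. The function name conns_per_ip is made up, and it expects input in the column layout of "netstat -tn" (IPv4 style addresses).

```shell
#!/bin/sh
# Read "netstat -tn" style lines on stdin and print "count ip" for
# established connections to local port 80, busiest address first.
conns_per_ip() {
    awk '$1 ~ /^tcp/ && $4 ~ /:80$/ && $6 == "ESTABLISHED" {
             ip = $5
             sub(/:[0-9]+$/, "", ip)      # strip the remote port
             count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}
```

    Run it as "netstat -tn | conns_per_ip | head". A handful of addresses each holding dozens of connections open would be consistent with slowloris.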

  5. #15
    Join Date
    Jul 2007
    Beans
    118
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: server crashing once a day

    Looks to me like you are running zoneminder, is that possible? I'm having the same issues, so the zoneminder code may be the cause, though I haven't found the problem yet.

    Jeff

  6. #16
    Join Date
    Feb 2007
    Location
    West Hills CA
    Beans
    10,044
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: server crashing once a day

    Yeah, zoneminder is not exactly enterprise-class code. If you are running zoneminder, then move it to another machine or stop it for a month. If a camera stream gets interrupted, zoneminder (or the video modules) doesn't exit gracefully, and the kernel panics!
