Thread: My Notes for Installing Nagios on Ubuntu Server 12.04 LTS

  1. #11
    Re: My Notes for Installing Nagios on Ubuntu Server 12.04 LTS

    Quote: Originally posted by spynappels
    I'm sorry my (well-meant) comments seem to have upset you. I was saying that it is an excellent how-to, and was simply suggesting a way to do it that more closely matches the recommended setup according to the article I linked to.

    A rewrite is not required. I was suggesting an alternative that does not require setting up a root password:
    Code:
    sudo su
    instead of
    Code:
    su root
    Anyway, enjoy using Linux. It is always good to have more than one way of doing things.

    My apologies. In my ignorance, I misunderstood what you were asking. However, if I simply swap in the other command, which accomplishes the same thing, it may cause a long-term problem for those reading this thread: if I present this method without a warning, I can see people using sudo su on a regular basis rather than just for the initial setup.

    I'll contemplate this and figure out how to arrange my words for this approach.

    Thanks for the feedback and clarification.
    LHammonds

  2. #12

    Quote: Originally posted by LHammonds
    I'll contemplate this and figure out how to arrange my words for this approach.

    In the Initial Configuration section, you could add a note saying that the following commands need root privileges, which can be obtained by prefacing each command with sudo. Rather than doing this for every command, you can temporarily give yourself a root shell by using sudo su.

    Good luck, I genuinely think this is a great how-to.

  3. #13

    Quote: Originally posted by spynappels
    Good luck, I genuinely think this is a great how-to.
    Thanks. It is more of a "How I do" than a "How to" since I know just enough to make things work rather than knowing all the various ways of doing things or even knowing exactly what is going on under-the-hood.

    Still not sure if I am doing things right or wrong other than just getting it to work...I don't typically get many responses on such things. And after seeing my initial response to you, I probably won't get any other suggestions. hehehe.

    Thanks,
    LHammonds

  4. #14

    Configuration Framework

    The first thing I like to do is create the folder structure I plan to use and then copy or rename all of the example configuration files to text files that Nagios will not load. This preserves the originals as a reference.
    Code:
    
    mkdir -p /etc/nagios/servers
    mkdir -p /etc/nagios/printers
    mkdir -p /etc/nagios/switches
    mkdir -p /etc/nagios/workstations
    cp /etc/nagios/nagios.cfg /etc/nagios/example-nagios.txt
    cp /etc/nagios/resource.cfg /etc/nagios/example-resource.txt
    mv /etc/nagios/objects/windows.cfg /etc/nagios/servers/example-win.txt
    mv /etc/nagios/objects/localhost.cfg /etc/nagios/servers/example-local.txt
    mv /etc/nagios/objects/switch.cfg /etc/nagios/switches/example-sw.txt
    mv /etc/nagios/objects/printer.cfg /etc/nagios/printers/example-ptr.txt
    cp /etc/nagios/objects/commands.cfg /etc/nagios/objects/example-commands.txt
    cp /etc/nagios/objects/contacts.cfg /etc/nagios/objects/example-contacts.txt
    cp /etc/nagios/objects/templates.cfg /etc/nagios/objects/example-templates.txt
    cp /etc/nagios/objects/timeperiods.cfg /etc/nagios/objects/example-timeperiods.txt
    chown --recursive nagios:nagios /etc/nagios/*
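    # Apply 0664 to every .cfg file under /etc/nagios, regardless of the current directory.
    find /etc/nagios -type f -name '*.cfg' -exec chmod 0664 {} +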
    
    Edit /etc/nagios/nagios.cfg and uncomment or add the cfg_dir lines (around lines 52 to 54 in the stock file) so that the section looks like this:
    Code:
    
    cfg_dir=/etc/nagios/servers
    cfg_dir=/etc/nagios/printers
    cfg_dir=/etc/nagios/switches
    cfg_dir=/etc/nagios/workstations
    
    This allows you to place config files in those folders, and they will be picked up automatically without having to edit the nagios.cfg file. I keep one file per object. You could place all objects in a single file, but that becomes harder to edit the more you monitor.
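
    As a quick sanity check, here is a minimal sketch (not part of the original notes) that lists the active cfg_dir entries in a config file and reports any that do not exist yet as directories. The function names list_cfg_dirs and missing_cfg_dirs are made up for illustration.

```shell
# Print every active (uncommented) cfg_dir path from a nagios.cfg-style file.
list_cfg_dirs() {
  sed -n 's/^cfg_dir=//p' "$1"
}

# Print each cfg_dir entry that does not exist as a directory.
missing_cfg_dirs() {
  list_cfg_dirs "$1" | while read -r dir; do
    [ -d "$dir" ] || echo "$dir"
  done
}
```

    Running missing_cfg_dirs /etc/nagios/nagios.cfg should print nothing if all four folders created above are in place.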

    verify.sh

    Any time you make a configuration change, you should run a verification against it to ensure the Nagios service will be able to start once you restart it for the change to take effect. This is called the pre-flight check, and the script below makes it easier to run.

    The full command is this:

    Code:
    /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
    As you can see, it is a lot to type/remember. I prefer to have a handy little script in the configuration folder to make it easier to run a verification.

    /etc/nagios/verify.sh
    Code:
    
    #!/bin/bash
    /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
    
    Do not forget to make it executable after creating it: chmod 0755 /etc/nagios/verify.sh

    Now all you have to do is type ./verify.sh from the config folder. If you are in a sub-folder, type ../verify.sh instead.
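
    If you want to go one step further, the sketch below only reloads the service when the pre-flight check passes. It assumes the verification command exits with a non-zero status on errors. The function name verify_then_reload and the reload command in the example comment are assumptions for illustration, not something from the notes above.

```shell
# verify_then_reload VERIFY_CMD RELOAD_CMD
# Runs VERIFY_CMD; runs RELOAD_CMD only if verification exits with 0.
verify_then_reload() {
  if output=$(sh -c "$1" 2>&1); then
    sh -c "$2"
  else
    # Show what the pre-flight check complained about and leave the service alone.
    printf '%s\n' "$output" >&2
    echo "Verification failed; not reloading." >&2
    return 1
  fi
}

# Example call (the reload command is an assumption; use whatever your init script supports):
# verify_then_reload "/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg" "/etc/init.d/nagios reload"
```

    Passing both commands as arguments keeps the function independent of where Nagios happens to be installed.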

    Host Groups

    I group all of my objects according to how I like to see them separated. This is done using "hostgroups" when defining a host. I keep all of these hostgroups defined in a single configuration file.

    The file is referenced in /etc/nagios/nagios.cfg with the following line:

    Code:
    cfg_file=/etc/nagios/objects/hostgroups.cfg
    Here is a sample of what is contained in that file:

    /etc/nagios/objects/hostgroups.cfg
    Code:
    ###############################################################################
    ###############################################################################
    #
    # HOST GROUP DEFINITIONS
    #
    ###############################################################################
    ###############################################################################
    
    define hostgroup{
            hostgroup_name  ibm-servers
            alias           IBM Servers
            }
    
    define hostgroup{
            hostgroup_name  aix-servers
            alias           IBM AIX Servers
            }
    
    define hostgroup{
            hostgroup_name  ubuntu-servers
            alias           Ubuntu Servers
            }
    
    define hostgroup{
            hostgroup_name  esx-servers
            alias           ESX Servers
            }
    
    define hostgroup{
        hostgroup_name    windows2000-servers
        alias        Windows 2000 Servers
        }
    
    define hostgroup{
        hostgroup_name    windows2003-servers
        alias        Windows 2003 Servers
        }
    
    define hostgroup{
        hostgroup_name    windows2008-servers
        alias        Windows 2008 Servers
        }
    
    define hostgroup{
        hostgroup_name    win7-pcs
        alias        Windows 7 PCs
        }
    
    define hostgroup{
        hostgroup_name    winxp-pcs
        alias        Windows XP PCs
        }
    
    define hostgroup{
        hostgroup_name    switches
        alias        Network Switches
        }
    
    define hostgroup{
        hostgroup_name    wireless
        alias        Wireless Access Points
        }
    
    define hostgroup{
        hostgroup_name    printers-hp
        alias        HP Printers
        }
    
    define hostgroup{
        hostgroup_name    printers-brother
        alias        Brother Printers
        }
    
    define hostgroup{
        hostgroup_name    copiers-toshiba
        alias        Toshiba Copiers
        }
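
    Since every hostgroup definition has the same shape, the file can also be generated from a short list. This is only a sketch: the function names emit_hostgroup and emit_hostgroups and the "name|alias" input format are invented for this example.

```shell
# emit_hostgroup NAME ALIAS
# Print one hostgroup definition in the same layout as the sample file.
emit_hostgroup() {
  printf 'define hostgroup{\n'
  printf '        hostgroup_name  %s\n' "$1"
  printf '        alias           %s\n' "$2"
  printf '        }\n\n'
}

# Read "name|alias" lines from stdin and print one definition per line.
emit_hostgroups() {
  while IFS='|' read -r hg_name hg_alias; do
    [ -n "$hg_name" ] || continue   # skip empty lines
    emit_hostgroup "$hg_name" "$hg_alias"
  done
}

# Example: print two of the groups from the sample above.
emit_hostgroups <<'EOF'
ubuntu-servers|Ubuntu Servers
switches|Network Switches
EOF
```

    Redirect the output to a scratch file first and compare it with the existing hostgroups.cfg before replacing anything.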

    Sample Ubuntu Server Config File

    Here is my basic shell for an Ubuntu server:

    /etc/nagios/servers/srv-wiki.cfg
    Code:
    ###############################################################################
    #
    # HOST DEFINITION
    #
    ###############################################################################
    
    define host{
            use             ubuntu-server
            host_name       srv-wiki
            alias           SRV-Wiki
            address         192.168.107.23
            hostgroups      ubuntu-servers
            contacts        linux-admin-pager
            parents         srv-esxi1
            }
    
    ###############################################################################
    #
    # SERVICE DEFINITIONS
    #
    ###############################################################################
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     PING
        check_command           check_icmp!100.0,20%!500.0,60%
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     HTTP
        check_command           check_http
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     APT Upgrade
        check_command           check_nrpe!check_apt
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     APT Upgrade MotD
        check_command           check_nrpe!check_apt_motd
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     All Disks
        check_command           check_nrpe!check_disk_all
        notifications_enabled   1
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     Current Load
        check_command           check_nrpe!check_load
        notifications_enabled   1
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     Total Processes
        check_command           check_nrpe!check_total_procs
        notifications_enabled   1
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     Swap Usage
        check_command           check_nrpe!check_swap
        notifications_enabled   1
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     Zombie Processes
        check_command           check_nrpe!check_zombie_procs
        notifications_enabled   1
        }
    
    define service{
        use                     generic-service
        host_name               srv-wiki
        service_description     Users
        check_command           check_nrpe!check_users
        }

    Sample Windows Server Config File

    Here is my basic shell for a Windows server:

    /etc/nagios/servers/srv-mssql.cfg
    Code:
    define host{
            use             windows-server
            host_name       srv-mssql
            alias           Win2008-SRV-GP
            address         192.168.107.69
            hostgroups      windows2008-servers
            contacts        windows-admin-email
            parents         srv-esxi2
            }
    
    ###############################################################################
    #
    # SERVICE DEFINITIONS
    #
    ###############################################################################
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     NSClient++ Version
            check_command           check_nt!CLIENTVERSION -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     Uptime
            check_command           check_nt!UPTIME -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     CPU Load
            check_command           check_nt!CPULOAD!-l 5,80,90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     Memory Usage
            check_command           check_nt!MEMUSE!-w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     hd-service
            host_name               srv-mssql
            service_description     Drive C:
            check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     hd-service
            host_name               srv-mssql
            service_description     Drive D:
            check_command           check_nt!USEDDISKSPACE!-l d -w 80 -c 90 -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     MS SQL Server
            check_command           check_nt!SERVICESTATE!-d SHOWALL -l MSSQLSERVER -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     SQL Server Agent
            check_command           check_nt!SERVICESTATE!-d SHOWALL -l SQLSERVERAGENT -H $HOSTADDRESS$ -p 12489 -s $USER5$
            }
    
    define service{
            use                     generic-service
            host_name               srv-mssql
            service_description     WindowsUpdates
            check_command           check_nrpe!check_updates!1
            }
    
    ## This can be used for servers that require the console to be logged in.
    #define service{
    #        use                     generic-service
    #        host_name               srv-mssql
    #        service_description     Explorer
    #        check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe -H $HOSTADDRESS$ -p 12489 -s $USER5$
    #        }
    Last edited by LHammonds; May 29th, 2012 at 06:40 PM.

  5. #15

    Sample Network Switch Config File

    Here is my basic shell for a switch:

    NOTE: The MIB codes are specific to the hardware, you probably will need to research the MIB that matches your hardware.

    Code:
    ###############################################################################
    # Switches.cfg
    #
    # Last Modified: 2012-05-25
    ###############################################################################
    
    ###############################################################################
    #
    # HOST DEFINITIONS
    #
    ###############################################################################
    
    define host{
            use             summit-switch
            host_name       SW-TX-IS
            alias           Texas IS Area
            address         192.168.107.230
            hostgroups      switches
            parents         SW-TX-Core
            }
    
    define host{
            use             cisco-switch
            host_name       SW-TX-FD
            alias           Texas Front Desk
            address         192.168.107.231
            hostgroups      switches
            parents         SW-TX-Core
            }
    
    ###############################################################################
    #
    # SERVICE DEFINITIONS
    #
    ###############################################################################
    
    # Ping switch
    
    define service{
            use                     switch-critical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     PING
            check_command           check_ping!200.0,20%!600.0,60%
            }
    
    # Monitor uptime via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Uptime
            check_command           check_snmp!-C public -o sysUpTime.0
            }
    
    # Monitor Contact via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Contact
            check_command           check_snmp!-C public -o sysContact.0
            }
    
    # Monitor Location via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Location
            check_command           check_snmp!-C public -o sysLocation.0
            }
    
    # Monitor Over Temperature Alarm via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Temperature Over Alarm
            check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.7.0
            }
    
    # Monitor Current Temperature via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Temperature Current
            check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.8.0
            }
    
    # Monitor the Primary Software Revision Number via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Software Rev 1st
            check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.13.0
            }
    
    # Monitor the Secondary Software Revision Number via SNMP
    
    define service{
            use                     switch-noncritical-service
            host_name               SW-TX-IS,SW-TX-FD
            service_description     Software Rev 2nd
            check_command           check_snmp!-C public -o .1.3.6.1.4.1.1916.1.1.1.14.0
            }

    Sample HP Printer Config File

    Here is my basic shell for an HP printer:

    Code:
    ###############################################################################
    # Printer-HP.cfg
    #
    # Last Modified: 2012-05-25
    ###############################################################################
    
    ###############################################################################
    #
    # HOST DEFINITIONS
    #
    ###############################################################################
    
    define host{
            use             generic-printer
            host_name       PTR-TX-ADMIN
            alias           Texas Admin
            address         192.168.107.254
            hostgroups      printers-hp
            parents         SW-TX-Core
            }
    
    define host{
            use             generic-printer
            host_name       PTR-TX-ADMIN-COLOR
            alias           Texas Admin - HPColor
            address         192.168.107.253
            hostgroups      printers-hp
            parents         SW-TX-Core
            }
    
    ###############################################################################
    #
    # SERVICE DEFINITIONS
    #
    ###############################################################################
    
    define service{
            use                     hp-noncritical-service
            host_name               PTR-TX-ADMIN,PTR-TX-ADMIN-COLOR
            service_description     PING
            check_command           check_ping!3000.0,80%!5000.0,100%
            }
    
    define service{
            use                     hp-noncritical-service
            host_name               PTR-TX-ADMIN,PTR-TX-ADMIN-COLOR
            service_description     Printer Status
            check_command           check_hpjd!-C public
            }

    Sample Brother Printer Config File

    Here is my basic shell for a Brother printer:

    Code:
    ###############################################################################
    # Printer-Brother.cfg
    #
    # Last Modified: 2010-05-25
    ###############################################################################
    
    ###############################################################################
    #
    # HOST DEFINITIONS
    #
    ###############################################################################
    
    define host{
            use             generic-printer
            host_name       PTR-TX-IS
            alias           Texas IS - ISHP
            address         192.168.107.252
            hostgroups      printers-brother
            parents         SW-TX-Core
            }
    
    define host{
            use             generic-printer
            host_name       PTR-TX-FD
            alias           Texas Front Desk
            address         192.168.107.251
            hostgroups      printers-brother
            parents         SW-TX-Core
            }
    
    ###############################################################################
    #
    # SERVICE DEFINITIONS
    #
    ###############################################################################
    
    # Create a service for "pinging" the printer occasionally.  Useful for monitoring RTA, packet loss, etc.
    
    define service{
            use                     brother-noncritical-service
            host_name               PTR-TX-IS,PTR-TX-FD
            service_description     PING
            check_command           check_ping!3000.0,80%!5000.0,100%
            normal_check_interval   10
            retry_check_interval    1
            }

    Sample Toshiba Copier Config File

    Here is my basic shell for a Toshiba Copier:

    Code:
    ###############################################################################
    # Copier-Toshiba.cfg
    #
    # Last Modified: 2012-05-25
    ###############################################################################
    
    ###############################################################################
    #
    # HOST DEFINITIONS
    #
    ###############################################################################
    
    define host{
            use             toshiba-copier
            host_name       TE-COPIER-01
            alias           Toshiba e-Studio255
            address         192.168.107.250
            hostgroups      copiers-toshiba
            parents         SW-TX-Core
            }
    
    define host{
            use             toshiba-copier
            host_name       TE-COPIER-02
            alias           Toshiba e-Studio255
            address         192.168.107.249
            hostgroups      copiers-toshiba
            parents         SW-TX-Core
            }
    
    ###############################################################################
    #
    # SERVICE DEFINITIONS
    #
    ###############################################################################
    
    # Create a service for "pinging" the copier occasionally.  Useful for monitoring RTA, packet loss, etc.
    
    define service{
            use                     copier-service
            host_name               TE-COPIER-01,TE-COPIER-02
            service_description     PING
            check_command           check_ping!3000.0,80%!5000.0,100%
            }
    
    define service{
            use                     copier-service
            host_name               TE-COPIER-01,TE-COPIER-02
            service_description     Contact
            check_command           check_snmp!-C public -o sysContact.0
            }
    
    define service{
            use                     copier-service
            host_name               TE-COPIER-01,TE-COPIER-02
            service_description     Location
            check_command           check_snmp!-C public -o sysLocation.0
            }

  6. #16

    Monitoring Remote Linux Servers

    Since there are other Linux boxes that need to be monitored, the NRPE plugin and NRPE service will be installed on each Linux box.

    Set up the remote Linux server to be monitored:

    Create the Nagios user and group:
    Code:
    
    groupadd --system --gid 9000 nagios
    adduser --system --gid 9000 --home /usr/local/nagios nagios
    chown nagios:nagios /usr/local/nagios
    chmod 0755 /usr/local/nagios
    
    Install the standard Nagios plugins and the NRPE server. Rather than compiling from source, we will just use what comes with the repository.
    Code:
    
    aptitude -y install nagios-plugins nagios-nrpe-server
    
    Make a backup of the NRPE configuration files before modifying them:
    Code:
    
    cp /etc/nagios/nrpe.cfg /etc/nagios/nrpe.cfg.bak
    cp /etc/nagios/nrpe_local.cfg /etc/nagios/nrpe_local.cfg.bak
    
    Edit the local configuration:
    Code:
    
    vi /etc/nagios/nrpe_local.cfg
    
    Add the IP of your Nagios server to the "allowed_hosts" line and list only the plugins that will be used:
    Code:
    
    allowed_hosts=192.168.107.21,127.0.0.1
    command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
    command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
    command[check_disk_app]=/usr/lib/nagios/plugins/check_disk -p /var -w 20% -c 10%
    command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -p / -w 20% -c 10%
    command[check_disk_all]=/usr/lib/nagios/plugins/check_disk -w 15% -c 10%
    command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
    command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
    command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 15% -c 10%
    command[check_apt]=/usr/lib/nagios/plugins/check_apt
    
    TIP: If you define separate disk checks like the ones above, you can assign different notifications to each. For example, the Linux administrator could get an email when the root partition reaches the warning threshold (during business hours) and an alert on his pager (at any time of day) if the root partition reaches critical. The application manager could get different notices for /var, such as both warnings and criticals going through SMS to his phone at any time of day.

    Check the status of the NRPE server:
    Code:
    /etc/init.d/nagios-nrpe-server status
    If the NRPE server is not running, this is how you can start it:
    Code:
    /etc/init.d/nagios-nrpe-server start
    If the NRPE server was already running and you made configuration changes, use this command to load the new changes:
    Code:
    /etc/init.d/nagios-nrpe-server reload
    Now see if your configured commands will run on your server (before trying to test them remotely on the Nagios server)

    Code:
    
    /usr/lib/nagios/plugins/check_users -w 5 -c 10
    /usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
    /usr/lib/nagios/plugins/check_disk -w 15% -c 10%
    /usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
    /usr/lib/nagios/plugins/check_procs -w 150 -c 200
    /usr/lib/nagios/plugins/check_swap -w 15% -c 10%
    /usr/lib/nagios/plugins/check_apt
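
    When running the plugins by hand like this, the exit code matters as much as the text: Nagios plugins return 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The sketch below prints that state next to the plugin output. The helper names state_name and run_check are made up for this example, and any code outside 0 to 2 is simply reported as UNKNOWN here.

```shell
# Map a plugin exit code to the Nagios state name.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# run_check "COMMAND LINE"
# Run one check and print its state followed by its output; return the plugin's exit code.
run_check() {
  if output=$(sh -c "$1" 2>&1); then
    code=0
  else
    code=$?
  fi
  printf '[%s] %s\n' "$(state_name "$code")" "$output"
  return "$code"
}

# Example (path taken from the commands above):
# run_check "/usr/lib/nagios/plugins/check_users -w 5 -c 10"
```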
    
    Test Connectivity of NRPE Plugin

    Test the connectivity of the NRPE service on your server to be monitored by trying to access the server via telnet using the NRPE port number.

    If we installed the NRPE server on a machine with the address of 192.168.107.20, type the following at the console of your Nagios server:

    Code:
    telnet 192.168.107.20 5666
    If you get a response of Escape character is '^]'., then you have a good connection. Type exit to close the connection (or press Ctrl+] and then type quit at the telnet prompt).

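    If the command fails with a timeout, you might need to add firewall rules on the server being monitored. Note that service iptables save is a Red Hat-style command; stock Ubuntu has no iptables service, so there the rules need to be persisted another way (for example with iptables-save and iptables-restore):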
    Code:
    
    iptables -A INPUT -p tcp  --dport 5666 -j ACCEPT
    iptables -A OUTPUT -p tcp  --dport 5666 -j ACCEPT
    service iptables save
    
    Now try executing some of the commands you have configured on your remote Linux server (that stuff in the nrpe_local.cfg file)

    Code:
    
    /usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_users
    /usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_load
    /usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_disk_all
    /usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_zombie_procs
    /usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_total_procs
    /usr/local/nagios/libexec/check_nrpe -H 192.168.107.20 -p 5666 -c check_apt
    
    If it all looks good, you can then use commands in a server configuration file. See the sample configurations posted earlier.
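
    To avoid typing one check_nrpe line per command, the test lines can be derived from the command[...] entries themselves. This is a rough sketch that assumes you have a copy of the monitored server's nrpe_local.cfg available on the Nagios server; the function names list_nrpe_commands and print_nrpe_tests are invented for illustration. It only prints the command lines and does not run them.

```shell
# Print the command names defined in an NRPE config file (lines of the form command[name]=...).
list_nrpe_commands() {
  sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# print_nrpe_tests CONFIG_FILE SERVER_ADDRESS
# Print one check_nrpe command line per defined command, without executing anything.
print_nrpe_tests() {
  list_nrpe_commands "$1" | while read -r cmd; do
    printf '/usr/local/nagios/libexec/check_nrpe -H %s -p 5666 -c %s\n' "$2" "$cmd"
  done
}

# Example:
# print_nrpe_tests ./nrpe_local.cfg 192.168.107.20
```

    Review the printed lines and paste in the ones you want to try.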

  7. #17

    Custom Plugin - Check HTTPS

    On one of my Linux servers, I have a web mail service that I wanted to keep an eye on. However, the check_http did not work because the server only uses SSL (HTTPS) on port 443. I did not see a check_https command so I tried my hand at making one and it works like a champ.

    Here is how I made and implemented a custom HTTPS checking function.

    The first thing was to create a script that would communicate with the server. We already have WGET installed as one of the prerequisite programs, so I used that program. Here is what the script looks like:

    /usr/local/nagios/libexec/check_https
    Code:
    #!/bin/bash
    ###########################################
    ## Name         : check_https
    ## Version      : 1.0
    ## Date         : 2012-01-03
    ## Author       : LHammonds
    ## Purpose      : Check for response from HTTPS server
    ## Requirements : WGET
    ## Parameters   :
    ##    1 = Server IP Address (Required)
    ##    2 = Port Number (Optional)
    ## Exit Codes   :
    ##    0 = Success
    ##    1 = Failure
    ##    2 = Error, missing required parameter
    ###########################################
    OUTFILE="/tmp/check_https_out.$$"
    ERRFILE="/tmp/check_https_err.$$"
    WGETCMD="$(which wget)"
    
    ## Do basic check on arguments passed to the script.
    if [ "$1" = "" ]; then
      echo "Missing required parameter"
      exit 2
    fi
    if [ "$2" = "" ]; then
      ## Assume default port.
      SSLPORT="443"
    else
      SSLPORT=$2
    fi
    ${WGETCMD} --no-check-certificate --output-document=${OUTFILE} -S https://$1:${SSLPORT} 2> ${ERRFILE}
    RETURNVALUE=$?
    if [ ${RETURNVALUE} -eq 0 ];  then
      echo "HTTPS OK"
      EXITCODE=0
    else
      echo "Connection refused. Code=${RETURNVALUE}"
      EXITCODE=1
    fi
    if [ -f ${OUTFILE} ]; then
      rm ${OUTFILE}
    fi
    if [ -f ${ERRFILE} ]; then
      rm ${ERRFILE}
    fi
    exit ${EXITCODE}
    After creating the file, you need to set the correct ownership and permissions as follows:
    Code:
    
    chown nagios:nagios /usr/local/nagios/libexec/check_https
    chmod 0755 /usr/local/nagios/libexec/check_https
    
    To test it out, run the command against a server running HTTPS and then against a server not running HTTPS. Example:

    Code:
    
    /usr/local/nagios/libexec/check_https 192.168.107.25 443
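
One thing to keep in mind when testing: Nagios reads a plugin's exit code as the service state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), so a script that exits with 1 on failure shows up as WARNING rather than CRITICAL. Below is a rough sketch of the same kind of check in Python 3 using only the standard library, with a failed connection mapped to CRITICAL and a missing parameter mapped to UNKNOWN. It is not the script above, just an illustration. The function names are hypothetical, and certificate verification is switched off to mirror wget's --no-check-certificate option.

```python
import http.client
import ssl

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_https(host, port=443, timeout=10):
    """Return (state, message) for an HTTPS GET of / on host:port."""
    # Skip certificate verification, like wget --no-check-certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return CRITICAL, "HTTPS CRITICAL - %s" % exc
    finally:
        conn.close()
    if status < 400:
        return OK, "HTTPS OK - status %d" % status
    return CRITICAL, "HTTPS CRITICAL - status %d" % status


def main(argv):
    """Parse arguments like the shell script: host required, port optional."""
    if len(argv) < 2:
        print("HTTPS UNKNOWN - missing required parameter")
        return UNKNOWN
    try:
        port = int(argv[2]) if len(argv) > 2 else 443
    except ValueError:
        print("HTTPS UNKNOWN - port must be a number")
        return UNKNOWN
    state, message = check_https(argv[1], port)
    print(message)
    return state
```

To use it as a plugin you would finish the file with sys.exit(main(sys.argv)) after importing sys. Unlike wget, this sketch does not follow redirects, so a 3xx answer is simply reported as OK.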
    
    Next, we add this script to the commands file. Type vi /usr/local/nagios/etc/objects/commands.cfg

    Find the existing "check_http" command, copy its definition, rename the copy to "check_https" (in both the command name and the script name) and remove the "-I" option. Example:
    Code:
    define command{
            command_name     check_http
            command_line     $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
            }
    
    define command{
            command_name     check_https
            command_line     $USER1$/check_https $HOSTADDRESS$ 443
            }
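
Nagios macros in commands.cfg are delimited by a dollar sign on both sides, as in $USER1$ and $HOSTADDRESS$, and a name without the closing dollar sign is not treated as a macro. The following Python 3 sketch illustrates the idea. It is a simplification and not how Nagios itself is implemented: expand_macros is a hypothetical helper, unknown names are left untouched, and the value used for USER1 is assumed to be the plugin directory from this guide.

```python
import re

# A macro is an upper-case name wrapped in dollar signs, e.g. $HOSTADDRESS$.
MACRO = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_macros(command_line, macros):
    """Replace every $NAME$ found in macros; leave anything else as written."""
    def replace(match):
        return macros.get(match.group(1), match.group(0))
    return MACRO.sub(replace, command_line)


macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed plugin directory
    "HOSTADDRESS": "192.168.107.25",
}
print(expand_macros("$USER1$/check_https $HOSTADDRESS$ 443", macros))
```

If the trailing dollar sign is left off the host address, the script receives the literal text instead of the IP address, so it is worth double-checking both dollar signs when copying a definition.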
    Now we can add a service to monitor HTTPS by adding the following to the server configuration file:

    Code:
    
    define service{
            use                     generic-service
            host_name               srv-securewebserver
            service_description     web mail server
            check_command           check_https
            }
    

    Custom Plugin - Check APT MotD

    Reference: Original source

    This plugin is a bit different from the built-in APT check for Linux servers. It was designed to give the same kind of messages that you get when you log in to an Ubuntu console.

    One thing this script will catch that the built-in APT check will not is the "reboot required" state of the server.

    The script will be executed on the remote Linux server so we will be making use of NRPE.

    On the remote Linux server, create the script:

    Code:
    
    touch /usr/lib/nagios/plugins/check_apt_motd.sh
    chown root:root /usr/lib/nagios/plugins/check_apt_motd.sh
    chmod 0755 /usr/lib/nagios/plugins/check_apt_motd.sh
    vi /usr/lib/nagios/plugins/check_apt_motd.sh
    
    /usr/lib/nagios/plugins/check_apt_motd.sh
    Code:
    #!/bin/sh
    #
    # check_apt_packages - nagios plugin
    #
    # Checks for any packages to be applied
    # Built for Ubuntu 10 (LTS), see following URL for further info
    # - http://www.sandfordit.com/vwiki/index.php/Nagios#Ubuntu_Software_Updates_Monitor
    #
    # By Simon Strutt
    # Version 1 - Jan 2012
    
    # Include standard Nagios library
    . /usr/lib/nagios/plugins/utils.sh || exit 3
    
    
    if [ ! -f /usr/lib/update-notifier/apt-check ]; then
            exit $STATE_UNKNOWN
    fi
    
    APTRES=$(/usr/lib/update-notifier/apt-check 2>&1)
    PKGS=$(echo $APTRES | cut -f1 -d';')
    SEC=$(echo $APTRES | cut -f2 -d';')
    
    if [ -f /var/run/reboot-required ]; then
            REBOOT=1
            TOAPPLY=`cat /var/run/reboot-required.pkgs`
    else
            REBOOT=0
    fi
    
    if [ "${PKGS}" -eq 0 ]; then
            if [ "${REBOOT}" -eq 1 ]; then
                    RET=$STATE_WARNING
                    RESULT="Reboot required to apply ${TOAPPLY}"
            else
                    RET=$STATE_OK
                    RESULT="No packages to be updated"
            fi
    elif [ "${SEC}" -eq 0 ]; then
            RET=$STATE_WARNING
            RESULT="${PKGS} packages to update (no security updates)"
    else
            RET=$STATE_CRITICAL
            RESULT="${PKGS} packages to update (including ${SEC} security updates)"
    fi
    
    echo $RESULT
    exit $RET
    Test the script to see if it is working: /usr/lib/nagios/plugins/check_apt_motd.sh

    The output should look something like one of these:
    Code:
    Reboot required to apply libssl0.9.8
    or
    Code:
    1 packages to update (no security updates)
    or
    Code:
    No packages to be updated
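
To make the decision logic of the script easier to follow, here is the same logic as a small Python 3 sketch. It assumes apt-check reports two numbers separated by a semicolon (pending packages, then security updates), which is what the cut commands in the script rely on. The function name classify is made up for this example, and the wording of the last message is my own paraphrase.

```python
# Nagios plugin exit codes used by the script.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(apt_check_output, reboot_required=False, reboot_pkgs=""):
    """Return (state, message) from apt-check output such as '3;1'."""
    pkgs_text, sec_text = apt_check_output.strip().split(";")[:2]
    pkgs, sec = int(pkgs_text), int(sec_text)
    if pkgs == 0:
        # Nothing to install, but a previous update may still need a reboot.
        if reboot_required:
            return WARNING, "Reboot required to apply %s" % reboot_pkgs
        return OK, "No packages to be updated"
    if sec == 0:
        return WARNING, "%d packages to update (no security updates)" % pkgs
    return CRITICAL, "%d packages to update (including %d security updates)" % (pkgs, sec)
```

Note that, as in the shell script, the reboot flag is only reported when there are no pending packages; pending security updates always win and produce CRITICAL.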
    Add the script to the trusted NRPE commands to be executed. Edit /etc/nagios/nrpe_local.cfg

    Code:
    
    command[check_apt_motd]=/usr/lib/nagios/plugins/check_apt_motd.sh
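
The name inside the square brackets is what the Nagios server asks for with the -c option of check_nrpe, so it has to match exactly. If you want to list what a configuration file defines, a quick Python 3 sketch like the following would do. The function parse_nrpe_commands is hypothetical and only understands the simple command[name]=path form used here.

```python
import re

# Matches lines of the form: command[name]=/path/to/plugin args
COMMAND_LINE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+?)\s*$")


def parse_nrpe_commands(config_text):
    """Return {command_name: command_line} for every command[...] entry."""
    commands = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group(1)] = match.group(2)
    return commands


sample = """# local additions
command[check_apt_motd]=/usr/lib/nagios/plugins/check_apt_motd.sh
"""
print(parse_nrpe_commands(sample))
```

Feeding it the contents of /etc/nagios/nrpe_local.cfg should show check_apt_motd among the keys once the line above has been added.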
    
    The NRPE Server now needs to reload the configuration for the changes to take effect.
    Code:
    /etc/init.d/nagios-nrpe-server reload
    On the Nagios server, add the following service to the remote Linux server's configuration file:

    /etc/nagios/servers/srv-wiki.cfg
    Code:
    
    define service{
            use                             generic-service
            host_name                       srv-wiki
            service_description             APT Upgrade MotD
            check_command                   check_apt_motd
            }
    
    The final step is to verify that nothing is broken in the configuration:
    Code:
    
    /etc/nagios/verify.sh
    
    If there were no errors or warnings, restart Nagios to load the new configuration:
    Code:
    
    /etc/init.d/nagios stop
    /etc/init.d/nagios start
    
    Last edited by LHammonds; May 30th, 2012 at 06:56 PM.

  8. #18

    Re: My Notes for Installing Nagios on Ubuntu Server 12.04 LTS

    Custom Plugin - Check ESXi Hardware

    Reference: Original source

    I use this custom script to check the health of my ESXi servers. It is run directly from the Nagios server.

    This script requires the PyWBEM Python library. Here is how to install it:

    Code:
    aptitude -y install python-pywbem
    You then need to add a command to call the script. Edit /etc/nagios/objects/commands.cfg and add the following:

    Code:
    
    # 'check_esxi_hardware' command definition
    
    define command{
            command_name    check_esxi_hardware
            command_line    $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$ $ARG4$
            }
    
    To access the ESXi data, you will need to supply an ID and password. The password can be placed in the "resources.cfg" file, but let's make sure that file is secured first.

    Code:
    
    chmod 0600 /etc/nagios/resources.cfg
    chown nagios:nagios /etc/nagios/resources.cfg
    
    Edit /etc/nagios/resources.cfg and add the following:

    Code:
    
    # Password to access ESXi servers.
    $USER6$=your-esxi-password-here
    
    To monitor an ESXi server with this command, add the following service to that server's configuration file:

    /etc/nagios/servers/srv-esxi1.cfg
    Code:
    
    define service{
            use                             generic-service
            host_name                       srv-esxi1
            service_description             Server Health
            check_command                   check_esxi_hardware!your-esxi-userid-here!$USER6$!ibm
            }
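
The check_command value is the command name followed by its arguments, separated by exclamation marks. Nagios hands those arguments to the command definition as $ARG1$, $ARG2$ and so on. Here is a small Python 3 sketch of that split, only to show which value lands in which macro; split_check_command is a hypothetical helper and not part of Nagios.

```python
def split_check_command(check_command):
    """Split 'name!arg1!arg2' into the command name and its ARGn macros."""
    name, *args = check_command.split("!")
    macros = {"ARG%d" % number: value for number, value in enumerate(args, start=1)}
    return name, macros


name, macros = split_check_command(
    "check_esxi_hardware!your-esxi-userid-here!$USER6$!ibm"
)
print(name)
for macro, value in sorted(macros.items()):
    print(macro, "=", value)
```

So the user ID becomes $ARG1$ (the -U option), $USER6$ becomes $ARG2$ (the -P option) and the vendor becomes $ARG3$ (the -V option). The service supplies no fourth argument, so $ARG4$ in the command definition has no value here.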
    
    Now it is time to create the script:

    Code:
    
    touch /usr/local/nagios/libexec/check_esxi_hardware.py
    chown nagios:nagios /usr/local/nagios/libexec/check_esxi_hardware.py
    chmod 0755 /usr/local/nagios/libexec/check_esxi_hardware.py
    vi /usr/local/nagios/libexec/check_esxi_hardware.py
    
    /usr/local/nagios/libexec/check_esxi_hardware.py
    Code:
    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    #
    # Script for checking global health of host running VMware ESX/ESXi
    #
    # Licence : GNU General Public Licence (GPL) http://www.gnu.org/
    # This program is free software; you can redistribute it and/or
    # modify it under the terms of the GNU General Public License
    # as published by the Free Software Foundation; either version 2
    # of the License, or (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program; if not, write to the Free Software
    # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
    # 02110-1301, USA.
    #
    # Pre-req : pywbem
    #
    # Copyright (c) 2008 David Ligeret
    # Copyright (c) 2009 Joshua Daniel Franklin
    # Copyright (c) 2010 Branden Schneider
    # Copyright (c) 2010-2012 Claudio Kuenzler
    # Copyright (c) 2010 Samir Ibradzic
    # Copyright (c) 2010 Aaron Rogers
    # Copyright (c) 2011 Ludovic Hutin
    # Copyright (c) 2011 Carsten Schoene
    # Copyright (c) 2011-2012 Phil Randal
    # Copyright (c) 2011 Fredrik Aslund
    # Copyright (c) 2011 Bertrand Jomin
    # Copyright (c) 2011 Ian Chard
    # Copyright (c) 2012 Craig Hart
    #
    # The VMware 4.1 CIM API is documented here:
    #
    #   http://www.vmware.com/support/developer/cim-sdk/4.1/smash/cim_smash_410_prog.pdf
    #
    #   http://www.vmware.com/support/developer/cim-sdk/smash/u2/ga/apirefdoc/
    #
    # This Nagios plugin is maintained here:
    # http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php
    #
    #@---------------------------------------------------
    #@ History
    #@---------------------------------------------------
    #@ Date   : 20080820
    #@ Author : David Ligeret
    #@ Reason : Initial release
    #@---------------------------------------------------
    #@ Date   : 20080821
    #@ Author : David Ligeret
    #@ Reason : Add verbose mode
    #@---------------------------------------------------
    #@ Date   : 20090219
    #@ Author : Joshua Daniel Franklin
    #@ Reason : Add try/except to catch AuthError and CIMError
    #@---------------------------------------------------
    #@ Date   : 20100202
    #@ Author : Branden Schneider
    #@ Reason : Added HP Support (HealthState)
    #@---------------------------------------------------
    #@ Date   : 20100512
    #@ Author : Claudio Kuenzler www.claudiokuenzler.com
    #@ Reason : Combined different versions (Joshua and Branden)
    #@ Reason : Added hardware type switch (dell or hp)
    #@---------------------------------------------------
    #@ Date   : 20100626/28
    #@ Author : Samir Ibradzic www.brastel.com
    #@ Reason : Added basic server info
    #@ Reason : Wanted to have server name, serial number & bios version at output
    #@ Reason : Set default return status to Unknown
    #@---------------------------------------------------
    #@ Date   : 20100702
    #@ Author : Aaron Rogers www.cloudmark.com
    #@ Reason : GlobalStatus was incorrectly getting (re)set to OK with every CIM element check
    #@---------------------------------------------------
    #@ Date   : 20100705
    #@ Author : Claudio Kuenzler www.claudiokuenzler.com
    #@ Reason : Due to change 20100702 all Dell servers would return UNKNOWN instead of OK...
    #@ Reason : ... so added Aaron's logic at the end of the Dell checks as well
    #@---------------------------------------------------
    #@ Date   : 20101028
    #@ Author : Claudio Kuenzler www.claudiokuenzler.com
    #@ Reason : Changed text in Usage and Example so people don't forget to use https://
    #@---------------------------------------------------
    #@ Date   : 20110110
    #@ Author : Ludovic Hutin (Idea and Coding) / Claudio Kuenzler (Bugfix)
    #@ Reason : If Dell Blade Servers are used, Serial Number of Chassis was returned
    #@---------------------------------------------------
    #@ Date   : 20110207
    #@ Author : Carsten Schoene carsten.schoene.cc
    #@ Reason : Bugfix for Intel systems (in this case Intel SE7520) - use 'intel' as system type
    #@---------------------------------------------------
    #@ Date   : 20110215
    #@ Author : Ludovic Hutin
    #@ Reason : Plugin now catches Socket Error (Timeout Error) and added a timeout parameter
    #@---------------------------------------------------
    #@ Date   : 20110217/18
    #@ Author : Ludovic Hutin / Tom Murphy
    #@ Reason : Bugfix in Socket Error if clause
    #@---------------------------------------------------
    #@ Date   : 20110221
    #@ Author : Claudio Kuenzler www.claudiokuenzler.com
    #@ Reason : Remove recently added Timeout due to incompatibility on Windows
    #@ Reason : and changed name of plugin to check_esxi_hardware
    #@---------------------------------------------------
    #@ Date   : 20110426
    #@ Author : Claudio Kuenzler www.claudiokuenzler.com
    #@ Reason : Added 'ibm' hardware type (compatible to Dell output). Tested by Keith Erekson.
    #@---------------------------------------------------
    #@ Date   : 20110426
    #@ Author : Phil Randal
    #@ Reason : URLise Dell model and tag numbers (as in check_openmanage)
    #@ Reason : Return performance data (as in check_openmanage, using similar names where possible)
    #@ Reason : Minor code tidyup - use elementName instead of instance['ElementName']
    #@---------------------------------------------------
    #@ Date   : 20110428
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : If hardware type is specified as 'auto' try to autodetect vendor
    #@ Reason : Return performance data for some HP models
    #@ Reason : Indent 'verbose' output to make it easier to read
    #@ Reason : Use OptionParser to give better parameter parsing (retaining compatibility with original)
    #@---------------------------------------------------
    #@ Date   : 20110503
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : Fix bug in HP Virtual Fan percentage output
    #@ Reason : Slight code reorganisation
    #@ Reason : Sort performance data
    #@ Reason : Fix formatting of current output
    #@---------------------------------------------------
    #@ Date   : 20110504
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : Minor code changes and documentation improvements
    #@ Reason : Remove redundant mismatched ' character in performance data output
    #@ Reason : Output non-integral values for all sensors to fix problem seen with system board voltage sensors
    #@          on an IBM server (thanks to Attilio Drei for the sample output)
    #@---------------------------------------------------
    #@ Date   : 20110505
    #@ Author : Fredrik Aslund
    #@ Reason : Added possibility to use first line of a file as password (file:)
    #@---------------------------------------------------
    #@ Date   : 20110505
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : Simplify 'verboseoutput' to use 'verbose' as global variable instead of as parameter
    #@ Reason : Don't look at performance data from CIM_NumericSensor if we're not using it
    #@ Reason : Add --no-power, --no-volts, --no-current, --no-temp, and --no-fan options
    #@---------------------------------------------------
    #@ Date   : 20110506
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : Reinstate timeouts with --timeout parameter (but not on Windows)
    #@ Reason : Allow file:passwordfile in old-style arguments too
    #@---------------------------------------------------
    #@ Date   : 20110507
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : On error, include numeric sensor value in output
    #@---------------------------------------------------
    #@ Date   : 20110520
    #@ Author : Bertrand Jomin
    #@ Reason : Plugin had problems to handle some S/N from IBM Blade Servers
    #@---------------------------------------------------
    #@ Date   : 20110614
    #@ Author : Claudio Kuenzler (www.claudiokuenzler.com)
    #@ Reason : Rewrote file handling and file can now be used for user AND password
    #@---------------------------------------------------
    #@ Date   : 20111003
    #@ Author : Ian Chard (ian@chard.org)
    #@ Reason : Allow a list of unwanted elements to be specified, which is useful
    #@          in cases where hardware isn't well supported by ESXi
    #@---------------------------------------------------
    #@ Date   : 20120402
    #@ Author : Claudio Kuenzler (www.claudiokuenzler.com)
    #@ Reason : Making plugin GPL compatible (Copyright) and preparing for OpenBSD port
    #@---------------------------------------------------
    #@ Date   : 20120405
    #@ Author : Phil Randal (phil.randal@gmail.com)
    #@ Reason : Fix lookup of warranty info for Dell
    #@---------------------------------------------------
    #@ Date   : 20120501
    #@ Author : Craig Hart
    #@ Reason : Bugfix in manufacturer discovery when cim entry not found or empty
    #@---------------------------------------------------
    
    
    import sys
    import time
    import pywbem
    import re
    import string
    from optparse import OptionParser,OptionGroup
    
    version = '20120501'
    
    NS = 'root/cimv2'
    
    # define classes to check 'OperationStatus' instance
    ClassesToCheck = [
      'OMC_SMASHFirmwareIdentity',
      'CIM_Chassis',
      'CIM_Card',
      'CIM_ComputerSystem',
      'CIM_NumericSensor',
      'CIM_Memory',
      'CIM_Processor',
      'CIM_RecordLog',
      'OMC_DiscreteSensor',
      'OMC_Fan',
      'OMC_PowerSupply',
      'VMware_StorageExtent',
      'VMware_Controller',
      'VMware_StorageVolume',
      'VMware_Battery',
      'VMware_SASSATAPort'
    ]
    
    sensor_Type = {
      0:'unknown',
      1:'Other',
      2:'Temperature',
      3:'Voltage',
      4:'Current',
      5:'Tachometer',
      6:'Counter',
      7:'Switch',
      8:'Lock',
      9:'Humidity',
      10:'Smoke Detection',
      11:'Presence',
      12:'Air Flow',
      13:'Power Consumption',
      14:'Power Production',
      15:'Pressure',
      16:'Intrusion',
      32768:'DMTF Reserved',
      65535:'Vendor Reserved'
    }
    
    data = []
    
    perf_Prefix = {
      1:'Pow',
      2:'Vol',
      3:'Cur',
      4:'Tem',
      5:'Fan',
      6:'FanP'
    }
    
    
    # parameters
    
    # host name
    hostname=''
    
    # user
    user=''
    
    # password
    password=''
    
    # vendor - possible values are 'unknown', 'auto', 'dell', 'hp', 'ibm', 'intel'
    vendor='unknown'
    
    # verbose
    verbose=False
    
    # Produce performance data output for nagios
    perfdata=False
    
    # timeout
    timeout = 0
    
    # elements to ignore (full SEL, broken BIOS, etc)
    ignore_list=[]
    
    # urlise model and tag numbers (currently only Dell supported, but the code does the right thing for other vendors)
    urlise_country=''
    
    # collect perfdata for each category
    get_power   = True
    get_volts   = True
    get_current = True
    get_temp    = True
    get_fan     = True
    
    # define exit codes
    ExitOK = 0
    ExitWarning = 1
    ExitCritical = 2
    ExitUnknown = 3
    
    def urlised_server_info(vendor, country, server_info):
      #server_inf = server_info
      if vendor == 'dell' :
        # Dell support URLs (idea and tables borrowed from check_openmanage)
        du = 'http://support.dell.com/support/edocs/systems/pe'
        if (server_info is not None) :
          p=re.match('(.*)PowerEdge (.*) (.*)',server_info)
          if (p is not None) :
            md=p.group(2)
            if (re.match('M',md)) :
              md = 'm'
            server_info = p.group(1) + '<a href="' + du + md + '/">PowerEdge ' + p.group(2)+'</a> ' + p.group(3)
      elif vendor == 'hp':
        return server_info
      elif vendor == 'ibm':
        return server_info
      elif vendor == 'intel':
        return server_info
    
      return server_info
    
    # ----------------------------------------------------------------------
    
    def system_tag_url(vendor,country):
      url = {'xx':''}
      if vendor == 'dell':
        # Dell support sites
        supportsite = 'http://www.dell.com/support/troubleshooting/'
        dellsuffix = 'nodhs1/Index?t=warranty&servicetag='
    
        # warranty URLs for different country codes
        # EMEA
        url['at'] = supportsite + 'at/de/' + dellsuffix  # Austria
        url['be'] = supportsite + 'be/nl/' + dellsuffix  # Belgium
        url['cz'] = supportsite + 'cz/cs/' + dellsuffix  # Czech Republic
        url['de'] = supportsite + 'de/de/' + dellsuffix  # Germany
        url['dk'] = supportsite + 'dk/da/' + dellsuffix  # Denmark
        url['es'] = supportsite + 'es/es/' + dellsuffix  # Spain
        url['fi'] = supportsite + 'fi/fi/' + dellsuffix  # Finland
        url['fr'] = supportsite + 'fr/fr/' + dellsuffix  # France
        url['gr'] = supportsite + 'gr/en/' + dellsuffix  # Greece
        url['it'] = supportsite + 'it/it/' + dellsuffix  # Italy
        url['il'] = supportsite + 'il/en/' + dellsuffix  # Israel
        url['me'] = supportsite + 'me/en/' + dellsuffix  # Middle East
        url['no'] = supportsite + 'no/no/' + dellsuffix  # Norway
        url['nl'] = supportsite + 'nl/nl/' + dellsuffix  # The Netherlands
        url['pl'] = supportsite + 'pl/pl/' + dellsuffix  # Poland
        url['pt'] = supportsite + 'pt/en/' + dellsuffix  # Portugal
        url['ru'] = supportsite + 'ru/ru/' + dellsuffix  # Russia
        url['se'] = supportsite + 'se/sv/' + dellsuffix  # Sweden
        url['uk'] = supportsite + 'uk/en/' + dellsuffix  # United Kingdom
        url['za'] = supportsite + 'za/en/' + dellsuffix  # South Africa
        # America
        url['br'] = supportsite + 'br/pt/' + dellsuffix  # Brazil
        url['ca'] = supportsite + 'ca/en/' + dellsuffix  # Canada
        url['mx'] = supportsite + 'mx/es/' + dellsuffix  # Mexico
        url['us'] = supportsite + 'us/en/' + dellsuffix  # USA
        # Asia/Pacific
        url['au'] = supportsite + 'au/en/' + dellsuffix  # Australia
        url['cn'] = supportsite + 'cn/zh/' + dellsuffix  # China
        url['in'] = supportsite + 'in/en/' + dellsuffix  # India
        # default fallback
        url['xx'] = supportsite + 'us/en/' + dellsuffix  # default
      # elif vendor == 'hp':
      # elif vendor == 'ibm':
      # elif vendor == 'intel':
    
      return url.get(country,url['xx'])
    
    # ----------------------------------------------------------------------
    
    def urlised_serialnumber(vendor,country,SerialNumber):
      if SerialNumber is not None :
        tu = system_tag_url(vendor,country)
        if tu != '' :
          SerialNumber = '<a href="' + tu + SerialNumber + '">' + SerialNumber + '</a>'
      return SerialNumber
    
    # ----------------------------------------------------------------------
    
    def verboseoutput(message) :
      if verbose:
        print "%s %s" % (time.strftime("%Y%m%d %H:%M:%S"), message)
    
    # ----------------------------------------------------------------------
    
    def getopts() :
      global hosturl,user,password,vendor,verbose,perfdata,urlise_country,timeout,ignore_list,get_power,get_volts,get_current,get_temp,get_fan
      usage = "usage: %prog  https://hostname user password system [verbose]\n" \
        "example: %prog https://my-shiny-new-vmware-server root fakepassword dell\n\n" \
        "or, using new style options:\n\n" \
        "usage: %prog -H hostname -U username -P password [-V system -v -p -I XX]\n" \
        "example: %prog -H my-shiny-new-vmware-server -U root -P fakepassword -V auto -I uk\n\n" \
        "or, verbosely:\n\n" \
        "usage: %prog --host=hostname --user=username --pass=password [--vendor=system --verbose --perfdata --html=XX]\n"
    
      parser = OptionParser(usage=usage, version="%prog "+version)
      group1 = OptionGroup(parser, 'Mandatory parameters')
      group2 = OptionGroup(parser, 'Optional parameters')
    
      group1.add_option("-H", "--host", dest="host", help="report on HOST", metavar="HOST")
      group1.add_option("-U", "--user", dest="user", help="user to connect as", metavar="USER")
      group1.add_option("-P", "--pass", dest="password", \
          help="password, if password matches file:<path>, first line of given file will be used as password", metavar="PASS")
    
      group2.add_option("-V", "--vendor", dest="vendor", help="Vendor code: auto, dell, hp, ibm, intel, or unknown (default)", \
          metavar="VENDOR", type='choice', choices=['auto','dell','hp','ibm','intel','unknown'],default="unknown")
      group2.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, \
          help="print status messages to stdout (default is to be quiet)")
      group2.add_option("-p", "--perfdata", action="store_true", dest="perfdata", default=False, \
          help="collect performance data for pnp4nagios (default is not to)")
      group2.add_option("-I", "--html", dest="urlise_country", default="", \
          help="generate html links for country XX (default is not to)", metavar="XX")
      group2.add_option("-t", "--timeout", action="store", type="int", dest="timeout", default=0, \
          help="timeout in seconds - no effect on Windows (default = no timeout)")
      group2.add_option("-i", "--ignore", action="store", type="string", dest="ignore", default="", \
          help="comma-separated list of elements to ignore")
      group2.add_option("--no-power", action="store_false", dest="get_power", default=True, \
          help="don't collect power performance data")
      group2.add_option("--no-volts", action="store_false", dest="get_volts", default=True, \
          help="don't collect voltage performance data")
      group2.add_option("--no-current", action="store_false", dest="get_current", default=True, \
          help="don't collect current performance data")
      group2.add_option("--no-temp", action="store_false", dest="get_temp", default=True, \
          help="don't collect temperature performance data")
      group2.add_option("--no-fan", action="store_false", dest="get_fan", default=True, \
          help="don't collect fan performance data")
    
      parser.add_option_group(group1)
      parser.add_option_group(group2)
    
      # check input arguments
      if len(sys.argv) < 2:
        print "no parameters specified\n"
        parser.print_help()
        sys.exit(-1)
      # if first argument starts with 'https://' we have old-style parameters, so handle in old way
      if re.match("https://",sys.argv[1]):
        # check input arguments
        if len(sys.argv) < 5:
          print "too few parameters\n"
          parser.print_help()
          sys.exit(-1)
        if len(sys.argv) > 5 :
          if sys.argv[5] == "verbose" :
            verbose = True
        hosturl = sys.argv[1]
        user = sys.argv[2]
        password = sys.argv[3]
        vendor = sys.argv[4]
      else:
        # we're dealing with new-style parameters, so go get them!
        (options, args) = parser.parse_args()
    
        # Making sure all mandatory options appeared.
        mandatories = ['host', 'user', 'password']
        for m in mandatories:
          if not options.__dict__[m]:
            print "mandatory parameter '--" + m + "' is missing\n"
            parser.print_help()
            sys.exit(-1)
    
        hostname=options.host.lower()
        # if user has put "https://" in front of hostname out of habit, do the right thing
        # hosturl will end up as https://hostname
        if re.match('^https://',hostname):
          hosturl = hostname
        else:
          hosturl = 'https://' + hostname
    
        user=options.user
        password=options.password
        vendor=options.vendor.lower()
        verbose=options.verbose
        perfdata=options.perfdata
        urlise_country=options.urlise_country.lower()
        timeout=options.timeout
        ignore_list=options.ignore.split(',')
        get_power=options.get_power
        get_volts=options.get_volts
        get_current=options.get_current
        get_temp=options.get_temp
        get_fan=options.get_fan
    
      # if user or password starts with 'file:', use the first string in file as user, second as password
      if (re.match('^file:', user) or re.match('^file:', password)):
            if re.match('^file:', user):
              filextract = re.sub('^file:', '', user)
              filename = open(filextract, 'r')
              filetext = filename.readline().split()
              user = filetext[0]
              password = filetext[1]
              filename.close()
            elif re.match('^file:', password):
              filextract = re.sub('^file:', '', password)
              filename = open(filextract, 'r')
              filetext = filename.readline().split()
              password = filetext[0]
              filename.close()
    
    # ----------------------------------------------------------------------
    
    getopts()
    
    # if running on Windows, don't use timeouts and signal.alarm
    on_windows = True
    os_platform = sys.platform
    if os_platform != "win32":
      on_windows = False
      import signal
      def handler(signum, frame):
        print 'CRITICAL: Execution time too long!'
        sys.exit(ExitCritical)
    
    # connection to host
    verboseoutput("Connection to "+hosturl)
    wbemclient = pywbem.WBEMConnection(hosturl, (user,password), NS)
    
    # Add a timeout for the script. When used with Nagios, the Nagios timeout must not be shorter than the plugin timeout.
    if on_windows == False and timeout > 0:
      signal.signal(signal.SIGALRM, handler)
      signal.alarm(timeout)
    
    # run the check for each defined class
    GlobalStatus = ExitUnknown
    server_info = ""
    bios_info = ""
    SerialNumber = ""
    ExitMsg = ""
    
    # if vendor is specified as 'auto', try to get vendor from CIM
    # note: the default vendor is 'unknown'
    if vendor=='auto':
      c=wbemclient.EnumerateInstances('CIM_Chassis')
      man=c[0][u'Manufacturer']
      if re.match("Dell",man):
        vendor="dell"
      elif re.match("HP",man):
        vendor="hp"
      elif re.match("IBM",man):
        vendor="ibm"
      elif re.match("Intel",man):
        vendor="intel"
      else:
        vendor='unknown'
    
    for classe in ClassesToCheck :
      verboseoutput("Check classe "+classe)
      try:
        instance_list = wbemclient.EnumerateInstances(classe)
      except pywbem.cim_operations.CIMError,args:
        if ( args[1].find('Socket error') >= 0 ):
          print "CRITICAL: %s" %args
          sys.exit (ExitCritical)
        else:
          verboseoutput("Unknown CIM Error: %s" % args)
      except pywbem.cim_http.AuthError,arg:
        verboseoutput("Global exit set to CRITICAL")
        GlobalStatus = ExitCritical
        ExitMsg = " : Authentication Error! "
      else:
        # GlobalStatus = ExitOK #ARR
        for instance in instance_list :
          sensor_value = ""
          elementName = instance['ElementName']
          elementNameValue = elementName
          verboseoutput("  Element Name = "+elementName)
    
          # Ignore element if we don't want it
          if elementName in ignore_list :
            verboseoutput("    (ignored)")
            continue
    
          # BIOS & Server info
          if elementName == 'System BIOS' :
            bios_info =     instance[u'Name'] + ': ' \
                + instance[u'VersionString'] + ' ' \
                + str(instance[u'ReleaseDate'].datetime.date())
            verboseoutput("    VersionString = "+instance[u'VersionString'])
    
          elif elementName == 'Chassis' :
            man = instance[u'Manufacturer']
            if man is None :
              man = 'Unknown Manufacturer'
            verboseoutput("    Manufacturer = "+man)
            SerialNumber = instance[u'SerialNumber']
            if SerialNumber:
              verboseoutput("    SerialNumber = "+SerialNumber)
            server_info = man + ' '
            if vendor != 'intel':
              model = instance[u'Model']
              if model:
                verboseoutput("    Model = "+model)
                server_info +=  model + ' s/n:'
    
          elif elementName == 'Server Blade' :
            SerialNumber = instance[u'SerialNumber']
            if SerialNumber:
              verboseoutput("    SerialNumber = "+SerialNumber)
    
          # Report detail of Numeric Sensors and generate nagios perfdata
    
          if classe == "CIM_NumericSensor" :
            sensorType = instance[u'sensorType']
            sensStr = sensor_Type.get(sensorType,"Unknown")
            if sensorType:
              verboseoutput("    sensorType = %d - %s" % (sensorType,sensStr))
            units = instance[u'BaseUnits']
            if units:
              verboseoutput("    BaseUnits = %d" % units)
            # grab some of these values for Nagios performance data
            scale = 10**instance[u'UnitModifier']
            verboseoutput("    Scaled by = %f " % scale)
            cr = int(instance[u'CurrentReading'])*scale
            verboseoutput("    Current Reading = %f" % cr)
            elementNameValue = "%s: %g" % (elementName,cr)
            ltnc = 0
            utnc = 0
            ltc  = 0
            utc  = 0
            if instance[u'LowerThresholdNonCritical'] is not None:
              ltnc = instance[u'LowerThresholdNonCritical']*scale
              verboseoutput("    Lower Threshold Non Critical = %f" % ltnc)
            if instance[u'UpperThresholdNonCritical'] is not None:
              utnc = instance[u'UpperThresholdNonCritical']*scale
              verboseoutput("    Upper Threshold Non Critical = %f" % utnc)
            if instance[u'LowerThresholdCritical'] is not None:
              ltc = instance[u'LowerThresholdCritical']*scale
              verboseoutput("    Lower Threshold Critical = %f" % ltc)
            if instance[u'UpperThresholdCritical'] is not None:
              utc = instance[u'UpperThresholdCritical']*scale
              verboseoutput("    Upper Threshold Critical = %f" % utc)
            #
            if perfdata:
              perf_el = elementName.replace(' ','_')
    
              # Power and Current
              if sensorType == 4:               # Current or Power Consumption
                if units == 7:            # Watts
                  if get_power:
                    data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),1) )
                elif units == 6:          # Current
                  if get_current:
                    data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),3) )
    
              # PSU Voltage
              elif sensorType == 3:               # Voltage
                if get_volts:
                  data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),2) )
    
              # Temperatures
              elif sensorType == 2:               # Temperature
                if get_temp:
                  data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),4) )
    
              # Fan speeds
              elif sensorType == 5:               # Tachometer
                if get_fan:
                  if units == 65:           # percentage
                    data.append( ("%s=%g%%;%g;%g " % (perf_el, cr, utnc, utc),6) )
                  else:
                    data.append( ("%s=%g;%g;%g " % (perf_el, cr, utnc, utc),5) )
    
          elif classe == "CIM_Processor" :
            verboseoutput("    Family = %d" % instance['Family'])
            verboseoutput("    CurrentClockSpeed = %dMHz" % instance['CurrentClockSpeed'])
    
    
          # HP Check
          if vendor == "hp" :
            if instance['HealthState'] is not None :
              elementStatus = instance['HealthState']
              verboseoutput("    Element HealthState = %d" % elementStatus)
              interpretStatus = {
                0  : ExitOK,    # Unknown
                5  : ExitOK,    # OK
                10 : ExitWarning,  # Degraded
                15 : ExitWarning,  # Minor
                20 : ExitCritical,  # Major
                25 : ExitCritical,  # Critical
                30 : ExitCritical,  # Non-recoverable Error
              }[elementStatus]
              if (interpretStatus == ExitCritical) :
                verboseoutput("Global exit set to CRITICAL")
                GlobalStatus = ExitCritical
                ExitMsg += " CRITICAL : %s " % elementNameValue
              if (interpretStatus == ExitWarning and GlobalStatus != ExitCritical) :
                verboseoutput("Global exit set to WARNING")
                GlobalStatus = ExitWarning
                ExitMsg += " WARNING : %s " % elementNameValue
              # Added the following for when GlobalStatus is ExitCritical and a warning is detected
              # This way the ExitMsg gets added but GlobalStatus isn't changed
              if (interpretStatus == ExitWarning and GlobalStatus == ExitCritical) : # ARR
                ExitMsg += " WARNING : %s " % elementNameValue #ARR
              # Added the following so that GlobalStatus gets set to OK if there's no warning or critical
              if (interpretStatus == ExitOK and GlobalStatus != ExitWarning and GlobalStatus != ExitCritical) : #ARR
                GlobalStatus = ExitOK #ARR
    
    
    
          # Dell, Intel, IBM and unknown hardware check
          elif (vendor == "dell" or vendor == "intel" or vendor == "ibm" or vendor=="unknown") :
            if instance['OperationalStatus'] is not None :
              elementStatus = instance['OperationalStatus'][0]
              verboseoutput("    Element Op Status = %d" % elementStatus)
              interpretStatus = {
                0  : ExitOK,            # Unknown
                1  : ExitCritical,      # Other
                2  : ExitOK,            # OK
                3  : ExitWarning,       # Degraded
                4  : ExitWarning,       # Stressed
                5  : ExitWarning,       # Predictive Failure
                6  : ExitCritical,      # Error
                7  : ExitCritical,      # Non-Recoverable Error
                8  : ExitWarning,       # Starting
                9  : ExitWarning,       # Stopping
                10 : ExitCritical,      # Stopped
                11 : ExitOK,            # In Service
                12 : ExitWarning,       # No Contact
                13 : ExitCritical,      # Lost Communication
                14 : ExitCritical,      # Aborted
                15 : ExitOK,            # Dormant
                16 : ExitCritical,      # Supporting Entity in Error
                17 : ExitOK,            # Completed
                18 : ExitOK,            # Power Mode
                19 : ExitOK,            # DMTF Reserved
                20 : ExitOK             # Vendor Reserved
              }[elementStatus]
              if (interpretStatus == ExitCritical) :
                verboseoutput("Global exit set to CRITICAL")
                GlobalStatus = ExitCritical
                ExitMsg += " CRITICAL : %s " % elementNameValue
              if (interpretStatus == ExitWarning and GlobalStatus != ExitCritical) :
                verboseoutput("Global exit set to WARNING")
                GlobalStatus = ExitWarning
                ExitMsg += " WARNING : %s " % elementNameValue
              # Added same logic as in 20100702 here, otherwise Dell servers would return UNKNOWN instead of OK
              if (interpretStatus == ExitWarning and GlobalStatus == ExitCritical) : # ARR
                ExitMsg += " WARNING : %s " % elementNameValue #ARR
              if (interpretStatus == ExitOK and GlobalStatus != ExitWarning and GlobalStatus != ExitCritical) : #ARR
                GlobalStatus = ExitOK #ARR
            if elementName == 'Server Blade' :
              if SerialNumber :
                if SerialNumber.find(".") != -1 :
                  SerialNumber = SerialNumber.split('.')[1]
    
    
    # Munge the output to give links to documentation and warranty info
    if (urlise_country != '') :
      SerialNumber = urlised_serialnumber(vendor,urlise_country,SerialNumber)
      server_info = urlised_server_info(vendor,urlise_country,server_info)
    
    # Output performance data
    perf = '|'
    if perfdata:
      sdata=[]
      ctr=[0,0,0,0,0,0,0]
      # sort the data so we always get perfdata in the right order
      # we make no assumptions about the order in which CIM returns data
      # first sort by element name (effectively) and insert sequence numbers
      for p in sorted(data):
        p1 = p[1]
        sdata.append( ("P%d%s_%d_%s") % (p1,perf_Prefix[p1], ctr[p1], p[0]) )
        ctr[p1] += 1
      # then sort perfdata into groups and output perfdata string
      for p in sorted(sdata):
        perf += p
    
    # sanitise perfdata - don't output "|" if nothing to report
    if perf == '|':
      perf = ''
    
    if GlobalStatus == ExitOK :
      print "OK - Server: %s %s %s%s" % (server_info, SerialNumber, bios_info, perf)
    
    elif GlobalStatus == ExitUnknown :
      print "UNKNOWN: %s" % (ExitMsg) #ARR
    
    else:
      print "%s- Server: %s %s %s%s" % (ExitMsg, server_info, SerialNumber, bios_info, perf)
    
    sys.exit (GlobalStatus)
    Now test the script to make sure it works.

    Code:
    
    /usr/local/nagios/libexec/check_esxi_hardware.py -H 192.168.107.44 -U esxiuser -P esxipassword -V ibm
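
    When testing by hand, keep in mind that Nagios judges a plugin by its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), not by the text it prints. Here is a minimal Python 3 sketch, separate from the plugin itself, that runs a command and reports the status Nagios would see. The helper names status_name and run_plugin are my own, and the demo command is only a stand-in so the sketch runs without an ESXi host.

```python
import subprocess
import sys

# Standard Nagios plugin return codes.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def status_name(return_code):
    """Map a plugin exit code to its Nagios status name."""
    # Treating anything outside 0-3 as UNKNOWN is an assumption of this sketch.
    return STATUS_NAMES.get(return_code, "UNKNOWN")


def run_plugin(argv):
    """Run a plugin command line and return (status name, first output line)."""
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return status_name(result.returncode), first_line


if __name__ == "__main__":
    # Stand-in command: prints a message and exits with 2, like a CRITICAL check.
    demo = [sys.executable, "-c",
            "import sys; print('CRITICAL: demo'); sys.exit(2)"]
    print(run_plugin(demo))
```

    To try it against the real plugin, pass the same command line as above to run_plugin() as a list of arguments.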
    
    The final step is to verify that nothing is broken in the configuration:
    Code:
    
    /etc/nagios/verify.sh
    
    If there were no errors or warnings, restart Nagios to load the new configuration:
    Code:
    
    /etc/init.d/nagios stop
    /etc/init.d/nagios start
    

  9. #19

    Re: My Notes for Installing Nagios on Ubuntu Server 12.04 LTS

    Monitoring MySQL Server

    The script will be executed on the remote Linux server so we will be making use of NRPE.

    On the remote MySQL server, install the Nagios plugins, NRPE server and NRPE plugin as mentioned earlier for remote Linux servers.

    An extra step to allow the check_mysql plugin to work is to grant the nagios user access to a database. Rather than granting access to an existing database, let's create an empty database just for Nagios.

    Type the following commands to create a nagios database and a nagios user, and to grant that user read-only access to just the empty nagios database:

    Code:
    
    mysql
    CREATE DATABASE nagiosdb;
    CREATE USER nagios;
    SET PASSWORD FOR 'nagios'@'%'=PASSWORD('nagios-password');
    GRANT SELECT ON nagiosdb.* TO nagios IDENTIFIED BY 'nagios-password';
    FLUSH PRIVILEGES;
    exit
    
    Now check that the command runs locally on the MySQL server before trying to test it remotely from the Nagios server:

    Code:
    
    /usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagios -p nagios-password
    
    Add the plugin to the trusted NRPE commands to be executed. Edit /etc/nagios/nrpe_local.cfg

    Code:
    
    command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagios -p nagios-password
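
    If you end up with a lot of these entries, it helps to list what NRPE will actually accept. The following is a rough Python 3 sketch, assuming every entry uses the one-line command[name]=command form shown above. The function name parse_nrpe_commands is my own, and the sample text stands in for the real file.

```python
import re

# Matches lines such as: command[check_mysql]=/path/to/plugin args
COMMAND_LINE = re.compile(
    r"^\s*command\[(?P<name>[^\]]+)\]\s*=\s*(?P<cmd>.+?)\s*$")


def parse_nrpe_commands(config_text):
    """Return {command name: command line} for every command[...] entry."""
    commands = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out entries
        match = COMMAND_LINE.match(line)
        if match:
            commands[match.group("name")] = match.group("cmd")
    return commands


# Sample text standing in for /etc/nagios/nrpe_local.cfg.
sample = """
# local NRPE commands
command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -w 20 -c 10 -d nagiosdb -u nagios -p nagios-password
"""
commands = parse_nrpe_commands(sample)
print(sorted(commands))
```

    Reading the real file instead of the sample is just a matter of passing its contents to parse_nrpe_commands().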
    
    Even though we are using a low-access, read-only ID, the password is exposed in the config file, so make sure the file ownership and permissions are set accordingly:
    Code:
    
    chown root:nagios /etc/nagios/nrpe_local.cfg
    chmod 0640 /etc/nagios/nrpe_local.cfg
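
    To double-check the result, the idea is simply that "other" users must have no permission bits on the file. A small Python 3 sketch of that test follows. The function name is_world_accessible is my own, and the demo uses a temporary file rather than the real config so it can run anywhere on Linux.

```python
import os
import stat
import tempfile


def is_world_accessible(path):
    """Return True if 'other' users have any read/write/execute bit on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRWXO)


def demo():
    """Check a temp file at mode 0640 (tight) and then at 0644 (too loose)."""
    with tempfile.NamedTemporaryFile() as handle:
        os.chmod(handle.name, 0o640)
        tight = is_world_accessible(handle.name)
        os.chmod(handle.name, 0o644)
        loose = is_world_accessible(handle.name)
    return tight, loose


print(demo())
```

    On the MySQL server you would call is_world_accessible("/etc/nagios/nrpe_local.cfg") and expect False after the chmod above.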
    
    The NRPE server now needs to reload its configuration for the changes to take effect.
    Code:
    /etc/init.d/nagios-nrpe-server reload
    On the Nagios server, add the following service definition to the remote MySQL Linux server's configuration file:

    /etc/nagios/servers/srv-mysql.cfg
    Code:
    
    define service{
            use                             generic-service
            host_name                       srv-mysql
            service_description             Server Health
            check_command                   check_mysql
            }
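
    If you have several MySQL servers, typing this block for each one gets tedious. Here is a minimal Python 3 sketch that prints the same definition for a list of hosts. The function name render_service and the second host name are made up for illustration. It only produces text, so you still paste the output into each server's config file yourself.

```python
def render_service(host_name, description, check_command,
                   template="generic-service"):
    """Build a Nagios service definition block as text."""
    fields = [
        ("use", template),
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", check_command),
    ]
    lines = ["define service{"]
    for key, value in fields:
        # Pad the directive name so the values line up in a column.
        lines.append("        %-31s %s" % (key, value))
    lines.append("        }")
    return "\n".join(lines)


# "srv-mysql2" is a hypothetical second host, used only for illustration.
for host in ("srv-mysql", "srv-mysql2"):
    print(render_service(host, "Server Health", "check_mysql"))
```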
    
    The final step is to verify that nothing is broken in the configuration:
    Code:
    
    /etc/nagios/verify.sh
    
    If there were no errors or warnings, restart Nagios to load the new configuration:
    Code:
    
    /etc/init.d/nagios stop
    /etc/init.d/nagios start
    
    Last edited by LHammonds; May 28th, 2012 at 04:24 PM.

  10. #20

    Re: My Notes for Installing Nagios on Ubuntu Server 12.04 LTS

    Monitoring Remote Windows Servers

    Monitoring Windows servers and workstations requires installing a service if you need more data than a simple ping provides.

    For this, we will be using NSClient++. In particular, we will be downloading the Win32 and x64 "zip" files for version 0.3.9.

    I chose the ZIP files instead of the MSI files because they are much simpler to configure and roll out.

    Extract the Win32 ZIP file to C:\NSClient\ and edit C:\NSClient\nsc.ini

    Uncomment the DLL files you will be using between lines 10 and 22. For example:
    Code:
    FileLogger.dll
    CheckSystem.dll
    CheckDisk.dll
    NSClientListener.dll
    NRPEListener.dll
    SysTray.dll
    CheckEventLog.dll
    CheckHelpers.dll
    ;CheckWMI.dll
    CheckNSCP.dll
    
    ; Script to check external scripts and/or internal aliases.
    CheckExternalScripts.dll
    On line 56, set the password that will be required to access the remote functions. For example:

    Code:
    password=my-nsclient-password
    On the Nagios server, you will need to match this password in your resource file which will then be referenced in your server config file.
    /etc/nagios/resources.cfg
    Code:
    $USER5$=my-nsclient-password
    On line 62, set the IP of the Nagios server to limit access to just that host. For example:

    Code:
    allowed_hosts=192.168.107.21
    On line 67, tell it to use this file to obtain settings rather than the registry.

    Code:
    use_file=1
    On line 100, set the IP of the Nagios server to limit access to just that host. For example:

    Code:
    allowed_hosts=192.168.107.21
    On line 104, set the port number that will be used for communication with Nagios via check_nt. It would be wise to use a port other than the default. This example is using the default port:

    Code:
    port=12489
    On line 118, set the port number that will be used for communication with Nagios via check_nrpe. It would be wise to use a port other than the default. This example is using the default port:

    Code:
    port=5666
    On line 134, enable SSL. For example:

    Code:
    use_ssl=1
    On line 144, set the IP of the Nagios server to limit access to just that host. For example:

    Code:
    allowed_hosts=192.168.107.21
    On line 244, enable the check for Windows Update script. For example:

    Code:
    check_updates=check_updates.vbs
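
    Since the same setting names (allowed_hosts, port) appear in several places in the file, it is easy to miss one. Below is a rough Python 3 sketch that reads an nsc.ini and flags the settings discussed above. The section names ([Settings], [NSClient], [NRPE]) are my assumption about the 0.3.x file layout, so check them against your own file. The sample text and the helper names load_ini and find_problems are for illustration only.

```python
import configparser

# Trimmed stand-in for nsc.ini. The section names below are an assumption
# about the 0.3.x layout; compare them with your own file.
SAMPLE_INI = """\
[modules]
FileLogger.dll
CheckSystem.dll
;CheckWMI.dll

[Settings]
password=my-nsclient-password
allowed_hosts=192.168.107.21
use_file=1

[NSClient]
allowed_hosts=192.168.107.21
port=12489

[NRPE]
port=5666
use_ssl=1
allowed_hosts=192.168.107.21
"""


def load_ini(text):
    """Parse nsc.ini text; module lines have no '=' so bare keys are allowed."""
    parser = configparser.ConfigParser(allow_no_value=True,
                                       interpolation=None, strict=False)
    parser.read_string(text)
    return parser


def find_problems(parser, nagios_ip):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for section, key, wanted in (("Settings", "use_file", "1"),
                                 ("NRPE", "use_ssl", "1")):
        actual = parser.get(section, key, fallback=None)
        if actual != wanted:
            problems.append("[%s] %s is %r, expected %r"
                            % (section, key, actual, wanted))
    for section in ("Settings", "NSClient", "NRPE"):
        hosts = parser.get(section, "allowed_hosts", fallback=None) or ""
        # Exact match only; this sketch does not understand subnet notation.
        if nagios_ip not in [h.strip() for h in hosts.split(",")]:
            problems.append("[%s] allowed_hosts does not list %s"
                            % (section, nagios_ip))
    return problems


print(find_problems(load_ini(SAMPLE_INI), "192.168.107.21"))
```

    An empty list means every check passed. For a real file, read C:\NSClient\nsc.ini and pass its contents to load_ini().
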
    Now, to make rolling this out a snap, create a couple of batch files to install / remove the NSClient service:

    C:\NSClient\service-install.bat
    Code:
    
    @ECHO OFF
    NSClient++.exe -install
    START /WAIT NET START NSClientpp
    pause
    
    C:\NSClient\service-uninstall.bat
    Code:
    
    @ECHO OFF
    START /WAIT NET STOP NSClientpp
    NSClient++.exe -uninstall
    pause
    
    Copy the C:\NSClient folder to a network share. Then, on each Windows host you want to monitor, copy the folder to C:\NSClient and double-click the "service-install.bat" file.

    You will also need to add rules to your firewall to allow communication from the Nagios server.

    Inbound Rule Name: Nagios 12489 TCP
    - Check: Enabled
    - Action: Allow the connection
    - Protocol Type: TCP
    - Local Port: 12489
    - Remote Port: All Ports
    - Profile: Domain
    - Local IP address: Any IP address
    - Remote IP address: These IP addresses: 192.168.107.21

    Inbound Rule Name: Nagios 5666 TCP
    - Check: Enabled
    - Action: Allow the connection
    - Protocol Type: TCP
    - Local Port: 5666
    - Remote Port: All Ports
    - Profile: Domain
    - Local IP address: Any IP address
    - Remote IP address: These IP addresses: 192.168.107.21
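
    Before blaming Nagios for a failed check, it is worth confirming from the Nagios server that both ports are reachable through the firewall. Here is a minimal Python 3 sketch of a TCP connect test. The function name port_is_open is my own, and the demo points at 127.0.0.1 only so it runs anywhere; substitute the IP of your Windows host.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; use the Windows host's address instead.
    for port in (12489, 5666):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print("port %d: %s" % (port, state))
```

    A successful connect only shows that the firewall rule and the listener are in place. It does not test the password or the SSL setting.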

    On the Nagios server, create or copy a Windows config file and make appropriate changes such as server name and IP. See the Sample Windows config file posted earlier in the thread.

    The final step is to verify that nothing is broken in the configuration:
    Code:
    
    /etc/nagios/verify.sh
    
    If there were no errors or warnings, restart Nagios to load the new configuration:
    Code:
    
    /etc/init.d/nagios stop
    /etc/init.d/nagios start
    
    Lather, rinse, repeat for the x64 version if you have 64-bit servers.

    NOTE: The Win32 version will work on 64-bit servers. The only problem is if you need to check for the existence of running processes such as Explorer.exe or Notepad.exe which are 64-bit. The Win32 client cannot properly detect 64-bit programs.
    Last edited by LHammonds; May 30th, 2012 at 07:03 PM.
