Netsaint / Nagios

Nagios is a Network Monitoring Service :: Setup and Install

It can monitor several services on several hosts and notify a chosen group of contacts (by email, for example) when measurements cross the configured levels. To keep it simple:

apt-get install nagios-text    # sarge and etch config
apt-get install nagios3

FYI: In Debian squeeze, nagios requires php to be installed for the front end :-/

During installation you are asked to set a password for the default admin user:

nagiosadmin
password_chosen_at_install (This is for the Web Interface)

Additional users go in /etc/nagios3/htpasswd.users (add them with Apache's htpasswd tool, e.g. htpasswd /etc/nagios3/htpasswd.users username)

There is a lot of configuring to be done. First off, enable the nagios site in apache2:

ln -s /etc/nagios3/apache.conf /etc/apache2/sites-enabled/nagios
(restart apache)

This gets the basics working at http://localhost/nagios, where you will be able to log in. The Default Gateway is added by default and should already be monitored OK. To add another host, copy the existing settings in /etc/nagios and adapt them.

Great explanation at: http://www.debian-administration.org/articles/299

Configuration of Nagios

Nagios needs quite a bit of configuration. Carrying out the following steps in order should make things a lot easier. Although the "Default Gateway" (gw) is added by default with its own group etc., here it was put into a new hostgroup with updated contact details.

Overview of Nagios Config Files and Plugins

The main nagios config files are kept in: /etc/nagios/ or /etc/nagios3/
The plugin config files are kept in: /etc/nagios-plugins/config/
The executable plugins are kept in: /usr/lib/nagios/plugins/

0. Additional Info Available

Please read http://nagios.sourceforge.net/docs/2_0/xodtemplate.html#host for all details relating to the options/files below and their template. E.g. the following host config options are explained there: d,u,r. d=down. u=unreachable. r=recovered (note: there are more options available). Extended example configs are located at: /usr/share/doc/nagios-text/examples/template-object/

All configs for Nagios3 go into /etc/nagios3/conf.d/*. I moved the existing files out of /etc/nagios3/conf.d/ and added the ones below. You can instead edit the existing files and merge the configs below into them if you wish.

1. Config all unique hosts

Note: Only specify distinct physical servers (IPs). Multiple HTTP websites can be monitored on one host.

vi /etc/nagios/hosts.cfg    (or /etc/nagios3/conf.d/hosts.cfg for nagios3)

define host{
       name                            generic-host    ; The name of this host template....
       notifications_enabled           1       ; Host notifications are enabled
       event_handler_enabled           1       ; Host event handler is enabled
       flap_detection_enabled          0       ; Flap detection is disabled. Flap detection guards against intermittent network anomalies
       process_perf_data               1       ; Process performance data
       retain_status_information       1       ; Retain status information across program restarts. Turn this off (0) when testing and doing lots of restarts, otherwise some settings will be cached!
       retain_nonstatus_information    1       ; Retain non-status information across program restarts. This can be turned off also while testing and setting up.
       register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
       }

# Default gateway host definition
define host{
       use                     generic-host            ; Name of host template to use
       host_name               gateway
       alias                   Default Gateway
       address                 ip.address.or.domain.com.name
       check_command           check-host-alive
       max_check_attempts      20
       notification_interval   60
       notification_period     24x7
       notification_options    d,u,r
}

define host{
       use                     generic-host            ; Name of host template to use
       host_name               domain1.com
       alias                   Domain 1
       address                 ip.or.host.name
       check_command           check-host-alive
       max_check_attempts      20
       notification_interval   120
       notification_period     24x7
       notification_options    d,u,r
}

define host{
       use                     generic-host            ; Name of host template to use
       host_name               domain2.com
       alias                   Domain 2
       address                 ip.address.or.host.name
       check_command           check-host-alive
       max_check_attempts      20
       notification_interval   120
       notification_period     24x7
       notification_options    d,u,r
}

define host{
       use                     generic-host            ; Name of host template to use
       host_name               www.google.com
       alias                   Google Webserver
       address                 www.google.com
       check_command           check-host-alive
       max_check_attempts      20
       notification_interval   120
       notification_period     24x7
       notification_options    d,u,r
}
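The host blocks above differ only in name, alias and address, so they can also be generated instead of typed. This is just a rough sketch: make_host is a hypothetical helper (not part of Nagios) and the 192.0.2.x addresses are placeholders from the documentation range. It only prints the stanzas; redirecting the output into hosts.cfg is left to you.

```shell
#!/bin/sh
# make_host is a hypothetical helper, not something shipped with Nagios.
# Usage: make_host HOST_NAME ALIAS ADDRESS
make_host() {
    cat <<EOF
define host{
       use                     generic-host
       host_name               $1
       alias                   $2
       address                 $3
       check_command           check-host-alive
       max_check_attempts      20
       notification_interval   120
       notification_period     24x7
       notification_options    d,u,r
}

EOF
}

# Placeholder addresses (192.0.2.0/24 is reserved for documentation).
hosts_cfg=$(
    make_host domain1.com "Domain 1" 192.0.2.10
    make_host domain2.com "Domain 2" 192.0.2.11
)
printf '%s\n' "$hosts_cfg"
```

The values that stay fixed (max_check_attempts and so on) are copied from the examples above; change them in the helper if your hosts need something different.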

Disable Checking of a Host

I have been having problems with one host in particular, where nagios gets tied up on "Time to live exceeded" (TTL) errors and does not wait between retries. The errors were:

[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;19;CRITICAL - Time to live exceeded (82.195.144.16)
[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;18;CRITICAL - Time to live exceeded (82.195.144.16)
[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;17;CRITICAL - Time to live exceeded (82.195.144.16)
[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;16;CRITICAL - Time to live exceeded (82.195.144.16)
#and so on for 20 checks with no wait
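A quick way to confirm that the retries really fire with no wait is to count the HOST ALERT lines per timestamp and host. The sketch below uses the four lines quoted above as sample data; on a live system you would feed it the nagios log file instead (its path is not covered here).

```shell
#!/bin/sh
# Sample data: the four alert lines quoted above.
log='[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;19;CRITICAL - Time to live exceeded (82.195.144.16)
[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;18;CRITICAL - Time to live exceeded (82.195.144.16)
[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;17;CRITICAL - Time to live exceeded (82.195.144.16)
[06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;16;CRITICAL - Time to live exceeded (82.195.144.16)'

# Fields are separated by ";", so field 1 is "[timestamp] HOST ALERT: host".
summary=$(printf '%s\n' "$log" | awk -F';' '/HOST ALERT:/ {
    ts = substr($1, 2, 19)               # timestamp between the brackets
    host = $1
    sub(/^.*HOST ALERT: /, "", host)     # keep only the host name
    count[ts " " host]++
}
END { for (key in count) print count[key], key }')

echo "$summary"
```

Several alerts for the same host within the same second means the checks are running back to back.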

The same error is discussed and described further here: http://readlist.com/lists/lists.sourceforge.net/nagios-users/0/2181.html. Rather than adding something to make nagios wait between these checks, I simply chose to disable host checking and to check just the service on that server instead. To disable checking of a host, add checks_enabled 0 to its define host{ } block (as above):

define host{
       use                     generic-host            ; Name of host template to use
       host_name               www.google.com
       alias                   Google Webserver
       address                 www.google.com
       check_command           check-host-alive
       max_check_attempts      20
       checks_enabled          0
       notification_interval   120
       notification_period     24x7
       notification_options    d,u,r
}

2. Config Nagios hostgroups

Hostgroups quite simply group together the hosts in hosts.cfg. They are mainly used to order and group services and hosts together. I created separate hostgroups for various server clusters, i.e. one hostgroup for my own server cluster, a second for my computer society servers, and a third for commercial hosting webservers.

vi /etc/nagios3/conf.d/hostgroups.cfg
define hostgroup{
       hostgroup_name  my_cluster
       alias           My Server Cluster
       contact_groups  root-my_cluster
       members         gateway, domain1.com, domain2.com
}

define hostgroup{
       hostgroup_name  other-webservers
       alias           Other Commercial Web Servers
       contact_groups  select-users-my_cluster
       members         www.google.com
}

3. Config Nagios Contacts

Note: As with hosts, the contacts config takes the specific names of people and their contact information. The various contacts are then grouped together in step 4. For this config, I am going to have two main contacts: one is the root administrator and the second is a general user (for receiving information on the non-essential other-webservers). Again, look at http://nagios.sourceforge.net/docs/2_0/xodtemplate.html#contact for specifics on notification options.

vi /etc/nagios3/conf.d/contacts.cfg
define contact{
       contact_name                    root
       alias                           Root Administrator
       service_notification_period     24x7
       host_notification_period        24x7
       service_notification_options    w,u,c,r
       host_notification_options       d,u,r
       service_notification_commands   notify-by-email
       host_notification_commands      host-notify-by-email
       email                           root@domain.com
}

define contact{
       contact_name                    sburke
       alias                           A Standard/Typical User
       service_notification_period     24x7
       host_notification_period        24x7
       service_notification_options    w,u,c,r
       host_notification_options       d,u,r
       service_notification_commands   notify-by-email
       host_notification_commands      host-notify-by-email
       email                           username@domain.com
}

4. Config Nagios Contactgroups

Again, the various contacts outlined in step 3 need to be grouped together. The hostgroups.cfg and services.cfg send alert notifications to "contactgroups" and not to individual contacts. Although all these separate configs seem very awkward, they ensure that users, hosts and services can be added easily.

vi /etc/nagios3/conf.d/contactgroups.cfg
define contactgroup{
       contactgroup_name       root-my_cluster
       alias                   Root Admins on My Cluster
       members                 root
}

define contactgroup{
       contactgroup_name       select-users-my_cluster
       alias                   Users on Burkesys
       members                 sburke
}

Note: "root-my_cluster", "root", "select-users-my_cluster" and "sburke" were selected from Steps 2 and 3.
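Since the names have to match across files, a small cross-check can save a restart or two. The sketch below lists contactgroup members that have no contact definition. It works on trimmed copies of the examples above written to a temp directory, and "jbloggs" is a deliberately undefined, made-up contact so that there is something to report. Point the two awk commands at your real files to use it properly.

```shell
#!/bin/sh
# Trimmed copies of the example configs, written to a temp directory.
dir=$(mktemp -d)
cat > "$dir/contacts.cfg" <<'EOF'
define contact{
       contact_name                    root
}
define contact{
       contact_name                    sburke
}
EOF
cat > "$dir/contactgroups.cfg" <<'EOF'
define contactgroup{
       contactgroup_name       root-my_cluster
       members                 root
}
define contactgroup{
       contactgroup_name       select-users-my_cluster
       members                 sburke, jbloggs
}
EOF

# Every contact_name that is defined, one per line.
defined=$(awk '$1 == "contact_name" { print $2 }' "$dir/contacts.cfg")

# Walk over every member listed in any contactgroup.
missing=""
for m in $(awk '$1 == "members" { $1 = ""; gsub(/,/, " "); print }' "$dir/contactgroups.cfg"); do
    printf '%s\n' "$defined" | grep -qxF -- "$m" || missing="$missing $m"
done
missing=${missing# }

echo "undefined contacts: ${missing:-none}"
rm -rf "$dir"
```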

5. Config Nagios Services

This is (typically) the main and final configuration file. The names used in the previous four steps must match up correctly with the configs in this step, otherwise nagios will complain and print a helpful error message.

vi /etc/nagios3/conf.d/services.cfg
# Generic service definition template
define service{
       ; The 'name' of this service template, referenced in other service definitions
       name                            generic-service
       active_checks_enabled           1       ; Active service checks are enabled
       passive_checks_enabled          1       ; Passive service checks are enabled/disabled
       parallelize_check               1       ; Active service checks should be parallelized
                                               ; (disabling this can lead to major performance problems)
       obsess_over_service             1       ; We should obsess over this service (if necessary)
       check_freshness                 0       ; Default is to NOT check service 'freshness'
       notifications_enabled           1       ; Service notifications are enabled
       event_handler_enabled           1       ; Service event handler is enabled
       flap_detection_enabled          0       ; Flap detection is disabled
       process_perf_data               1       ; Process performance data
       retain_status_information       1       ; Retain status information across program restarts
       retain_nonstatus_information    1       ; Retain non-status information across program restarts. Turn this off (0) when testing and doing lots of restarts, otherwise some settings will be cached!
       register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

# Service definition
define service{
       use                             generic-service         ; Name of service template to use
       host_name                       domain1.com, domain2.com
       service_description             PING
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  root-my_cluster
       notifications_enabled           1
       notification_interval           120
       notification_period             24x7
       notification_options            w,u,c,r
       check_command                   check_ping!100.0,20%!500.0,60%
       ;check_ping syntax: !warning if exceeds 100ms,warning if exceeds 20% packet loss!critical if exceeds 500ms,critical if exceeds 60% packet loss
}

define service{
       use                             generic-service
       host_name                       domain1.com
       service_description             HTTP
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  root-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_http
}

define service{
       use                             generic-service
       host_name                       domain1.com
       service_description             HTTP-vhost_name
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  root-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_http_url!http://vhost.domain1.com/path/to/application/page.php   ;please read Step 6 below for extra config required.
}

define service{
       use                             generic-service
       host_name                       domain2.com
       service_description             DNS
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  root-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_dns
}

define service{
       use                             generic-service
       host_name                       domain2.com
       service_description             MySQL
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  root-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_mysql_cmdlinecred!mysqluser!mysqlpassword
}

define service{
       use                             generic-service
       host_name                       domain2.com
       service_description             SMTP
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  root-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_smtp
}
################################################################
define service{
       use                             generic-service
       host_name                       www.google.com
       service_description             PING
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  select-users-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_ping!100.0,20%!500.0,60%
}

define service{
       use                             generic-service
       host_name                       www.google.com
       service_description             HTTP
       is_volatile                     0
       check_period                    24x7
       max_check_attempts              3
       normal_check_interval           5
       retry_check_interval            1
       contact_groups                  select-users-my_cluster
       notification_interval           120
       notification_period             24x7
       notification_options            c,r
       check_command                   check_http
}

The services.cfg can get quite long indeed! Services can be grouped together in servicegroups.cfg, however I didn't bother with this step. Grouping gives a better overview in the web front end when there are a large number of services.
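Once the config files are in place, it is worth letting nagios verify them before restarting. The -v option runs a pre-flight check of the whole configuration and reports mismatched names. In this sketch the binary name nagios3 and the main config path /etc/nagios3/nagios.cfg are assumptions based on the Debian layout used on this page; adjust them for other versions.

```shell
#!/bin/sh
# Assumed Debian nagios3 layout; adjust both values for other installs.
nagios_bin=nagios3
main_cfg=/etc/nagios3/nagios.cfg

if command -v "$nagios_bin" >/dev/null 2>&1; then
    # Pre-flight check: parses all object configs and reports errors.
    "$nagios_bin" -v "$main_cfg"
    verify_status=$?
else
    echo "$nagios_bin not found in PATH; skipping config check"
    verify_status=skipped
fi

echo "verify status: $verify_status"
# Only restart nagios (e.g. via its init script) when the status is 0.
```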

6. Extra Custom Plugin Configs

In services.cfg there is a "check_http_url" check_command. At this point nagios would give an error, because "check_http_url" is a custom command that has not been defined yet. It monitors vhost.domain1.com and saves us from having to define a separate host for each virtual website.

vi /etc/nagios-plugins/config/http.cfg
# 'check_http_url' command definition
define command{
       command_name    check_http_url
       command_line    /usr/lib/nagios/plugins/check_http -I $HOSTADDRESS$ -u $ARG1$
}
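To see what actually gets run, it helps to know that everything after each "!" in a check_command fills $ARG1$, $ARG2$ and so on in the command_line, and $HOSTADDRESS$ comes from the host's address. The sketch below imitates that substitution in a simplified way. expand_command is a hypothetical helper, the 192.0.2.10 address is a placeholder, and it ignores escaped "!" and all other macros.

```shell
#!/bin/sh
# expand_command is a hypothetical helper, not a Nagios tool.
# Usage: expand_command CHECK_COMMAND COMMAND_LINE HOST_ADDRESS
expand_command() {
    awk -v cc="$1" -v line="$2" -v addr="$3" '
    # Literal (non-regex) replacement of every "from" in s with "to".
    function replace(s, from, to,    i, out) {
        out = ""
        while ((i = index(s, from)) > 0) {
            out = out substr(s, 1, i - 1) to
            s = substr(s, i + length(from))
        }
        return out s
    }
    BEGIN {
        n = split(cc, part, "!")          # part[1] is the command name
        line = replace(line, "$HOSTADDRESS$", addr)
        for (i = 2; i <= n; i++)
            line = replace(line, "$ARG" (i - 1) "$", part[i])
        print line
    }'
}

expanded=$(expand_command \
    'check_http_url!http://vhost.domain1.com/path/to/application/page.php' \
    '/usr/lib/nagios/plugins/check_http -I $HOSTADDRESS$ -u $ARG1$' \
    '192.0.2.10')
echo "$expanded"
```

Running the printed command by hand is a handy way to debug a check before nagios schedules it.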

In order to see what options are available and the command line switches etc. do the following:

/usr/lib/nagios/plugins/check_http --help

Each of the plugins within /usr/lib/nagios/plugins/ has several options for monitoring specific levels of performance.


Another config is /etc/nagios/escalations.cfg; however, at the moment I feel things work OK without it. I will revisit it at a later stage.

Send Nagios Notifications via SMS Text Messages

Although a simple config could be made for nagios to send sms's via vodasms (o2sms), I chose to do the sms handling at email delivery time using procmail. Read more here: Vodasms#Forward_Emails_via_SMS_Text_Message

References & Additional Info

Vhost & Website Monitoring: http://theories.darwinsys.com/2007/04/05/1175779980000.html
Monitoring tomcat website: http://nagios.org/faqs/viewfaq.php?faq_id=310
http://www.kernel-panic.it/openbsd/nagios/nagios3.html
Main Nagios Templates and Docs: http://nagios.sourceforge.net/docs/2_0/xodtemplate.html
General: http://www.onlamp.com/pub/a/onlamp/2002/09/26/nagios.html?page=1
General and Good: http://www.debian-administration.org/articles/299
General with some mistakes: http://servers.linux.com/servers/04/09/14/2317206.shtml
MySQL info and Nagios: http://www.gatorlug.org/files/GatorLUG.ppt

Monitor HTTP via a Proxy

If nagios is running on a server whose firewall blocks outgoing http(s) requests, then you will have to use a proxy (if available) to check http on a remote host/server. Here are the configs and tweaks required:

vi /etc/nagios-plugins/config/http.cfg

# 'check_http_via_proxy' command definition
define command{
       command_name    check_http_via_proxy
       command_line    /usr/lib/nagios/plugins/check_http -H $ARG1$ -p $ARG2$ -u $ARG3$ -e 'HTTP/1.0 200 OK'

}

vi /etc/nagios/services.cfg
# edit the check_command for the particular service you require to:
host_name                       externalserver.com
check_command                   check_http_via_proxy!proxy.internalserver.com!3128!http://externalserver.com
# note - sometimes the squid proxy would only serve a cached page. To get around this, the check_command was further tweaked to call a particular webpage, i.e.:
check_command                   check_http_via_proxy!proxy.internalserver.com!3128!http://externalserver.com/~userwebsite/

Hopefully that should work ok.
