====== Netsaint_/_Nagios ======

====== Nagios is a Network Monitoring Service :: Setup and Install ======

Nagios can monitor several services on several hosts and notify a chosen group (by email etc.) depending on the levels measured.

To keep it simple:

  apt-get install nagios-text   # sarge and etch config
  apt-get install nagios3

FYI: In Debian squeeze, nagios requires php to be installed for the front end :-/

During the install, a password for the default admin user is required (this is for the Web Interface):

  nagiosadmin  password_chosen_at_install

Additional users go in /etc/nagios3/htpasswd.users (add them via apache htpasswd).

There is a ton of configuring to be done. First off, the apache2 sites-enabled entry:

  ln -s /etc/nagios3/apache.conf /etc/apache2/sites-enabled/nagios

(then restart apache)

This gets the basics done at http://localhost/nagios, and you will be able to log in. The Default Gateway should get added in by default and will be monitored ok. To monitor another host, copy the existing settings in /etc/nagios and adapt them for that host.

Great explanation at: http://www.debian-administration.org/articles/299

====== Configuration of Nagios ======

There is quite a bit of configuration required for Nagios. If the following steps are carried out in order, things should be a lot easier. Although by default the "Default Gateway" (gw) is added in with its own group etc., here it was put into a new hostgroup with updated contact details.

===== Overview of Nagios Config Files and Plugins =====

The main nagios config files are kept in:

  /etc/nagios/
  /etc/nagios3/

The plugin config files are kept in:

  /etc/nagios-plugins/config/

The executable plugins are kept in:

  /usr/lib/nagios/plugins/

===== 0. Additional Info Available =====

Please read http://nagios.sourceforge.net/docs/2_0/xodtemplate.html#host for all details relating to the options/files below and their template. E.g. the following host notification options are explained there: d,u,r. d=down. u=unreachable. r=recovered (note: there are more options available).
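
As a rough sketch of where these letters end up, the fragment below shows a host definition that sets notification_options. The host name and address are made up for illustration, and generic-host is the template defined in step 1. The extra letter f (flapping start/stop) is one of the additional options described in the Nagios docs; it is not used elsewhere in this setup and is only shown to illustrate "more options".

```
; Hypothetical host, shown only to illustrate notification_options.
define host{
      use                    generic-host   ; template from step 1
      host_name              example-host   ; assumed name, not part of this setup
      address                192.0.2.10     ; assumed address (documentation range)
      check_command          check-host-alive
      max_check_attempts     20
      notification_interval  120
      notification_period    24x7
      ; d=down, u=unreachable, r=recovered, f=flapping starts/stops
      ; (f should only matter if flap detection is enabled)
      notification_options   d,u,r,f
      }
```
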
Extended example configs are located at: /usr/share/doc/nagios-text/examples/template-object/

**All configs for Nagios3 go into /etc/nagios3/conf.d/. I moved the existing files out of /etc/nagios3/conf.d/ and added in the ones below. You can choose to edit and merge the configs below into the existing files if you wish.**

===== 1. Config all unique hosts =====

Note: Only specify different physical servers (IPs). Multiple http websites can be monitored on 1 host.

  vi /etc/nagios/hosts.cfg
  vi /etc/nagios3/conf.d/hosts.cfg

  define host{
      name                          generic-host  ; The name of this host template
      notifications_enabled         1             ; Host notifications are enabled
      event_handler_enabled         1             ; Host event handler is enabled
      flap_detection_enabled        0             ; Flap detection is disabled (flap detection guards against intermittent network anomalies)
      process_perf_data             1             ; Process performance data
      retain_status_information     1             ; Retain status information across program restarts. Turn this off (0) when testing and doing lots of restarts, otherwise some settings will be cached!
      retain_nonstatus_information  1             ; Retain non-status information across program restarts. This can be turned off also while testing and setting up.
      register                      0             ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
      }

  # Default gateway host definition
  define host{
      use                    generic-host   ; Name of host template to use
      host_name              gateway
      alias                  Default Gateway
      address                ip.address.or.domain.com.name
      check_command          check-host-alive
      max_check_attempts     20
      notification_interval  60
      notification_period    24x7
      notification_options   d,u,r
      }

  define host{
      use                    generic-host   ; Name of host template to use
      host_name              domain1.com
      alias                  Domain 1
      address                ip.or.host.name
      check_command          check-host-alive
      max_check_attempts     20
      notification_interval  120
      notification_period    24x7
      notification_options   d,u,r
      }

  define host{
      use                    generic-host   ; Name of host template to use
      host_name              domain2.com
      alias                  Domain 2
      address                ip.address.or.host.name
      check_command          check-host-alive
      max_check_attempts     20
      notification_interval  120
      notification_period    24x7
      notification_options   d,u,r
      }

  define host{
      use                    generic-host   ; Name of host template to use
      host_name              www.google.com
      alias                  Google Webserver
      address                www.google.com
      check_command          check-host-alive
      max_check_attempts     20
      notification_interval  120
      notification_period    24x7
      notification_options   d,u,r
      }

==== Disable Checking of a Host ====

I have been having problems with 1 host in particular, where nagios gets tied up checking TTL and does not wait between TTL checks.
The errors were:

  [06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;19;CRITICAL - Time to live exceeded (82.195.144.16)
  [06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;18;CRITICAL - Time to live exceeded (82.195.144.16)
  [06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;17;CRITICAL - Time to live exceeded (82.195.144.16)
  [06-24-2007 11:10:13] HOST ALERT: host.com;DOWN;SOFT;16;CRITICAL - Time to live exceeded (82.195.144.16)
  #and so on for 20 checks with no wait

The same error has been discussed and described further here: http://readlist.com/lists/lists.sourceforge.net/nagios-users/0/2181.html

Instead of putting in some code to get nagios waiting between TTL checks, I simply chose to disable host checking, and to check just the service on that server instead. To disable checking of a host, add checks_enabled 0 to the define host{ } code (as above). The Nagios 3 object documentation lists this host directive as active_checks_enabled, so that spelling may be needed on newer versions.

  define host{
      use                    generic-host   ; Name of host template to use
      host_name              www.google.com
      alias                  Google Webserver
      address                www.google.com
      check_command          check-host-alive
      max_check_attempts     20
      checks_enabled         0
      notification_interval  120
      notification_period    24x7
      notification_options   d,u,r
      }

===== 2. Config Nagios hostgroups =====

Hostgroups quite simply group together the hosts in hosts.cfg. They are mainly used to order and group services and hosts together. I created separate hostgroups for various server clusters, i.e. 1 hostgroup for my own server cluster, a second for my computer society servers, and a third for Commercial Hosting webservers.

  vi /etc/nagios3/conf.d/hostgroups.cfg

  define hostgroup{
      hostgroup_name  my_cluster
      alias           My Server Cluster
      contact_groups  root-my_cluster
      members         gateway, domain1.com, domain2.com
      }

  define hostgroup{
      hostgroup_name  other-webservers
      alias           Other Commercial Web Servers
      contact_groups  select-users-my_cluster
      members         www.google.com
      }

===== 3. Config Nagios Contacts =====

Note: As with hosts, the contacts config takes in specific names of people and their contact information. The various contacts are then grouped together in step 4. For this config, I am going to have 2 main contacts. 1 is going to be the root administrator and the second is going to be a general user (for receiving information on the non-essential other-webservers). Again, look at http://nagios.sourceforge.net/docs/2_0/xodtemplate.html#contact for specifics on notification options.

  vi /etc/nagios3/conf.d/contacts.cfg

  define contact{
      contact_name                   root
      alias                          Root Administrator
      service_notification_period    24x7
      host_notification_period       24x7
      service_notification_options   w,u,c,r
      host_notification_options      d,u,r
      service_notification_commands  notify-by-email
      host_notification_commands     host-notify-by-email
      email                          root@domain.com
      }

  define contact{
      contact_name                   sburke
      alias                          A Standard/Typical User
      service_notification_period    24x7
      host_notification_period       24x7
      service_notification_options   w,u,c,r
      host_notification_options      d,u,r
      service_notification_commands  notify-by-email
      host_notification_commands     host-notify-by-email
      email                          username@domain.com
      }

===== 4. Config Nagios Contactgroups =====

Again, all the various contacts outlined in step 3 need to be grouped together. The hostgroups.cfg and services.cfg send alert notifications to "contactgroups" and not to individual contacts. Although all these separate configs seem very awkward, they ensure that users, hosts and services can be added easily.

  vi /etc/nagios3/conf.d/contactgroups.cfg

  define contactgroup{
      contactgroup_name  root-my_cluster
      alias              Root Admins on My Cluster
      members            root
      }

  define contactgroup{
      contactgroup_name  select-users-my_cluster
      alias              Users on Burkesys
      members            sburke
      }

Note: "root-my_cluster", "root", "select-users-my_cluster" and "sburke" were selected from Steps 2 and 3.

===== 5. Config Nagios Services =====

This is the main and final configuration file (typically).
All information in the previous 4 steps must be used and matched up correctly with the configs and information in this step, otherwise nagios will complain and print a helpful debug message.

  vi /etc/nagios3/conf.d/services.cfg

  # Generic service definition template
  define service{
      ; The 'name' of this service template, referenced in other service definitions
      name                          generic-service
      active_checks_enabled         1   ; Active service checks are enabled
      passive_checks_enabled        1   ; Passive service checks are enabled
      parallelize_check             1   ; Active service checks should be parallelized
                                        ; (disabling this can lead to major performance problems)
      obsess_over_service           1   ; We should obsess over this service (if necessary)
      check_freshness               0   ; Default is to NOT check service 'freshness'
      notifications_enabled         1   ; Service notifications are enabled
      event_handler_enabled         1   ; Service event handler is enabled
      flap_detection_enabled        0   ; Flap detection is disabled
      process_perf_data             1   ; Process performance data
      retain_status_information     1   ; Retain status information across program restarts
      retain_nonstatus_information  1   ; Retain non-status information across program restarts. Turn this off (0) when testing and doing lots of restarts, otherwise some settings will be cached!
      register                      0   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
      }

  # Service definition
  define service{
      use                    generic-service   ; Name of service template to use
      host_name              domain1.com, domain2.com
      service_description    PING
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notifications_enabled  1
      notification_interval  120
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_ping!100.0,20%!500.0,60%
      ; check_ping syntax: !warning if exceeds 100ms,warning if exceeds 20% packet loss
      ;                    !critical if exceeds 500ms,critical if exceeds 60% packet loss
      }

  define service{
      use                    generic-service
      host_name              domain1.com
      service_description    HTTP
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_http
      }

  define service{
      use                    generic-service
      host_name              domain1.com
      service_description    HTTP-vhost_name
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_http_url!http://vhost.domain1.com/path/to/application/page.php
      ; please read Step 6 below for extra config required.
      }

  define service{
      use                    generic-service
      host_name              domain2.com
      service_description    DNS
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_dns
      }

  define service{
      use                    generic-service
      host_name              domain2.com
      service_description    MySQL
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_mysql_cmdlinecred!mysqluser!mysqlpassword
      }

  define service{
      use                    generic-service
      host_name              domain2.com
      service_description    SMTP
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_smtp
      }

  ################################################################

  define service{
      use                    generic-service
      host_name              www.google.com
      service_description    PING
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         select-users-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_ping!100.0,20%!500.0,60%
      }

  define service{
      use                    generic-service
      host_name              www.google.com
      service_description    HTTP
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         select-users-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      check_command          check_http
      }

The services.cfg can get quite long indeed! Services can be grouped together in servicegroups.cfg, however I didn't bother with this step. Grouping gives a better overview in the Web Front end when there are a large number of services.

===== 6. Extra Custom Plugin Configs =====

In the services.cfg, a "check_http_url" command is referenced. As it stands, nagios would give an error at this step, because "check_http_url" is not a stock command. It is a custom command for monitoring vhost.domain1.com, and it saves us from having to make a separate host for each virtual website we want to monitor.

  vi /etc/nagios-plugins/config/http.cfg

  # 'check_http_url' command definition
  define command{
      command_name  check_http_url
      command_line  /usr/lib/nagios/plugins/check_http -I $HOSTADDRESS$ -u $ARG1$
      }

In order to see what options and command line switches are available, do the following:

  /usr/lib/nagios/plugins/check_http --help

All of the plugins within /usr/lib/nagios/plugins/ have several options for monitoring specific levels of performance.

**Another config is /etc/nagios/escalations.cfg** however at the moment I feel it works ok without this step. I will revisit it at a later stage.

====== Send Nagios Notifications via SMS Text Messages ======

Although a simple config could be made for nagios to send SMSs via vodasms (o2sms), I chose to do the SMS handling at email delivery time using procmail.

Read more here: [[Vodasms#Forward_Emails_via_SMS_Text_Message]]

====== References & Additional Info ======

Vhost & Website Monitoring: http://theories.darwinsys.com/2007/04/05/1175779980000.html
Monitoring tomcat website: http://nagios.org/faqs/viewfaq.php?faq_id=310
http://www.kernel-panic.it/openbsd/nagios/nagios3.html
Main Nagios Templates and Docs: http://nagios.sourceforge.net/docs/2_0/xodtemplate.html
General: http://www.onlamp.com/pub/a/onlamp/2002/09/26/nagios.html?page=1
General and Good: http://www.debian-administration.org/articles/299
General with some mistakes: http://servers.linux.com/servers/04/09/14/2317206.shtml
MySQL info and Nagios: http://www.gatorlug.org/files/GatorLUG.ppt

====== Monitor HTML via a Proxy ======

If nagios is running on a server whose firewall blocks outgoing http(s) requests, then you will have to use a proxy (if available) to check http on a remote host/server. Here are the configs and tweaks required:

  vi /etc/nagios-plugins/config/http.cfg

  # 'check_http_via_proxy' command definition
  define command{
      command_name  check_http_via_proxy
      command_line  /usr/lib/nagios/plugins/check_http -H $ARG1$ -p $ARG2$ -u $ARG3$ -e 'HTTP/1.0 200 OK'
      }

  vi /etc/nagios/services.cfg

  # edit the check_command for the particular service you require to:
  host_name      externalserver.com
  check_command  check_http_via_proxy!proxy.internalserver.com!3128!http://externalserver.com

Note: sometimes the squid proxy would only serve a cached page. To get around this, the check_command was further tweaked to call a particular webpage, i.e.:

  check_command  check_http_via_proxy!proxy.internalserver.com!3128!http://externalserver.com/~userwebsite/

Hopefully that should work ok.
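
For reference, a complete service definition using this command might look like the sketch below. It assumes the generic-service template and the root-my_cluster contact group from the configuration steps above, and that externalserver.com has also been defined as a host. The service_description is a made-up name, and the intervals are simply copied from the other services.

```
; Sketch only: a full service entry for the proxy check.
define service{
      use                    generic-service
      host_name              externalserver.com   ; must also exist as a host definition
      service_description    HTTP-via-proxy       ; hypothetical description
      is_volatile            0
      check_period           24x7
      max_check_attempts     3
      normal_check_interval  5
      retry_check_interval   1
      contact_groups         root-my_cluster
      notification_interval  120
      notification_period    24x7
      notification_options   c,r
      ; ARG1 = proxy host, ARG2 = proxy port, ARG3 = URL to fetch through the proxy
      check_command          check_http_via_proxy!proxy.internalserver.com!3128!http://externalserver.com
      }
```

The three values after check_http_via_proxy are separated by "!" and map onto $ARG1$, $ARG2$ and $ARG3$ in the command definition.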