Sync with new alert declaration and add explanations with code usage - reed-alert - Lightweight agentless alerting system for server

git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/
---
commit 439acf53f4c8be2665c3459055a57b3d03656fd8
parent 8e2203d405f186f1c5e6968d37e45482a7175399
Author: Solene Rapenne <solene@perso.pw>
Date:   Wed, 10 Jan 2018 20:17:32 +0100

Sync with new alert declaration and add explanations with code usage

Diffstat:
  M README | 147 ++++++++++++++++++++++++-------
  1 file changed, 115 insertions(+), 32 deletions(-)
---
diff --git a/README b/README
@@ -20,11 +20,16 @@ tested with both **sbcl** and **ecl** - which should be available for
 most distributions.
 
 (On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
-where the binary is.)
+on the partition where the binary is.)
 
 To make reed-alert's deployment easier I avoid using external
 libraries. reed-alert only requires a Common LISP interpreter and a
-few files.
+its own files.
+
+A development to use quicklisp libraries to write more sophisticated
+checks like "does this url contain a pattern ?" had begun and has
+been abandoned; it has been decided to write a shell command in the
+**command** probe if the user needs more elaborate checks.
 
 
 Code-Readability
@@ -34,7 +39,7 @@ Although the code is very rough for now, I think it's already fairly
 understandable by people who do need this kind of tool.
 
 I will try to improve on the readability of the config file in future
-commits.
+commits. NOTE : declaration of notifiers is easier now.
 
 
 Usage
@@ -58,52 +63,53 @@ The configuration is explained below.
 The Notification System
 =======================
 
-+ function : the name of the probe
-+ date : the current date with format YYYY/MM/DD hh:mm:ss
-+ params : the parameters of the probe
-+ hostname : the hostname of the server
-+ result : the error returned (the value exceeding the limit, file not found)
-+ description : an arbitrary description naming a check
-+ level : the type of notification used
-+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
-+ _ : a space character
-+ space : a space character
-+ newline : a newline character
+When a check returns an error, a previously defined notifier will be
+called. The notifier is a shell command with a name. The shell command
+can contain variables from reed-alert.
+
++ %function% : the name of the probe
++ %date% : the current date with format YYYY/MM/DD hh:mm:ss
++ %params% : the parameters of the probe
++ %hostname% : the hostname of the server
++ %result% : the error returned (the value exceeding the limit, file not found)
++ %description% : an arbitrary description naming a check
++ %level% : the type of notification used
++ %os% : the type of operating system (FreeBSD/Linux/OpenBSD)
++ %newline% : a newline character
 
 
-Example Probe: 'Check For Load Average'
+Example Probe 1: 'Check For Load Average'
 ---------------------------------------
 If you want to send a mail with a message like:
 
-    "At 2016/10/06 11:11:12 server.foo.com has encountered a problem
+    "On 2016/10/06 11:11:12 server.foo.com has encountered a problem
     during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
 
-write the following and use **pretty-mail** in your checks:
+write the following at the top of the file and use **pretty-mail** in your checks:
 
-    (defvar *alerts*
-      (list
-        '(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
-                       params " with a value of " result "' | mail yourmail@foo.bar"))))
+    (alert pretty-mail "echo 'On %date% %hostname% has encountered a problem during %function%
+                        %params% with a
value of %result%' | mail yourmail@foo.bar")
 
 
-Variant 1
-~~~~~~~~~
-If you find it easier to read, you can add + in the concatenation.
-The + is discarded by reed-alert as soon as it parses the list.
+Example Probe 2: 'Don't do anything'
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you don't want anything to be done when an error occurs, use the following :
 
-    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
+    (alert nothing-to-send "")
 
-Variant 2
-~~~~~~~~~
-If you don't want anything to be triggered use the following in *alerts*:
+Example Probe 3: 'Send SMS'
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You may want to use an external service to send an SMS; this is totally
+possible as we rely on a shell command :
 
-    '(nothing-to-send nil)
+    (alert sms "echo 'error on %hostname% : %function% %result%' |
+                curl -u login:pass http://api.sendsms.com/")
 
 
 The Probes
 ==========
 
-Probes are written in Common LISP.
+Probes are written in Common LISP. They are predefined checks.
 
 The :desc Parameter
 -------------------
@@ -230,6 +236,7 @@ Example : `(=> alert ping (:host "8.8.8.8"))`
 command
 -------
 Execute an arbitrary command which triggers an alert if it returns a non-zero value.
+This may be the most useful probe because it lets the user do any check needed.
 
 > Command to execute, accept commands with pipes.
     :command "STRING"
@@ -255,4 +262,80 @@ Check if a file has a size less than a specified limit.
 > Set the limit in bytes before triggering an alert.
     :limit INTEGER
 
-Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
+Example : `(=> alert file-less-than (:path "/var/log/nginx.log" :limit 60))`
+
+
+The configuration file
+======================
+
+The configuration file is Common LISP code, so it's evaluated. It's
+possible to write some logic within it.
+
+
+Loops
+-----
+It's possible to write loops if you don't want to repeat code
+
+    (loop for host in '("bitreich.org" "dataswamp.org" "floodgap.com")
+          do
+          (=> mail ping (:host host)))
+
+or another example
+
+    (loop for service in '("smtpd" "nginx" "mysqld" "postgresql")
+          do
+          (=> mail service (:name service)))
+
+and another example using rows from a file to check remote hosts
+
+    (with-open-file (stream "hosts.txt")
+      (loop for line = (read-line stream nil)
+            while line
+            do
+            (=> mail ping (:host line))))
+
+
+Conditional
+-----------
+It is also possible to write conditionals. There are two very useful
+conditional groups.
+
+
+Dependency
+~~~~~~~~~~
+Sometimes it may be a good idea to stop some probes if a probe
+fails, for example when you need to check a path through a network, from
+the nearest machine to the remote target. If we can't reach our local
+router, probes requiring the router to work will trigger errors so we
+should skip them.
+
+(stop-if-error
+ (=> mail ping (:host "192.168.1.1" :desc "My local router"))
+ (=> mail ping (:host "89.89.89.89" :desc "My ISP DNS server"))
+ (=> mail ping (:host "kernel.org" :desc "Remote website")))
+
+Note : stop-if-error is an alias for the **and** function.
+
+
+Escalation
+~~~~~~~~~~
+It could be a good idea to use different alerts
+depending on how critical a check is, but sometimes, the critical
+level may depend on the value of the error and/or the delay between
+the detection and fixing it. You may want to receive a mail when
+things need to be fixed in your spare time, but mail other people if
+things aren't fixed after some level.
+
+(escalation
+ (=> mail-me disk-usage (:path "/" :limit 70))
+ (=> sms-me disk-usage (:path "/" :limit 90))
+ (=> buzzer disk-usage (:path "/" :limit 98)))
+
+In this example, we check the disk usage, I will get a mail through
+"mail-me" alert if the disk usage goes above 70%.
Once it goes
+that far, it will check if the disk usage exceeds 90%; if so,
+I'll receive an SMS through the "sms-me" alert. And then, if it goes above
+98%, the "buzzer" alert will make some bad noises in the room to
+warn me about this.
+
+Note : escalation is an alias for the **or** function.
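
The two notes saying that stop-if-error and escalation are aliases for **and** and **or** can be illustrated outside reed-alert. The sketch below is only an illustration under one assumption that the README text implies but does not state: a check returns a non-NIL value when it passes and NIL when it reports an error. The function `fake-check` is hypothetical and is not part of reed-alert; it stands in for a `=>` call.

```lisp
;; Hypothetical stand-in for a reed-alert check; not part of reed-alert.
;; Prints what it is doing and returns T when the check passes, NIL otherwise
;; (assumed return convention, see the paragraph above).
(defun fake-check (description passes)
  (format t "checking ~a: ~a~%" description (if passes "ok" "error"))
  passes)

;; Dependency: AND stops at the first NIL, so the remote website is
;; never checked once the ISP DNS server check fails.
(and (fake-check "My local router" t)
     (fake-check "My ISP DNS server" nil)
     (fake-check "Remote website" t))

;; Escalation: OR stops at the first non-NIL value, so the 98% check is
;; skipped once the 90% check passes.
(or (fake-check "disk usage below 70%" nil)
    (fake-check "disk usage below 90%" t)
    (fake-check "disk usage below 98%" t))
```

Loaded with sbcl or ecl, this prints four lines: the "Remote website" and "disk usage below 98%" checks never run, which matches the skipping behaviour described in the Dependency and Escalation sections.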