README has been reworked, thanks to lambda from fnord.one. He fixed typos and enhanced explanations. - reed-alert - Lightweight agentless alerting system for server

git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/
---
commit 6e9a23321508e2e923f570efadc7b0e30d644c41
parent fed9c9d46da253a5cd756d36ee7378b818b1559e
Author: Solene Rapenne <solene.rapenne@cbconseil.com>
Date:   Thu, 16 Nov 2017 10:13:43 +0100

Diffstat:
  M README | 221 +++++++++++++++++++++----------
  1 file changed, 152 insertions(+), 69 deletions(-)
---
diff --git a/README b/README
@@ -1,37 +1,62 @@
-Presentation
+Description
+===========
+
+reed-alert is a small and simple monitoring tool for your server,
+written in Common LISP.
+
+reed-alert checks the status of various processes on a server and
+triggers self defined notifications.
+
+Each triggered message is called an 'alert'.
+Each check is called a 'probe'.
+Each probe can be customized by different parameters.
+
+
+Dependencies
 ============
-reed-alert is a tool to check the status of various things on a server
-and trigger user defined notifications to be alerted. In the code,
-each check is called a "probe" and have parameters.
+reed-alert is regularly tested on FreeBSD/OpenBSD/Linux and has been
+tested with both **sbcl** and **ecl** - which should be available for
+most distributions.
-The code is very rough for now. I will try to make the config file
-easier than it is actually, but I think it's already easy enough for
-people who need to kind of tool.
+(On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
+where the binary is.)
-I try to avoid usage of external libraries so the deployment is easy
-as it only requires a Common LISP interpreter and a few files.
+To make reed-alert's deployment easier I avoid using external
+libraries. reed-alert only requires a Common LISP interpreter and a
+few files.
-reed-alert is regularly tested on FreeBSD/OpenBSD/Linux.
-How to use
-==========
+Code-Readability
+================
-It has been tested with both **sbcl** and **ecl** which should be
-available in most distribution people use. On OpenBSD you may prefer
-to use ecl because sbcl needs wxallowed where the binary is.
+Although the code is very rough for now, I think it's already fairly
+understandable by people who do need this kind of tool.
+I will try to improve on the readability of the config file in future
+commits.
+
+
+Usage
+=====
+
+Start reed-alert
+----------------
 To start reed-alert
 + sbcl : **sbcl --script config_file.lisp**
 + ecl : **ecl -shell config_file.lisp**
-You can rename **config.lisp.sample** to **config.lisp** to create
-your own configuration file. The configuration is explained below.
+Personal Configuration File
+---------------------------
+You may want to rename **config.lisp.sample** to **config.lisp** in
+order to create your own configuration file.
+The configuration is explained below.
-Defining notification system
-============================
+
+The Notification System
+=======================
 + function : the name of the probe
 + date : the current date with format YYYY/MM/DD hh:mm:ss
@@ -45,131 +70,189 @@ Defining notification system
 + space : a space character
 + newline : a newline character
-If you want to send a mail with a message like "At 2016/10/06 11:11:12
-server.foo.com has encountered a problem during LOAD-AVERAGE-15
-(:LIMIT 10) with a value of 30" you can write the following and use
-**pretty-mail** in your checks.
+
+Example Probe: 'Check For Load Average'
+---------------------------------------
+If you want to send a mail with a message like:
+
+    "At 2016/10/06 11:11:12 server.foo.com has encountered a problem
+    during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
+
+
+write the following and use **pretty-mail** in your checks:
     (defvar *alerts*
       (list
        '(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
          params " with a value of " result "' | mail yourmail@foo.bar"))))
-
-If you don't want anything to be triggered, you can use the following
-in *alerts*
-    '(nothing-to-send nil)
-
-If you find it easier to read, you can add + in the concatenation,
-this is simply discarded when the program parse the list.
+Variant 1
+~~~~~~~~~
+If you find it easier to read, you can add + in the concatenation.
+The + is discarded by reed-alert as soon as it parses the list.
     '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
-The differents probes
-=====================
+Variant 2
+~~~~~~~~~
+If you don't want anything to be triggered use the following in *alerts*:
+
+    '(nothing-to-send nil)
-Probes are written in LISP and sometimes relies on system call, like
-for ping or the average load of the system. It cares about running on
-different operating system.
-The following parameter is allowed for every probes. It allows you to
-describe what the check do / concern to put it in the notification if you want
+The Probes
+==========
+
+Probes are written in Common LISP.
+
+The :desc Parameter
+-------------------
+The :desc parameter allows you to describe specifically what your check
+does. It can be put in every probe.
+    :desc "STRING"
+
+Overview
+--------
+As of this commit, reed-alert ships with the following probes:
+
+    (1) number-of-processes
+    (2) pid-running
+    (3) disk-usage
+    (4) file-exists
+    (5) file-updated
+    (6) load-average-1
+    (7) load-average-5
+    (8) load-average-15
+    (9) ping
+    (10) command
+    (11) service
+    (12) file-less-than
+
+
 number-of-processes
 -------------------
-Check if the actual number of processes of the system exceed the limit
+Check if the actual number of processes of the system exceeds a specific limit.
-> Set the limit that will trigger an alert when exceeded
+> Set the limit that will trigger an alert when exceeded.
     :limit INTEGER
-Example : `(=> example number-of-processes (:limit 200))`
+Example : `(=> alert number-of-processes (:limit 200))`
+
 pid-running
 -----------
-Check if the PID number found in a .pid file is alive
+Check if the PID number found in a .pid file is alive.
-> Set the path of the pid file. If user don't have permission to open it, return "file not found"
+> Set the path of the pid file. If $USER doesn't have permission to open it, return "file not found".
     :path "STRING"
-Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`
+Example : `(=> alert pid-running (:path "/var/run/nginx.pid"))`
 disk-usage
 ----------
-Check if the used percent of the choosed partition exceed the limit
+Check if the disk-usage of a chosen partition does exceed a specific limit.
-> Set the mountpoint to check
+> Set the mountpoint to check.
     :path "STRING"
-> Set the limit that will trigger an alert when exceeded
+> Set the limit that will trigger an alert when exceeded.
     :limit INTEGER
-Example : `(=> example disk-usage (:path "/tmp" :limit 50))`
+Example : `(=> alert disk-usage (:path "/tmp" :limit 50))`
 file-exists
 -----------
-Check if a file exists
+Check if a file exists.
-> Set the path of the file to check
+> Set the path of the file to check.
     :path "STRING"
-Example : `(=> example file-exists (:path "/var/postgresql/standby"))`
+Example : `(=> alert file-exists (:path "/var/postgresql/standby"))`
+
 file-updated
 ------------
-Check if a file exists and has been updated since a defined time
+Check if a file exists and has been updated since a defined time.
-> Set the path of the file to check
+> Set the path of the file to check.
     :path "STRING"
-> Set the limit in minutes since the last modification time before triggering an alert
+> Set the limit in minutes since the last modification time before triggering an alert.
     :limit INTEGER
-Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
+Example : `(=> alert file-updated (:path "/var/log/nginx/access.log" :limit 60))`
+
 load-average-1
 --------------
-Check if the load average on the last minute exceed the limit
+Check if the load average during the last minute exceeds a specific limit.
-> Set the limit not to exceed
+> Set the limit not to exceed.
     :limit INTEGER
-Example : `(=> example load-average-1 (:limit 2))`
+Example : `(=> alert load-average-1 (:limit 2))`
+
 load-average-5
 --------------
-Check if the load average on the last fives minutes exceed the limit
+Check if the load average during the last five minutes exceeds a specific limit.
-> Set the limit not to exceed
+> Set the limit not to exceed.
     :limit INTEGER
-Example : `(=> example load-average-5 (:limit 2))`
+Example : `(=> alert load-average-5 (:limit 2))`
+
 load-average-15
 ---------------
-Check if the load average on the last fifteen minutes exceed the limit
+Check if the load average during the last fifteen minutes exceeds a specific limit.
-> Set the limit not to exceed
+> Set the limit not to exceed.
     :limit INTEGER
-Example : `(=> example load-average-15 (:limit 2))`
+Example : `(=> alert load-average-15 (:limit 2))`
+
 ping
 ----
-Check if a remote host answer the 2 ICMP ping
+Check if a remote host answers the 2 ICMP ping.
-> Set the host to ping. Return an error if ping command returns non-zero
+> Set the host to ping. Return an error if ping command returns non-zero.
     :host "STRING" (can be IP or hostname)
-Example : `(=> example ping (:host "8.8.8.8"))`
+Example : `(=> alert ping (:host "8.8.8.8"))`
+
 command
 -------
-Execute an arbitrary command which trigger an alert if the command return a non-zero value
+Execute an arbitrary command which triggers an alert if it returns a non-zero value.
-> Command to execute, accept commands with pipes
+> Command to execute, accept commands with pipes.
     :command "STRING"
-Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
+Example : `(=> alert command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
+
+service
+-------
+Check if a service is started on the system.
+
+> Set the name of the service to test
+    :name STRING
+
+Example : `(=> alert service (:name "mysql-server"))`
+
+file-less-than
+--------------
+Check if a file has a size less than a specified limit.
+
+> Set the path of the file to check.
+    :path "STRING"
+
+> Set the limit in bytes before triggering an alert.
+    :limit INTEGER
+
+Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
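
The reworked README says that a `+` placed in an alert template is discarded when the list is parsed. The sketch below is one way that behaviour could work; it is not taken from reed-alert's source. The function name `expand-template` and the `bindings` alist are hypothetical and exist only for this illustration: strings are kept as-is, the symbol `+` is dropped, and any other symbol is looked up in the alist.

```lisp
;; Hypothetical sketch, NOT reed-alert's actual implementation.
;; EXPAND-TEMPLATE and BINDINGS are names invented for this example.
(defun expand-template (template bindings)
  "Concatenate TEMPLATE into one string.
Strings are kept as-is, the symbol + is dropped, and every other
symbol is replaced by its value in BINDINGS (an alist symbol -> string)."
  (apply #'concatenate 'string
         (loop for item in template
               unless (eq item '+)          ; discard the cosmetic +
                 collect (if (stringp item)
                             item
                             (cdr (assoc item bindings))))))

;; Template taken from the README's "Variant 1" example.
(format t "~a~%"
        (expand-template
         '(date + " " + hostname + " has encountered a problem " + function)
         '((date . "2016/10/06 11:11:12")
           (hostname . "server.foo.com")
           (function . "LOAD-AVERAGE-15"))))
;; prints: 2016/10/06 11:11:12 server.foo.com has encountered a problem LOAD-AVERAGE-15
```

Saved to a file, this should run with `sbcl --script` as in the README's usage section. Because `+` is filtered out before concatenation, the template with and without `+` expands to the same message.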