rename README to correct extension and add :desc parameter - reed-alert - Lightweight agentless alerting system for server

git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/

---
commit 3273703cd51a5448cb21d807c19d6de2dd0c69cd
parent 739c71e0b0af1a99d3c63965266bc6ae37dd5682
Author: solene rapenne <solene@dataswamp.org>
Date:   Fri, 7 Oct 2016 15:39:01 +0200

rename README to correct extension and add :desc parameter

Diffstat:
  A README.md | 157 +++++++++++++++++++++++++++++++

1 file changed, 157 insertions(+), 0 deletions(-)
---
diff --git a/README.md b/README.md
@@ -0,0 +1,157 @@
+Presentation
+============
+
+reed-alert is a tool to check the status of various things on a server
+and trigger user-defined notifications to alert you. In the code,
+each check is called a "probe" and has parameters.
+
+The code is very rough for now. I will try to make the config file
+easier than it currently is, but I think it's already easy enough for
+people who need this kind of tool.
+
+reed-alert is regularly tested on FreeBSD/OpenBSD/Linux.
+
+
+Defining notification system
+============================
+
+The following variables can be used when building a notification command:
+
++ function : the name of the probe
++ date : the current date with format YYYY/MM/DD hh:mm:ss
++ params : the parameters of the probe
++ hostname : the hostname of the server
++ result : the error returned (the value exceeding the limit, file not found)
++ description : an arbitrary description naming a check
++ level : the type of notification used
++ os : the type of operating system (FreeBSD/Linux/OpenBSD)
++ _ : a space character
++ space : a space character
++ newline : a newline character
+
+If you want to send a mail with a message like "At 2016/10/06 11:11:12
+server.foo.com has encountered a problem during LOAD-AVERAGE-15
+(:LIMIT 10) with a value of 30" you can write the following and use
+**pretty-mail** in your checks.
+
+    (defvar *alerts*
+      (list
+       '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function _
+                      params " with a value of " result "' | mail yourmail@foo.bar"))))
+
+If you don't want anything to be triggered, you can use the following
+in `*alerts*`:
+
+    '(nothing-to-send nil)
+
+If you find it easier to read, you can add + in the concatenation;
+it is simply discarded when the program parses the list.
+
+    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
+
+The different probes
+====================
+
+Probes are written in Lisp and sometimes rely on system calls, such as
+for ping or the load average of the system. They are designed to run on
+different operating systems.
+
+The following parameter is accepted by every probe. It lets you describe
+what the check does or concerns, so the description can be included in
+the notification if you want.
+    :desc "STRING"
+
+number-of-processes
+-------------------
+Check if the current number of processes on the system exceeds the limit
+
+> Set the limit that will trigger an alert when exceeded
+    :limit INTEGER
+
+Example : `(=> example number-of-processes (:limit 200))`
+
+pid-running
+-----------
+Check if the PID found in a .pid file is alive
+
+> Set the path of the pid file.
+> If the user doesn't have permission to open it, "file not found" is returned
+    :path "STRING"
+
+Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`
+
+
+disk-usage
+----------
+Check if the used percentage of the chosen partition exceeds the limit
+
+> Set the mountpoint to check
+    :path "STRING"
+
+> Set the limit that will trigger an alert when exceeded
+    :limit INTEGER
+
+Example : `(=> example disk-usage (:path "/tmp" :limit 50))`
+
+
+file-exists
+-----------
+Check if a file exists
+
+> Set the path of the file to check
+    :path "STRING"
+
+Example : `(=> example file-exists (:path "/var/postgresql/standby"))`
+
+file-updated
+------------
+Check if a file exists and has been updated within a defined time
+
+> Set the path of the file to check
+    :path "STRING"
+
+> Set the limit, in minutes since the last modification time, before an alert is triggered
+    :limit INTEGER
+
+Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
+
+load-average-1
+--------------
+Check if the load average over the last minute exceeds the limit
+
+> Set the limit not to exceed
+    :limit INTEGER
+
+Example : `(=> example load-average-1 (:limit 2))`
+
+load-average-5
+--------------
+Check if the load average over the last five minutes exceeds the limit
+
+> Set the limit not to exceed
+    :limit INTEGER
+
+Example : `(=> example load-average-5 (:limit 2))`
+
+load-average-15
+---------------
+Check if the load average over the last fifteen minutes exceeds the limit
+
+> Set the limit not to exceed
+    :limit INTEGER
+
+Example : `(=> example load-average-15 (:limit 2))`
+
+ping
+----
+Check if a remote host answers 2 ICMP pings
+
+> Set the host to ping.
+> Return an error if the ping command returns non-zero
+    :host "STRING" (can be IP or hostname)
+
+Example : `(=> example ping (:host "8.8.8.8"))`
+
+command
+-------
+Execute an arbitrary command and trigger an alert if it returns a non-zero value
+
+> Command to execute, accepts commands with pipes
+    :command "STRING"
+
+Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
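
The alert definitions and probes documented in the README added by this commit can be combined into one configuration, roughly as sketched below. This is an illustration, not a file from the repository. The alert name `mail-admin`, the thresholds and the `:desc` strings are made up for the example. It assumes that the first argument of `=>` names the alert to use (as the `example` placeholder in the README suggests), that `:desc` is passed in the same parameter list as the other probe parameters, and that reed-alert's own code is already loaded so `=>` is defined.

```lisp
;; Hypothetical configuration sketch; names and thresholds are illustrative.
;; Assumes reed-alert's code has been loaded so that => is available.

(defvar *alerts*
  (list
   ;; mail-admin is a made-up alert name. The command is built by
   ;; concatenating strings with the variables listed in the README
   ;; (date, hostname, description, function, params, result, _).
   '(mail-admin ("echo '" date _ hostname " : " description " failed in "
                 function _ params " with result " result
                 "' | mail yourmail@foo.bar"))
   ;; An alert that triggers nothing, as described in the README.
   '(nothing-to-send nil)))

;; Each check is assumed to name the alert first, then the probe and
;; its parameters; :desc is the parameter introduced by this commit.
(=> mail-admin disk-usage (:path "/tmp" :limit 50 :desc "tmp partition usage"))
(=> mail-admin ping (:host "8.8.8.8" :desc "outbound connectivity"))
(=> nothing-to-send load-average-15 (:limit 2 :desc "sustained load"))
```

Because the notification is a shell command wrapped in single quotes, a `:desc` string containing an apostrophe would likely break the quoting, so plain descriptions are the safer choice in a setup like this.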