rename README to correct extension and add :desc parameter - reed-alert - Lightweight agentless alerting system for server
git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/
---
commit 3273703cd51a5448cb21d807c19d6de2dd0c69cd
parent 739c71e0b0af1a99d3c63965266bc6ae37dd5682
Author: solene rapenne <solene@dataswamp.org>
Date: Fri, 7 Oct 2016 15:39:01 +0200
rename README to correct extension and add :desc parameter
Diffstat:
A README.md | 157 +++++++++++++++++++++++++++++++
1 file changed, 157 insertions(+), 0 deletions(-)
---
diff --git a/README.md b/README.md
@@ -0,0 +1,157 @@
+Presentation
+============
+
+reed-alert is a tool to check the status of various things on a server
+and trigger user-defined notifications when a check fails. In the code,
+each check is called a "probe" and has parameters.
+
+The code is very rough for now. I will try to make the config file
+easier than it currently is, but I think it's already easy enough for
+people who need this kind of tool.
+
+reed-alert is regularly tested on FreeBSD, OpenBSD and Linux.
+
+
+Defining notification system
+============================
+
+The following variables can be used when writing a notification command:
+
++ function : the name of the probe
++ date : the current date with format YYYY/MM/DD hh:mm:ss
++ params : the parameters of the probe
++ hostname : the hostname of the server
++ result : the error returned (for example the value exceeding the limit, or "file not found")
++ description : an arbitrary description naming a check
++ level : the type of notification used
++ os : the type of operating system (FreeBSD/Linux/OpenBSD)
++ _ : a space character
++ space : a space character
++ newline : a newline character
+
+If you want to send a mail with a message like "At 2016/10/06 11:11:12
+server.foo.com has encountered a problem during LOAD-AVERAGE-15
+(:LIMIT 10) with a value of 30" you can write the following and use
+**pretty-mail** in your checks.
+
+    (defvar *alerts*
+      (list
+       '(pretty-mail ("echo 'At " date _ hostname " has encountered a problem during " function
+                      _ params " with a value of " result "' | mail yourmail@foo.bar"))))
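+
+To attach this notification to a check, a line such as the one below
+might be used. It is only a sketch: it assumes that the first argument
+of `=>` is the name of an entry in `*alerts*`, in the same position as
+`example` in the probe examples of this README. The probe and limit
+are taken from the message quoted above.
+
+```lisp
+;; Assumption: the first argument of => selects the *alerts* entry to run.
+;; load-average-15 with (:limit 10) matches the sample message above.
+(=> pretty-mail load-average-15 (:limit 10))
+```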
+
+If you don't want anything to be triggered, you can use the following
+entry in `*alerts*`:
+
+    '(nothing-to-send nil)
+
+If you find it easier to read, you can add + between the elements to
+concatenate. They are simply discarded when the program parses the list.
+
+    '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
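+
+As a sketch of what that means, the two entries below should produce
+the same notification, assuming `+` is dropped before the remaining
+elements are concatenated as described above:
+
+```lisp
+;; With + as a visual separator (discarded when the list is parsed)
+'(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
+
+;; Same entry written without the separators
+'(pretty-mail (date " " hostname " has encountered a problem " function))
+```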
+
+The different probes
+====================
+
+Probes are written in Lisp and sometimes rely on system calls, as
+for ping or the load average of the system. They are written to run on
+different operating systems.
+
+The following parameter is allowed for every probe. It lets you
+describe what the check does or concerns, so that you can put it in the
+notification if you want:
+
+    :desc "STRING"
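+
+For instance, `:desc` can be added next to a probe's own parameters.
+This is a sketch based on the `disk-usage` example below. The
+description text is invented, and it assumes the value is what the
+`description` variable expands to in a notification:
+
+```lisp
+;; Hypothetical description; :path and :limit come from the disk-usage example.
+(=> example disk-usage (:path "/tmp" :limit 50 :desc "Temporary files partition"))
+```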
+
+number-of-processes
+-------------------
+Check if the current number of processes on the system exceeds the limit
+
+> Set the limit that will trigger an alert when exceeded
+    :limit INTEGER
+
+Example : `(=> example number-of-processes (:limit 200))`
+
+pid-running
+-----------
+Check if the process whose PID is stored in a .pid file is alive
+
+> Set the path of the pid file. If the user doesn't have permission to open it, "file not found" is returned
+    :path "STRING"
+
+Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`
+
+
+disk-usage
+----------
+Check if the used percentage of the chosen partition exceeds the limit
+
+> Set the mountpoint to check
+    :path "STRING"
+
+> Set the limit that will trigger an alert when exceeded
+    :limit INTEGER
+
+Example : `(=> example disk-usage (:path "/tmp" :limit 50))`
+
+
+file-exists
+-----------
+Check if a file exists
+
+> Set the path of the file to check
+    :path "STRING"
+
+Example : `(=> example file-exists (:path "/var/postgresql/standby"))`
+
+file-updated
+------------
+Check if a file exists and has been updated within a defined time
+
+> Set the path of the file to check
+    :path "STRING"
+
+> Set the maximum number of minutes since the last modification before an alert is triggered
+    :limit INTEGER
+
+Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
+
+load-average-1
+--------------
+Check if the load average over the last minute exceeds the limit
+
+> Set the limit not to exceed
+    :limit INTEGER
+
+Example : `(=> example load-average-1 (:limit 2))`
+
+load-average-5
+--------------
+Check if the load average over the last five minutes exceeds the limit
+
+> Set the limit not to exceed
+    :limit INTEGER
+
+Example : `(=> example load-average-5 (:limit 2))`
+
+load-average-15
+---------------
+Check if the load average over the last fifteen minutes exceeds the limit
+
+> Set the limit not to exceed
+    :limit INTEGER
+
+Example : `(=> example load-average-15 (:limit 2))`
+
+ping
+----
+Check if a remote host answers 2 ICMP pings
+
+> Set the host to ping. An error is returned if the ping command exits with a non-zero status
+    :host "STRING" (can be IP or hostname)
+
+Example : `(=> example ping (:host "8.8.8.8"))`
+
+command
+-------
+Execute an arbitrary command and trigger an alert if the command returns a non-zero value
+
+> Command to execute. Commands with pipes are accepted
+    :command "STRING"
+
+Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`