README has been reworked, thanks to lambda from fnord.one. He fixed typos and enhanced explanations. - reed-alert - Lightweight agentless alerting system for server
HTML git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/
---
DIR commit 6e9a23321508e2e923f570efadc7b0e30d644c41
DIR parent fed9c9d46da253a5cd756d36ee7378b818b1559e
HTML Author: Solene Rapenne <solene.rapenne@cbconseil.com>
Date: Thu, 16 Nov 2017 10:13:43 +0100
README has been reworked, thanks to lambda from fnord.one. He fixed typos and enhanced explanations.
Diffstat:
M README | 221 +++++++++++++++++++++----------
1 file changed, 152 insertions(+), 69 deletions(-)
---
DIR diff --git a/README b/README
@@ -1,37 +1,62 @@
-Presentation
+Description
+===========
+
+reed-alert is a small and simple monitoring tool for your server,
+written in Common LISP.
+
+reed-alert checks the status of various processes on a server and
+triggers user-defined notifications.
+
+Each triggered message is called an 'alert'.
+Each check is called a 'probe'.
+Each probe can be customized by different parameters.
+
+
+Dependencies
============
-reed-alert is a tool to check the status of various things on a server
-and trigger user defined notifications to be alerted. In the code,
-each check is called a "probe" and have parameters.
+reed-alert is regularly tested on FreeBSD/OpenBSD/Linux and has been
+tested with both **sbcl** and **ecl** - which should be available for
+most distributions.
-The code is very rough for now. I will try to make the config file
-easier than it is actually, but I think it's already easy enough for
-people who need to kind of tool.
+(On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
+where the binary is.)
-I try to avoid usage of external libraries so the deployment is easy
-as it only requires a Common LISP interpreter and a few files.
+To make reed-alert's deployment easier I avoid using external
+libraries. reed-alert only requires a Common LISP interpreter and a
+few files.
-reed-alert is regularly tested on FreeBSD/OpenBSD/Linux.
-How to use
-==========
+Code-Readability
+================
-It has been tested with both **sbcl** and **ecl** which should be
-available in most distribution people use. On OpenBSD you may prefer
-to use ecl because sbcl needs wxallowed where the binary is.
+Although the code is still very rough, I think it is already fairly
+understandable for people who need this kind of tool.
+I will try to improve the readability of the config file in future
+commits.
+
+
+Usage
+=====
+
+Start reed-alert
+----------------
To start reed-alert
+ sbcl : **sbcl --script config_file.lisp**
+ ecl : **ecl -shell config_file.lisp**
-You can rename **config.lisp.sample** to **config.lisp** to create
-your own configuration file. The configuration is explained below.
+Personal Configuration File
+---------------------------
+You may want to rename **config.lisp.sample** to **config.lisp** in
+order to create your own configuration file.
+The configuration is explained below.
-Defining notification system
-============================
+
+The Notification System
+=======================
+ function : the name of the probe
+ date : the current date with format YYYY/MM/DD hh:mm:ss
@@ -45,131 +70,189 @@ Defining notification system
+ space : a space character
+ newline : a newline character
-If you want to send a mail with a message like "At 2016/10/06 11:11:12
-server.foo.com has encountered a problem during LOAD-AVERAGE-15
-(:LIMIT 10) with a value of 30" you can write the following and use
-**pretty-mail** in your checks.
+
+Example Probe: 'Check For Load Average'
+---------------------------------------
+If you want to send a mail with a message like:
+
+ "At 2016/10/06 11:11:12 server.foo.com has encountered a problem
+ during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
+
+
+write the following and use **pretty-mail** in your checks:
(defvar *alerts*
(list
'(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
params " with a value of " result "' | mail yourmail@foo.bar"))))
-
-If you don't want anything to be triggered, you can use the following
-in *alerts*
- '(nothing-to-send nil)
-
-If you find it easier to read, you can add + in the concatenation,
-this is simply discarded when the program parse the list.
+Variant 1
+~~~~~~~~~
+If you find it easier to read, you can add + in the concatenation.
+The + is discarded by reed-alert as soon as it parses the list.
'(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
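The two notations can be compared with a small, self-contained sketch. It is not reed-alert's actual parser: the function name expand-template, the alist *sample-values* and the sample strings are hypothetical and only illustrate how a + symbol could be skipped while strings and variable symbols are concatenated.

```lisp
;; Hypothetical sketch, NOT reed-alert's real code.
(defun expand-template (template values)
  "Concatenate TEMPLATE into one string.
Strings are copied as-is, the symbol + is dropped, and any other
symbol is looked up in VALUES, an alist of (symbol . string)."
  (with-output-to-string (out)
    (dolist (item template)
      (cond ((stringp item) (write-string item out))
            ((eq item '+) nil)            ; visual separator only, discarded
            ((symbolp item)
             ;; unknown symbols expand to the empty string in this sketch
             (write-string (or (cdr (assoc item values)) "") out))))))

;; Made-up sample values standing in for what a probe would report.
(defparameter *sample-values*
  '((date . "2016/10/06 11:11:12")
    (hostname . "server.foo.com")
    (function . "LOAD-AVERAGE-15")))

;; With and without +, the resulting string is the same.
(expand-template
 '(date + " " + hostname + " has encountered a problem " + function)
 *sample-values*)
```

Under these assumptions the + form and the plain form expand to identical text, so the choice between them is purely about readability.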
-The differents probes
-=====================
+Variant 2
+~~~~~~~~~
+If you don't want anything to be triggered, use the following in *alerts*:
+
+ '(nothing-to-send nil)
-Probes are written in LISP and sometimes relies on system call, like
-for ping or the average load of the system. It cares about running on
-different operating system.
-The following parameter is allowed for every probes. It allows you to
-describe what the check do / concern to put it in the notification if you want
+The Probes
+==========
+
+Probes are written in Common LISP.
+
+The :desc Parameter
+-------------------
+The :desc parameter allows you to describe specifically what your check
+does. It can be put in every probe.
+
:desc "STRING"
+
+Overview
+--------
+As of this commit, reed-alert ships with the following probes:
+
+ (1) number-of-processes
+ (2) pid-running
+ (3) disk-usage
+ (4) file-exists
+ (5) file-updated
+ (6) load-average-1
+ (7) load-average-5
+ (8) load-average-15
+ (9) ping
+ (10) command
+ (11) service
+ (12) file-less-than
+
+
number-of-processes
-------------------
-Check if the actual number of processes of the system exceed the limit
+Check if the actual number of processes of the system exceeds a specific limit.
-> Set the limit that will trigger an alert when exceeded
+> Set the limit that will trigger an alert when exceeded.
:limit INTEGER
-Example : `(=> example number-of-processes (:limit 200))`
+Example : `(=> alert number-of-processes (:limit 200))`
+
pid-running
-----------
-Check if the PID number found in a .pid file is alive
+Check if the PID number found in a .pid file is alive.
-> Set the path of the pid file. If user don't have permission to open it, return "file not found"
+> Set the path of the pid file. If $USER doesn't have permission to open it, return "file not found".
:path "STRING"
-Example : `(=> example pid-running (:path "/var/run/nginx.pid"))`
+Example : `(=> alert pid-running (:path "/var/run/nginx.pid"))`
disk-usage
----------
-Check if the used percent of the choosed partition exceed the limit
+Check if the disk usage of a chosen partition exceeds a specific limit.
-> Set the mountpoint to check
+> Set the mountpoint to check.
:path "STRING"
-> Set the limit that will trigger an alert when exceeded
+> Set the limit that will trigger an alert when exceeded.
:limit INTEGER
-Example : `(=> example disk-usage (:path "/tmp" :limit 50))`
+Example : `(=> alert disk-usage (:path "/tmp" :limit 50))`
file-exists
-----------
-Check if a file exists
+Check if a file exists.
-> Set the path of the file to check
+> Set the path of the file to check.
:path "STRING"
-Example : `(=> example file-exists (:path "/var/postgresql/standby"))`
+Example : `(=> alert file-exists (:path "/var/postgresql/standby"))`
+
file-updated
------------
-Check if a file exists and has been updated since a defined time
+Check if a file exists and has been updated since a defined time.
-> Set the path of the file to check
+> Set the path of the file to check.
:path "STRING"
-> Set the limit in minutes since the last modification time before triggering an alert
+> Set the limit in minutes since the last modification time before triggering an alert.
:limit INTEGER
-Example : `(=> example file-updated (:path "/var/log/nginx/access.log" :limit 60))`
+Example : `(=> alert file-updated (:path "/var/log/nginx/access.log" :limit 60))`
+
load-average-1
--------------
-Check if the load average on the last minute exceed the limit
+Check if the load average during the last minute exceeds a specific limit.
-> Set the limit not to exceed
+> Set the limit not to exceed.
:limit INTEGER
-Example : `(=> example load-average-1 (:limit 2))`
+Example : `(=> alert load-average-1 (:limit 2))`
+
load-average-5
--------------
-Check if the load average on the last fives minutes exceed the limit
+Check if the load average during the last five minutes exceeds a specific limit.
-> Set the limit not to exceed
+> Set the limit not to exceed.
:limit INTEGER
-Example : `(=> example load-average-5 (:limit 2))`
+Example : `(=> alert load-average-5 (:limit 2))`
+
load-average-15
---------------
-Check if the load average on the last fifteen minutes exceed the limit
+Check if the load average during the last fifteen minutes exceeds a specific limit.
-> Set the limit not to exceed
+> Set the limit not to exceed.
:limit INTEGER
-Example : `(=> example load-average-15 (:limit 2))`
+Example : `(=> alert load-average-15 (:limit 2))`
+
ping
----
-Check if a remote host answer the 2 ICMP ping
+Check if a remote host answers 2 ICMP pings.
-> Set the host to ping. Return an error if ping command returns non-zero
+> Set the host to ping. Returns an error if the ping command exits with a non-zero status.
:host "STRING" (can be IP or hostname)
-Example : `(=> example ping (:host "8.8.8.8"))`
+Example : `(=> alert ping (:host "8.8.8.8"))`
+
command
-------
-Execute an arbitrary command which trigger an alert if the command return a non-zero value
+Execute an arbitrary command which triggers an alert if it returns a non-zero value.
-> Command to execute, accept commands with pipes
+> Command to execute; commands with pipes are accepted.
:command "STRING"
-Example : `(=> example command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
+Example : `(=> alert command (:command "tail -n 10 /var/log/messages | grep -v CRITICAL"))`
+
+service
+-------
+Check if a service is started on the system.
+
+> Set the name of the service to test.
+ :name "STRING"
+
+Example : `(=> alert service (:name "mysql-server"))`
+
+file-less-than
+--------------
+Check if a file has a size less than a specified limit.
+
+> Set the path of the file to check.
+ :path "STRING"
+
+> Set the limit in bytes before triggering an alert.
+ :limit INTEGER
+
+Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
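
Putting the pieces of the reworked README together, a configuration file might look roughly like the fragment below. It is only a sketch: this diff does not show how config.lisp.sample loads reed-alert itself, so that step is left out, and the mail address, limits and descriptions are placeholder values. Using the alert name pretty-mail as the first argument of => is an assumption based on the README's remark to use **pretty-mail** in your checks, and combining :desc with other parameters in one list is likewise assumed from the statement that :desc can be put in every probe.

```lisp
;; Sketch of a config.lisp fragment; assumes reed-alert is already loaded
;; (the loading step is not shown in this diff and is omitted here).

(defvar *alerts*
  (list
   '(pretty-mail ("echo '" date " " hostname " has encountered a problem during "
                  function params " with a value of " result
                  "' | mail yourmail@foo.bar"))))

;; Assumption: the first argument of => names an entry of *alerts*.
(=> pretty-mail load-average-15 (:limit 10 :desc "15 minute load"))
(=> pretty-mail disk-usage (:path "/tmp" :limit 50))
(=> pretty-mail ping (:host "8.8.8.8" :desc "outbound connectivity"))
```

As described under "Start reed-alert", such a file would be run with **sbcl --script config_file.lisp** or **ecl -shell config_file.lisp**.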