Sync with new alert declaration and add explanations with code usage - reed-alert - Lightweight agentless alerting system for server
git clone git://bitreich.org/reed-alert/ git://enlrupgkhuxnvlhsf6lc3fziv5h2hhfrinws65d7roiv6bfj7d652fid.onion/reed-alert/
---
commit 439acf53f4c8be2665c3459055a57b3d03656fd8
parent 8e2203d405f186f1c5e6968d37e45482a7175399
Author: Solene Rapenne <solene@perso.pw>
Date: Wed, 10 Jan 2018 20:17:32 +0100
Sync with new alert declaration and add explanations with code usage
Diffstat:
M README | 147 ++++++++++++++++++++++++-------
1 file changed, 115 insertions(+), 32 deletions(-)
---
diff --git a/README b/README
@@ -20,11 +20,16 @@ tested with both **sbcl** and **ecl** - which should be available for
most distributions.
(On OpenBSD you may prefer to use ecl because sbcl needs 'wxallowed'
-where the binary is.)
+on the partition where the binary is.)
To make reed-alert's deployment easier I avoid using external
libraries. reed-alert only requires a Common LISP interpreter and a
-few files.
+few files of its own.
+
+An effort to use quicklisp libraries for more sophisticated checks
+(such as "does this URL contain a pattern?") was started and then
+abandoned. Instead, users who need more elaborate checks can write a
+shell command in the **command** probe.
Code-Readability
@@ -34,7 +39,7 @@ Although the code is very rough for now, I think it's already fairly
understandable by people who do need this kind of tool.
I will try to improve on the readability of the config file in future
-commits.
+commits. NOTE : declaration of notifiers is easier now.
Usage
@@ -58,52 +63,53 @@ The configuration is explained below.
The Notification System
=======================
-+ function : the name of the probe
-+ date : the current date with format YYYY/MM/DD hh:mm:ss
-+ params : the parameters of the probe
-+ hostname : the hostname of the server
-+ result : the error returned (the value exceeding the limit, file not found)
-+ description : an arbitrary description naming a check
-+ level : the type of notification used
-+ os : the type of operating system (FreeBSD/Linux/OpenBSD)
-+ _ : a space character
-+ space : a space character
-+ newline : a newline character
+When a check returns an error, a previously defined notifier is
+called. A notifier is a named shell command. The shell command
+can contain variables provided by reed-alert.
+
++ %function% : the name of the probe
++ %date% : the current date with format YYYY/MM/DD hh:mm:ss
++ %params% : the parameters of the probe
++ %hostname% : the hostname of the server
++ %result% : the error returned (the value exceeding the limit, file not found)
++ %description% : an arbitrary description naming a check
++ %level% : the type of notification used
++ %os% : the type of operating system (FreeBSD/Linux/OpenBSD)
++ %newline% : a newline character
-Example Probe: 'Check For Load Average'
+Example Probe 1: 'Check For Load Average'
---------------------------------------
If you want to send a mail with a message like:
- "At 2016/10/06 11:11:12 server.foo.com has encountered a problem
+ "On 2016/10/06 11:11:12 server.foo.com has encountered a problem
during LOAD-AVERAGE-15 (:LIMIT 10) with a value of 30"
-write the following and use **pretty-mail** in your checks:
+write the following at the top of the file and use **pretty-mail** in your checks:
- (defvar *alerts*
- (list
- '(pretty-mail ("echo '" date _ hostname " has encountered a problem during" function
- params " with a value of " result "' | mail yourmail@foo.bar"))))
+ (alert pretty-mail "echo 'On %date% %hostname% has encountered a problem during %function%
+ %params% with a value of %result%' | mail yourmail@foo.bar")
-Variant 1
-~~~~~~~~~
-If you find it easier to read, you can add + in the concatenation.
-The + is discarded by reed-alert as soon as it parses the list.
+Example Probe 2: 'Don't do anything'
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If you don't want anything to be done when an error occurs, use the following :
- '(pretty-mail (date + " " + hostname + " has encountered a problem " + function))
+ (alert nothing-to-send "")
-Variant 2
-~~~~~~~~~
-If you don't want anything to be triggered use the following in *alerts*:
+Example Probe 3: 'Send SMS'
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You may want to use an external service to send an SMS. This is
+possible because a notifier is just a shell command :
- '(nothing-to-send nil)
+ (alert sms "echo 'error on %hostname% : %function% %result%'
+ | curl -u login:pass http://api.sendsms.com/")
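
As a rough illustration of the %variable% substitution described in
this section, here is a minimal sketch in Common LISP. It is not
reed-alert's actual code: the helper names `replace-all` and
`expand-placeholders` and the alist format are assumptions made for
this example only.

```lisp
;; Hypothetical helpers, NOT taken from reed-alert's source.
(defun replace-all (string old new)
  "Return STRING with every occurrence of OLD replaced by NEW.
OLD must be a non-empty string."
  (with-output-to-string (out)
    (loop with start = 0
          for pos = (search old string :start2 start)
          ;; Copy the text up to the next match (or to the end).
          do (write-string string out :start start
                                      :end (or pos (length string)))
          while pos
          ;; Emit the replacement and skip past the placeholder.
          do (write-string new out)
             (setf start (+ pos (length old))))))

(defun expand-placeholders (template bindings)
  "Expand TEMPLATE using BINDINGS, an alist of (placeholder . value) strings."
  (reduce (lambda (acc binding)
            (replace-all acc (car binding) (cdr binding)))
          bindings
          :initial-value template))

;; Example: build the shell command for a failed load average check.
(expand-placeholders
 "echo 'On %date% %hostname% failed %function%' | mail yourmail@foo.bar"
 '(("%date%" . "2016/10/06 11:11:12")
   ("%hostname%" . "server.foo.com")
   ("%function%" . "LOAD-AVERAGE-15")))
```

The point of the sketch is only that the notifier string stays an
ordinary shell command; every %name% is replaced with text before the
command is run.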
The Probes
==========
-Probes are written in Common LISP.
+Probes are written in Common LISP. They are predefined checks.
The :desc Parameter
-------------------
@@ -230,6 +236,7 @@ Example : `(=> alert ping (:host "8.8.8.8"))`
command
-------
Execute an arbitrary command which triggers an alert if it returns a non-zero value.
+This may be the most useful probe because it lets the user do any check needed.
> Command to execute, accept commands with pipes.
:command "STRING"
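
A check such as "does this URL contain a pattern?" could be written
with this probe along the following lines. This is a sketch, not a
tested configuration: the notifier name `mail`, the URL and the
pattern are placeholders, and it assumes `curl` and `grep` are
installed on the server.

```lisp
;; Config fragment. The pipeline's exit status is the one of grep -q,
;; which is non-zero when the pattern is absent, so the alert fires
;; when the page does not contain the pattern.
(=> mail command (:command "curl -s http://example.org/ | grep -q 'pattern'"
                  :desc "example.org contains pattern"))
```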
@@ -255,4 +262,80 @@ Check if a file has a size less than a specified limit.
> Set the limit in bytes before triggering an alert.
:limit INTEGER
-Example : `(=> alert file-less-than (:path "/var/log/nginx/access.log" :limit 60))`
+Example : `(=> alert file-less-than (:path "/var/log/nginx.log" :limit 60))`
+
+
+The configuration file
+======================
+
+The configuration file is Common LISP code, so it's evaluated. It's
+possible to write some logic within it.
+
+
+Loops
+-----
+It's possible to write loops if you don't want to repeat code :
+
+    (loop for host in '("bitreich.org" "dataswamp.org" "floodgap.com")
+          do
+          (=> mail ping (:host host)))
+
+or another example
+
+    (loop for service in '("smtpd" "nginx" "mysqld" "postgresql")
+          do
+          (=> mail service (:name service)))
+
+and another example using rows from a file to check remote hosts
+
+    (with-open-file (stream "hosts.txt")
+      (loop for line = (read-line stream nil)
+            while line
+            do
+            (=> mail ping (:host line))))
+
+
+Conditional
+-----------
+Conditionals are also possible. Two groups of conditionals are
+particularly useful.
+
+
+Dependency
+~~~~~~~~~~
+Sometimes it is a good idea to skip the remaining probes when one
+probe fails. For example, you may need to check a path through a
+network, from the nearest machine to the remote target. If we can't
+reach our local router, probes that depend on the router will also
+trigger errors, so we should skip them.
+
+    (stop-if-error
+      (=> mail ping (:host "192.168.1.1" :desc "My local router"))
+      (=> mail ping (:host "89.89.89.89" :desc "My ISP DNS server"))
+      (=> mail ping (:host "kernel.org" :desc "Remote website")))
+
+Note : stop-if-error is an alias for the Common LISP **and** macro.
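
To see why **and** gives this behaviour, here is a small
self-contained sketch that does not use reed-alert at all. The
`fake-probe` helper is hypothetical; it assumes, as the alias
suggests, that a probe returns true on success and NIL on failure.

```lisp
(defun run-dependency-demo ()
  "Return the descriptions of the fake probes that were actually run."
  (let ((ran '()))
    (flet ((fake-probe (desc ok-p)
             ;; Record that the probe ran, then report success or failure.
             (push desc ran)
             ok-p))
      ;; AND evaluates its arguments left to right and stops at the
      ;; first one that returns NIL.
      (and (fake-probe "router" t)
           (fake-probe "isp-dns" nil)
           (fake-probe "website" t)))
    (reverse ran)))

(run-dependency-demo)
```

In this sketch the "website" probe is never run, because the
"isp-dns" probe before it failed.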
+
+
+Escalation
+~~~~~~~~~~
+It can be a good idea to use different alerts depending on how
+critical a check is. Sometimes, though, the severity depends on the
+value that triggered the error and/or on how long the problem goes
+unfixed. You may want to receive a mail while things can still be
+fixed in your spare time, but to alert someone else if the problem
+passes a further threshold.
+
+    (escalation
+      (=> mail-me disk-usage (:path "/" :limit 70))
+      (=> sms-me disk-usage (:path "/" :limit 90))
+      (=> buzzer disk-usage (:path "/" :limit 98)))
+
+In this example we check the disk usage. I will get a mail through
+the "mail-me" alert if the disk usage goes above 70%. Once it is past
+that point, reed-alert checks whether the disk usage is above 90%; if
+so, I'll receive an SMS through the "sms-me" alert. Finally, if it
+goes above 98%, the "buzzer" alert will make some bad noises in the
+room to warn me about this.
+
+Note : escalation is an alias for the Common LISP **or** macro.
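
The escalation logic can be tried without reed-alert using the
following sketch. `fake-check` is a hypothetical stand-in for a
disk-usage probe; it assumes that a check succeeds when the usage is
within the limit and otherwise raises its alert and returns NIL.

```lisp
(defun run-escalation-demo (usage)
  "Return the alerts raised, in order, for a disk USAGE percentage."
  (let ((raised '()))
    (flet ((fake-check (alert limit)
             ;; Success: usage within the limit, nothing is raised.
             ;; Failure: record the alert and return NIL.
             (or (<= usage limit)
                 (progn (push alert raised) nil))))
      ;; OR stops at the first check that succeeds, so later (more
      ;; severe) checks only run after the earlier ones have failed.
      (or (fake-check 'mail-me 70)
          (fake-check 'sms-me 90)
          (fake-check 'buzzer 98)))
    (reverse raised)))

(run-escalation-demo 60)   ; no alert raised
(run-escalation-demo 95)   ; mail-me, then sms-me
```

With a usage of 95 the first two checks fail and raise their alerts,
while the third succeeds and ends the escalation.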