2026-01-30 Locking the gate
===========================
The last few days have once again been pretty stressful as the scraper
bots that feed the large language models that power the current
generation of AI pummel the websites I run. They are using the web
like a disk drive: When their next generation needs training, they go
through their data sets and reload all the web pages they know. In
order to do this, they ignore all the instructions for robots telling
them that they are not allowed.

robots.txt for Emacs Wiki and Campaign Wiki, for example:

User-agent: *
Disallow: /
DisallowAITraining: /

And yet, they do it. In fact, they do it with malicious intent. They
know web admins will block them so they outsource their activities,
using computers rented in all sorts of countries, run by all sorts of
internet service providers.

I'm currently publishing all the autonomous systems that have been
blocked for a week. They are from all over the place.

The last few days I felt that my setup might not be enough. Every ten
minutes, my scripts would look at the logs and block all sorts of
suspicious activity. A lot of friends got blocked, too.

So yesterday I tried something new for my Oddmuse-based wikis (Emacs
Wiki, Campaign Wiki, and a few others): If the 1-minute load average
passes 10, a password is required for reading the site until the
5-minute load average drops below 2.

This sounds terrible, and it is. We're going to use a "gate" that can
be opened or closed. I feel like my sites have taken another step
towards the dark net, invisible to the exploitative forces ruining the
open web – and running the corporate web.
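The rule above has two thresholds, so there is a range in between where
nothing changes. Here is a minimal sketch of just that decision, without
touching Apache. The function name gate_decision and its three arguments
are made up for this illustration, and awk does the number comparison.

```shell
#!/bin/sh
# Sketch of the two-threshold rule only; it changes nothing on the system.
# gate_decision is a made-up name. Arguments: current state ("open" or
# "closed"), 1-minute load average, 5-minute load average.
gate_decision() {
    awk -v state="$1" -v one="$2" -v five="$3" 'BEGIN {
        if (one > 10)      print "closed"  # too busy: require a password
        else if (five < 2) print "open"    # calm again: drop the password
        else               print state     # in between: keep the current state
    }'
}

gate_decision open 15.90 4.20
gate_decision closed 3.10 2.72
gate_decision closed 0.90 1.81
```

On a real system the two load values would come from the first two
fields of /proc/loadavg.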
The rest of the page describes the setup I'm using.

/etc/apache2/gate.conf

The gate is a small file included in the configuration of sites that
need protection. It says whether authentication is required or not.
We're going to set it automatically. Right now, create it as follows:

Require all granted

This means that no authentication is required.

/etc/apache2/sites-enabled/500-campaignwiki.org.conf

The gate.conf file is used in the site configuration. Here, we're
protecting any path starting with /wiki because those pages are
generated by the wiki. They're not static pages.

We provide the location of the password file, we include the gate.conf
file, and we provide an error message:

If you are a human, use username "alex" and password "secret".
If you are a web scraper for a large language model, please follow this link.

Note how the error message does two things:

* it tells visitors what username and password to use;
* it links to a "no bots" page.

Should any bots follow the link to the no bots page, this shows up in
the logs and I can use it to ban their internet service provider.

/etc/apache2/gate.pw

The password file is a standard password file generated by htpasswd:

htpasswd -c /etc/apache2/gate.pw alex

Watch out: The -c option means that the file is overwritten. Don't use
it when adding more entries!

/etc/butlerian-jihad/gate

Now we need a script that changes the content of our gate.conf file
depending on the system load.

#!/usr/bin/sh
set -eo pipefail
# If load is too high, enable password protection for campaignwiki.org.
if test "$1" = "--help"; then
    echo "gate [lock|unlock]"
    echo "Without argument, the gate is locked or unlocked depending on load."
    echo "The gate locks when load is > 10."
    echo "The gate unlocks when load is < 2."
    echo "You cannot unlock the gate unless load is <= 10."
    exit
fi

FILE='/etc/apache2/gate.conf'
OPEN='Require all granted'
CLOSE='Require valid-user'

# Cannot run grep because the error causes the script to abort
# LOCK is 0 if the gate is currently open and 1 if it is closed
LOCK=$(awk "/$OPEN/ {print 0} /$CLOSE/ {print 1}" "$FILE")

# Take the 1 min load average to see whether to close the gate
LOAD=$(cut -d' ' -f1 < /proc/loadavg)
if test 1 = "$(echo "$LOAD > 10" | bc)" -o "$1" = "lock"; then
    if test "$LOCK" = "0"; then
        echo "$CLOSE" > "$FILE" \
            && apachectl graceful \
            && echo "$LOAD LOCKED"
    else
        echo "$LOAD REMAINS LOCKED"
    fi
    exit
fi

# Take the 5 min load average to see whether to open the gate
LOAD=$(cut -d' ' -f2 < /proc/loadavg)
if test 1 = "$(echo "$LOAD < 2" | bc)" -o "$1" = "unlock"; then
    if test "$LOCK" = "0"; then
        echo "$LOAD REMAINS UNLOCKED"
    else
        echo "$OPEN" > "$FILE" \
            && apachectl graceful \
            && echo "$LOAD UNLOCKED"
    fi
    exit
fi

# Waiting for improvements
if test "$LOCK" = "0"; then
    echo "$LOAD REMAINS UNLOCKED FOR NOW"
else
    echo "$LOAD REMAINS LOCKED FOR NOW"
fi

/etc/butlerian-jihad/gate.service

A systemd service unit that calls the script. Most of the file is
copied from existing files, to be honest.

[Unit]
Description=Open or close the gate
RequiresMountsFor=/var/log
ConditionACPower=true

[Service]
Type=oneshot
ExecStart=/etc/butlerian-jihad/gate

# Priority has to be higher than the regular web services so that banning can still happen.
# See systemd.exec(5) for more.
Nice=9
IOSchedulingClass=best-effort
IOSchedulingPriority=3

ReadWritePaths=/etc/apache2/gate.conf

LockPersonality=true
MemoryDenyWriteExecute=true
NoNewPrivileges=true
PrivateDevices=true
PrivateNetwork=true
PrivateTmp=true
ProtectClock=true
ProtectControlGroups=true
# Apache will verify the existence of document roots
# ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=full
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true

/etc/butlerian-jihad/gate.timer

A systemd timer that calls the service every 5 minutes.

[Unit]
Description=Open or close the gate

[Timer]
OnCalendar=*:0,5,10,15,20,25,30,35,40,45,50,55:00
RandomizedDelaySec=120

[Install]
WantedBy=timers.target

Result
------

See how it's going:

# journalctl --unit gate.service --since 12:00 \
  | awk '/sibirocobombus gate/ { print $3, $6, $7, $8, $9, $10}'
12:10:21 0.53 REMAINS UNLOCKED
12:30:51 0.48 REMAINS UNLOCKED
12:46:09 0.27 REMAINS UNLOCKED
12:51:43 0.43 REMAINS UNLOCKED
12:55:19 0.56 REMAINS UNLOCKED
13:01:55 0.46 REMAINS UNLOCKED
13:06:56 0.33 REMAINS UNLOCKED
13:11:48 0.61 REMAINS UNLOCKED
13:16:42 0.67 REMAINS UNLOCKED
13:20:02 0.50 REMAINS UNLOCKED
13:26:16 1.12 REMAINS UNLOCKED
13:30:32 1.33 REMAINS UNLOCKED
13:36:43 1.21 REMAINS UNLOCKED
13:41:56 0.77 REMAINS UNLOCKED
13:46:15 0.63 REMAINS UNLOCKED
13:51:44 0.61 REMAINS UNLOCKED
13:55:11 0.42 REMAINS UNLOCKED
14:01:06 0.55 REMAINS UNLOCKED
14:06:18 0.53 REMAINS UNLOCKED
14:11:05 0.57 REMAINS UNLOCKED
14:15:57 0.98 REMAINS UNLOCKED
14:20:38 1.11 REMAINS UNLOCKED
14:25:06 15.90 LOCKED
15:00:08 1.36 UNLOCKED
15:06:43 1.26 REMAINS UNLOCKED
15:10:03 1.07 REMAINS UNLOCKED
15:15:21 24.19 LOCKED
15:20:13 14.19 REMAINS LOCKED
15:35:05 2.72 REMAINS LOCKED FOR NOW
15:40:40 2.66 REMAINS LOCKED FOR NOW
15:45:36 3.04 REMAINS LOCKED FOR NOW
15:50:35 2.57 REMAINS LOCKED FOR NOW
15:55:14 1.81 UNLOCKED
16:01:29 25.10 LOCKED
16:06:50 18.87 REMAINS LOCKED FOR NOW
16:10:08 9.93 REMAINS LOCKED FOR NOW

I even integrated it into a Munin plugin.

#!/bin/sh
# -*- sh -*-

: <<=cut

=head1 NAME

gate - Munin plugin to monitor whether the gate is open or closed

=head1 USAGE

This plugin reports whether the most recent entry in the journal for
the gate service within the last 10 minutes says that the gate is
locked. It also reports the 1 minute and 5 minute load averages.

It requires journalctl and awk.

Access to the system journal requires the root user or something
equivalent with the permission to read the journal. A configuration of
this plugin like the following would do the job:

    [gate]
    user root

=head1 CONFIGURATION

Access to the journal is required.

=head1 AUTHOR

Alex Schroeder

=head1 LICENSE

CC0, dedicated to the public domain

=head1 MAGIC MARKERS

 #%# family=manual

=cut

. "$MUNIN_LIBDIR/plugins/plugin.sh"

if [ "$1" = "autoconf" ]; then
    echo yes
    exit 0
fi

if [ "$1" = "config" ]; then
    echo "graph_title Load average"
    echo "graph_info How often is read-access to the wikis hidden behind basic auth?"
    echo "graph_scale no"
    echo "graph_category system"
    echo "graph_args -l 0"
    echo "closed.label Authentication required"
    echo "closed.draw AREA"
    echo "closed.info A value of 1 means that the Oddmuse wikis are protected by basic auth"
    echo "one.label 1 min average load"
    echo "one.draw LINE2"
    echo "one.warning 10"
    echo "one.info If the 1 min average load surpasses 10, the wikis are locked"
    echo "five.label 5 min average load"
    echo "five.draw LINE2"
    echo "five.warning 2"
    echo "five.info If the 5 min average load drops below 2, the wikis are unlocked"
    exit 0
fi

# The last matching journal line decides: 1 if it says LOCKED, 0 if UNLOCKED
journalctl --since -10min --unit gate.service \
    | awk '
/LOCK/ { LOCK=$0 ~ / LOCK/ }
END { print "closed.value " LOCK }'

awk '{ print "one.value " $1; print "five.value " $2 }' /proc/loadavg

A munin graph showing the 1 minute average load and the 5 minute
average load and a line stuck at 0 indicating that the wikis remained
unlocked even though the 1 minute load shot up to 3 a few minutes ago.

#Administration #Butlerian_Jihad

2026-02-03. Hm. Everything was calm for the last day or two, but
that's also due to a bug in the script that left the gate locked. 😓

2026-02-05. The situation is much better today. The graph below shows
the 1-minute load average and the 5-minute load average. The green
blobs at the bottom are the times when my sites close the gate and ask
for a username and password. (If you fail to answer correctly, the
error message has the necessary information for humans.) In the last
few hours, there have been just two spikes where my sites locked up.

2026-02-08.
@splitbrain@social.splitbrain.org writes:

> Each request gets checked for the presence of a cookie. If the
> cookie is set, the request is served as usual. If the cookie is
> missing, a simple HTML page with a button is shown. Real users are
> asked to click the button, get a cookie valid for 30 days and the
> page reloads, this time serving the original request. From then on
> they can browse the site as usual. -- Fighting Bots

botcheck is on GitHub.
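To make the quoted idea concrete, here is a minimal sketch of the
cookie check in shell. It is not taken from botcheck and says nothing
about how botcheck is implemented. The function name check_cookie, the
cookie name "botcheck" and the value "ok" are all made up for this
illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the cookie check; not taken from botcheck.
# The cookie name "botcheck" and its value "ok" are assumptions.

# Print "serve" if the Cookie header carries the cookie, and
# "challenge" if the page with the button should be shown instead.
check_cookie() {
    # A Cookie header looks like "a=1; botcheck=ok; b=2". Wrapping it in
    # "; " and ";" lets one pattern match the first, middle or last cookie.
    case "; $1;" in
        *"; botcheck=ok;"*) echo serve ;;
        *) echo challenge ;;
    esac
}

check_cookie "theme=dark; botcheck=ok"
check_cookie "theme=dark"
check_cookie ""
```

A CGI script would get the header in the HTTP_COOKIE environment
variable and could call check_cookie "$HTTP_COOKIE". Setting the
cookie with a 30 day lifetime once the button is clicked is left out
of the sketch.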