MY SESSION WITH THE BOTS

On Monday I got emails from failed cron jobs on the VPS that runs my website, caused by failed connections to other websites. I tried to SSH in, but it couldn't connect, nor could a web browser, oh dear. Onto the VPS control panel website to piss away my home internet data quota using their web-based VNC, where a hopelessly laggy stream of errors like this was pouring out over the virtual fbcon:

nf_conntrack: nf_conntrack: table full, dropping packet

A quick web search revealed that this meant there were too many connections for "nf_conntrack" to handle, solved by the dodgy-sounding fix of setting /proc/sys/net/netfilter/nf_conntrack_max to some random huge number. So I typed that in blind over the laggy stream of errors in the VNC terminal, eventually saw my commands scroll past, and then SSH finally worked.

Still no luck in a web browser though. It turned out Apache was at its 150-process limit, serving endless simultaneous requests for the same sub-section of my website from hundreds of bots with random User-Agents and IP addresses (Brazil and Thailand seemed to be favourites for the latter). Upping the process limit with "ServerLimit 450" and "MaxRequestWorkers 450" in /etc/apache2/mods-enabled/mpm_prefork.conf worked for a little while, but the bot connections edged up to over 400 Apache processes (probably queuing up as it got slower to respond) and the RAM ran out.

I wasn't sure if that was partly down to the dodgy nf_conntrack_max setting, since I gather huge values have RAM implications, and although I found some better docs and spent a silly amount of time trying to make sense of them, I couldn't. It's one of those annoying things in the Linux kernel that look like they're documented, but it's really all too vague to be useful:
https://www.kernel.org/doc/html/latest/networking/nf_conntrack-sysctl.html

This page goes into much more detail, but somehow still loses me, and it's clearly outdated compared to the way things are described in the official docs:
https://wiki.khnet.info/index.php/Conntrack_tuning

But it does mention a maximum default value of 8192, which was what /proc/sys/net/netfilter/nf_conntrack_max was set to before. The official docs, though, say for nf_conntrack_max: "This value is set to nf_conntrack_buckets by default", and for nf_conntrack_buckets: "If not specified as parameter during module loading, the default size is calculated by dividing total memory by 16384". "free -b" shows 1007349760 bytes of total physical RAM, and 1007349760 / 16384 = 61483, so I set both to that in /etc/sysctl.conf, which is apparently the tidy place to put these settings in Devuan rather than "echo"ing to /proc at start-up:

net.netfilter.nf_conntrack_buckets=61483
net.netfilter.nf_conntrack_max=61483

Still not enough RAM though; Apache was eating it all. But only one sub-section of my website was being hit, generated by a PHP script, so I gave up and took it down by replacing it with a very short HTML file, and Apache processes dropped down to around 300.

That gave me time to address the other problem of the Apache access logs, which were on track to be GBs per day in size. Logrotate has an option to rotate log files early if they exceed a certain size, so setting "maxsize 100M" in /etc/logrotate.d/apache2 and moving the logrotate cron job from /etc/cron.daily/ to /etc/cron.hourly/ made it compress and rotate Apache logs early if they grow above 100MB each (the resulting stanza is sketched below). It was already set to delete the 15th copy, so now instead of two weeks of logs I get about two or three days, but oh well.
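For reference, the stanza in /etc/logrotate.d/apache2 ends up looking something like the following. This is a sketch rather than a verbatim copy: the "maxsize 100M" line is the only change, and the rest is roughly the stock Debian/Devuan file with the postrotate script simplified.

/var/log/apache2/*.log {
        daily
        # rotate early whenever a log passes 100MB, not just once a day
        maxsize 100M
        missingok
        # keep 14 rotated copies; the oldest (the 15th) gets deleted
        rotate 14
        compress
        delaycompress
        notifempty
        create 640 root adm
        sharedscripts
        postrotate
                # make Apache reopen its log files after rotation
                invoke-rc.d apache2 reload > /dev/null 2>&1 || true
        endscript
}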
To think I used to keep web access logs permanently!

Looking at the log files closely, the bots all accessed the page with a PHPSESSID URL parameter, but that part of the site doesn't use session tracking, so I turned off URL-embedded session IDs with "php_flag session.use_trans_sid off" in .htaccess and enabled the PHP script again. But no good! In a web browser with cookies disabled I no longer got links with PHPSESSID in them, but the bots kept on requesting URLs with PHPSESSID set to random values like nothing had changed!

It seemed they weren't crawling the site then, looking to feed an AI with content, but trying new session strings themselves. Why? A brute-force attempt to hijack other users' sessions/accounts (non-existent there anyway)? But why not do that with cookies, which are more widely used? Or a deliberate DDoS attack on my website? But why just one sub-section, even though it links out to lots of other parts of my website, including other PHP scripts? In the end I gave up asking questions and was just thankful for their stupidity, because now that the PHP script shouldn't be making links with PHPSESSID in them, I can block requests with PHPSESSID in their query string. So I put a rule in .htaccess that answers any request with PHPSESSID in the query string with "Require all denied" (the resulting .htaccess is sketched at the end of this post).

Sure enough, it blocked them all, and they never picked up the PHPSESSID-less URLs. Still huge numbers of requests, but with the short 403 response the server dealt with them quicker, so only around 100-150 simultaneous server processes were required, each using about half the RAM, presumably because they didn't have to load mod_php anymore. Still, it continued for days before eventually stopping.

Just to confuse my attempts to understand their motivation, the logs now show "Amazonbot" (from Amazon IPs, so probably legit) still trying the old URLs with PHPSESSID today, but at a comparatively sedate maximum of three denied requests per second, compared to the 60-75 denied requests per second I saw before.

At least I now know that with 1GB of RAM a safe Apache MaxRequestWorkers setting for my site is about 350 (note that ServerLimit defaults to 256 and also caps this). I've also now disabled session cookies with "php_flag session.use_cookies off" in .htaccess where that PHP script lives, since those were pointless too. Half the trouble with modern computer software is knowing what you should disable - I also wonder if I could avoid having nf_conntrack enabled at all, but it's hard to understand exactly how it's used.

- The Free Thinker
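For reference, here's roughly what the .htaccess for that sub-section ends up containing, using Apache 2.4's <If> directive. It's a sketch rather than a verbatim copy, and the PHPSESSID pattern in the <If> expression is just an example; any regex matching the bots' query strings would do.

# Stop PHP embedding session IDs in URLs or handing out session cookies here
php_flag session.use_trans_sid off
php_flag session.use_cookies off

# Refuse anything that still arrives with PHPSESSID in its query string
<If "%{QUERY_STRING} =~ /PHPSESSID=/">
    Require all denied
</If>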