2025-06-21 Trying to understand the bots
========================================

I changed the limit of my automatic ASN ban from 1000 hits in 2h to 500 hits in 2h. That's because the two biggest autonomous systems hitting my sites are from Vietnam and China, and they're currently staying below the 1000 hits per 2h limit:

```
site-log !^social | log-ip | asncounter --no-prefixes 2>/dev/null
count  percent  ASN     AS
749    4.37     45899   VNPT-AS-VN VNPT Corp, VN
539    3.15     45102   ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN
448    2.61     24940   HETZNER-AS, DE
427    2.49     16276   OVH, FR
367    2.14     28573   Claro NXT Telecomunicacoes Ltda, BR
331    1.93     9009    M247, RO
312    1.82     62610   ZEN-DPS, US
303    1.77     7922    COMCAST-7922, US
299    1.74     212238  CDNEXT, GB
264    1.54     7018    ATT-INTERNET4, US
total: 17135
```

The numbers themselves are not that big, but I am annoyed. I live in an English/German world and I don't see why service providers from Vietnam, China, Brazil and Romania should be crawling my sites.

(You can find all the fish functions I use in the admin directory.)

Let's take a look at what they are requesting! The Vietnamese bots:

```
site-log !^social | asn-access-log 45899 | log-request | rank-lines
74 /nobots
 3 /emacs/?action=translate%3Bid%3Dmon-utils.el%3Bmissing%3Dde_es_fr_it_ja_ko_pt_ru_se_uk_zh
 3 /emacs/?action=translate%3Bid%3DComments_on_SuperCollider%3Bmissing%3Dde_en_es_fr_it_ja_ko_pt_ru_se_uk_zh
 3 /emacs/?action=translate%3Bid%3DComintModes%3Bmissing%3Dde_es_fr_it_ja_ko_pt_ru_se_uk_zh
 3 /emacs/?action=edit%3Bid%3DCustomizeNewGUI
 3 /emacs/?action=browse%3Bdiff%3D2%3Bid%3DDialog
 3 /emacs/?action=admin%3Bid%3DApplyingPatches
 2 /emacs/wang1zhen
 2 /emacs/Comments_on_zenburn.el
 2 /emacs?action=translate%3Bid%3DVbsReplMode%3Bmissing%3Dde_en_es_fr_it_ja_ko_pt_ru_se_uk_zh
```

It looks like they're following all the links: a misbehaving bot, if you ask me. They're "hitting all the buttons" on the web app.
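The ban rule at the top of this post boils down to counting hits per autonomous system over a sliding two-hour window. Here is a minimal Python sketch of that idea; it is not the code that actually runs on this server. It assumes requests arrive in time order as (timestamp, ASN) pairs, i.e. after an IP-to-ASN lookup like the one asncounter performs, and it bans an ASN as soon as it *reaches* the limit. The class `AsnLimiter`, its `hit` method and the `LIMIT`/`WINDOW` constants are hypothetical names made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical defaults mirroring the new rule: 500 hits per 2 hours.
LIMIT = 500
WINDOW = 2 * 60 * 60  # seconds


class AsnLimiter:
    """Track hits per ASN in a sliding window and ban ASNs that reach the limit."""

    def __init__(self, limit=LIMIT, window=WINDOW):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ASN -> timestamps of hits inside the window
        self.banned = set()

    def hit(self, timestamp, asn):
        """Record one request and return True if the ASN is banned.

        Timestamps are in seconds and are assumed not to go backwards.
        """
        if asn in self.banned:
            return True
        recent = self.hits[asn]
        recent.append(timestamp)
        # Drop hits that are now older than the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            self.banned.add(asn)
            del self.hits[asn]  # no need to keep counting a banned ASN
            return True
        return False


if __name__ == "__main__":
    # Small numbers for readability: 3 hits per 60 seconds.
    limiter = AsnLimiter(limit=3, window=60)
    print(limiter.hit(0, 45899))   # False
    print(limiter.hit(10, 45899))  # False
    print(limiter.hit(20, 45899))  # True: third hit within 60 seconds
    print(limiter.hit(20, 45102))  # False: each ASN is counted separately
```

Keeping one deque per ASN means only the timestamps still inside the window are stored, and no deque ever holds more than `LIMIT` entries, because the ASN is banned and its deque discarded at that point.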
The relevant part of robots.txt for the Vietnamese bots' /emacs requests:

```
User-agent: *
Crawl-delay: 240
Disallow: /emacs?action=
```

The Chinese bots:

```
site-log !^social | asn-access-log 45102 | log-request | rank-lines
306 /robots.txt
 12 /wiki?action=rc%3Brcfilteronly%3D%222005-10-06%22
 12 /wiki?action=history%3Bid%3D2005-10-06
 12 /wiki?action=edit%3Bid%3DMoneyPooling
 12 /wiki?action=edit%3Bid%3DDorfWiki
 12 /wiki?action=edit%3Bid%3DBarnstarSharing
 12 /wiki?action=edit%3Bid%3D2005-10-06
 12 /wiki?action=define%3Bname%3DMoneyPooling
 12 /wiki?action=define%3Bname%3DBarnstarSharing
 12 /wiki?action=browse%3Bdiff%3D2%3Bid%3D2005-10-06
```

They also look like they're following all the links, so that's another misbehaving bot. Again, the relevant part of robots.txt:

```
User-agent: *
Crawl-delay: 240
Disallow: /wiki
```

The German bots actually make reasonable requests:

```
site-log !^social | asn-access-log 24940 | log-request | rank-lines
90 /view/2025-06-16-ban-asn
52 /
50 /view/index
41 /rpg/feed.xml
29 /admin/ban-cidr
16 /view/index.rss
15 /emacs?action=rss
10 /robots.txt
 8 /wiki/feed/full/
 8 /osr/feed.xml
```

Let's see what sort of user agents we see. I'm expecting feed readers.

```
site-log !^social | asn-access-log 24940 | log-user-agent | rank-lines
46 NewsBlur Page Fetcher
36 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
32 NewsBlur Feed Fetcher
23 fiperbot/0.1 (+https://www.fiper.net)
21 Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)
15 AwarioSmartBot/1.0 (+https://awario.com/bots.html; bots@awario.com)
 9 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
 8 MyNewspaper Agent 1.0
 8 Akkoma 3.9.3-0-g9d7c877; https://social.raccoon.college , Akkoma 3.9.3-0-g9d7c877; https://social.raccoon.college ; Bot
 7 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47
```

The one that stands out is "DataForSeoBot".
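Both robots.txt excerpts are plain prefix rules: every path starting with /wiki is off limits, and a crawler honouring `Crawl-delay: 240` waits four minutes between requests, which works out to at most 7200 / 240 = 30 requests in two hours. As a sketch, Python's standard `urllib.robotparser` can check a few of the Chinese bots' requests against the /wiki excerpt. Only the quoted lines are fed in, not the site's complete robots.txt, and the list of paths is a sample copied from the output above.

```python
from urllib.robotparser import RobotFileParser

# Only the excerpt quoted above, not the site's complete robots.txt.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 240
Disallow: /wiki
"""

# A sample of the paths the Chinese bots requested, copied from the log output.
REQUESTS = [
    "/robots.txt",
    "/wiki?action=rc%3Brcfilteronly%3D%222005-10-06%22",
    "/wiki?action=edit%3Bid%3DMoneyPooling",
    "/wiki?action=define%3Bname%3DBarnstarSharing",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in REQUESTS:
    verdict = "allowed" if parser.can_fetch("*", path) else "disallowed"
    print(f"{verdict:<10} {path}")

# Crawl-delay is in seconds; this is the ceiling for a crawler that honours it.
delay = parser.crawl_delay("*")
print("at most", 2 * 60 * 60 // delay, "requests per 2h")
```

Note that `robotparser` only accepts whole-number `Crawl-delay` values and compares the path, query string included, as a plain prefix; other crawlers may interpret a robots.txt file differently.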
But DataForSeoBot doesn't seem to be a problem: I already have it in my Apache config (see 2025-03-21 A summary of my bot defence systems). Still, booo, Hetzner, for hosting this bot!

```
site-log !^social | grep DataForSeoBot | leech-detector
Total hits: 21

IP              | Hits | Bandw. | Rel. | Interv. | Status
---------------:|-----:|-------:|-----:|--------:|-------
136.243.228.177 |   18 |     1K |  85% | -289.2s | 410 (55%), 301 (44%)
136.243.220.213 |    1 |     0K |   4% |         | 301 (100%)
136.243.228.178 |    1 |     0K |   4% |         | 301 (100%)
136.243.228.193 |    1 |     3K |   4% |         | 410 (100%)
```

The French bots also seem to be reasonable:

```
site-log !^social | asn-access-log 16276 | log-request | rank-lines
153 /view/2025-06-16-ban-asn
 96 /
 87 /view/index
 45 /admin/ban-cidr
 17 /robots.txt
 17 /files/internet-office-hours.xml
  8 /rpg/feed.xml
  2 /wiki?action=rss;rcidonly=Page_Synchronization
  2 /wiki?action=rss;match=%5EPingback_Server_Extension%24
  2 /view/RPG.rss
```

The Brazilian bots seem to download the entire site:

```
site-log !^social | asn-access-log 28573 | log-request | rank-lines
72 /nobots
 1 /wiki/Year_of_the_Copper_Titan/Comments_on_Character_sheet_template
 1 /wiki/WilderlandsOfSwordsAndDevilry/Comments_on_Cutthroat_Inn_Hooks
 1 /wiki/WerdnaWorld?search=%222019-02-17%22
 1 /wiki/Waterdeep/Recap_April_18,_2020
 1 /wiki/TheRoadToDwimmermount/Comments_on_Alia
 1 /wiki/SmoothPointsofPride?action=history;id=Melee
 1 /wiki/SmallHuman
 1 /wiki?search=%22MicroPayment%22
 1 /wiki?search=%22Gemini+Wiki+on+the+Internet%22
```

Look at the requests:

```
site-log !^social | asn-access-log 28573 | log-request | rank-lines
62 /nobots
 1 /wiki/Year_of_the_Copper_Titan/Comments_on_Character_sheet_template
 1 /wiki/WonderfulBreadIncrease
 1 /wiki/Waterdeep/Recap_February_8,_2020
 1 /wiki/Waterdeep/Recap_April_18,_2020
 1 /wiki/TheRoadToDwimmermount/Comments_on_Alia
 1 /wiki/TheBrokenLands/Comments_on_Symbol_of_Truth
 1 /wiki/SmallHuman
 1 /wiki?search=%22MyMacros%22
 1 /wiki?search=%22ModularWiki%22
```

Especially those searches at
the bottom!

The relevant part of robots.txt:

```
User-agent: *
Crawl-delay: 20
Disallow: /wiki?
```

Same for the Romanian bots:

```
site-log !^social | asn-access-log 9009 | log-request | rank-lines
5 /nobots
3 /
2 /emacs/Comments_on_SiteMap/
2 /diff/2021-07-29_Creative_projects%2C_perpetually_work_in_progress
2 /cw/2006-04-30
1 /wiki/Waterdeep/Comments_on_imp
1 /wiki/Unter_Piraten/Comments_on_Numqu'am_Solus
1 /wiki/Unter_Piraten/Comments_on_2023-04-07
1 /wiki/Unter_Piraten/Comments_on_2023-03-03
1 /wiki/TravellerTheSalamanderCrew/Comments_on_Yandee
```

And what I really hate are those random user agent strings:

```
site-log !^social | asn-access-log 9009 | log-user-agent | rank-lines
13 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3
12 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 OPR/117.0.0.0
12 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.3
11 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3
11 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5
 9 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.3
 8 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0
 8 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.3
 6 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1
 2 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
```

Do I really need to turn to The Ultimate Apache Bad Bot & Referrer Blocker? #ButlerianJihad
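Every pipeline in this post ends in rank-lines. Its real definition is among the fish functions in the admin directory; judging only by its output, it counts identical lines and lists the ten most frequent, much like `sort | uniq -c | sort -rn | head`. A rough Python equivalent under that assumption follows; the function name `rank_lines` and the default of ten lines are guesses for illustration, not the original.

```python
from collections import Counter


def rank_lines(lines, top=10):
    """Count identical lines and return (count, line) pairs, most frequent first.

    Lines with equal counts keep the order in which they were first seen.
    """
    counts = Counter(line.rstrip("\n") for line in lines)
    return [(count, line) for line, count in counts.most_common(top)]


if __name__ == "__main__":
    # Made-up input; in a pipeline you would pass sys.stdin instead.
    sample = ["/nobots", "/", "/cw/2006-04-30", "/nobots", "/"]
    for count, line in rank_lines(sample):
        print(f"{count:>3} {line}")
```

One difference from the shell version: `sort` breaks ties by comparing the lines themselves, while `Counter.most_common` keeps first-seen order for lines with equal counts, so rows with the same count may come out in a different order.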