2026-01-11 Xobaque imports Sitemaps
===================================

If you have a small number of sites you want to cover with your own search engine, I might have you covered. Xobaque is a search engine based on SQLite. The most useful part of it is that it doesn't crawl the web.

So how does it index the pages? You feed it a list of feeds (RSS, Atom, JSON), a list of OPML files (containing feeds), or a list of Sitemaps (linking to every page directly).

I have been running three instances for a while now. The Emacs database is the smallest one at just 27 MB. The indieblog.page database is 709 MB. The largest one is the RPG database with 769 MB.

So, how small is 'a small number of sites'? The indieblog.page search index covers 6,429 domains and 273,721 pages; the RPG search index covers 436 domains and 147,981 pages. 😬

Xobaque knows how to handle paginated feeds as specified in RFC 5005. All the blogs hosted by Google via Blogspot and Blogger support this. That means the search engine can go through the feed following the links to the next feed page until it has ingested the whole site -- as long as the whole site shows up in the paginated feed at some point.

Xobaque knows how to handle sitemaps as specified on the Sitemaps site, sitemaps.org. A sitemap is a document that links to all the pages on a site. All the blogs hosted by Google via Blogspot and Blogger support these Sitemaps, too. The drawback from my point of view is that the search engine then has to request every single one of these pages individually. A feed or a paginated feed is better since you're getting ten pages or more per request.

Xobaque follows the directives in a /robots.txt page as specified in RFC 9309 and honors the non-standard Crawl-Delay directive. Yay!

#Xobaque #Search
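
To make the RFC 5005 part concrete, here is a minimal Python sketch of walking a paginated Atom feed by following its rel="next" links. This is not Xobaque's actual code, and the Blogspot feed URL is just an example.

    # A sketch of walking an RFC 5005 paginated Atom feed, not Xobaque's
    # actual code; the Blogspot feed URL below is just an example.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def walk_feed(url):
        """Yield every entry, following rel="next" links until the end."""
        while url:
            with urllib.request.urlopen(url) as response:
                feed = ET.parse(response).getroot()
            for entry in feed.findall(ATOM + "entry"):
                yield entry
            # RFC 5005 marks the next page of the archive with rel="next".
            url = None
            for link in feed.findall(ATOM + "link"):
                if link.get("rel") == "next":
                    url = link.get("href")
                    break

    for entry in walk_feed("https://example.blogspot.com/feeds/posts/default"):
        print(entry.findtext(ATOM + "title"))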
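
Similarly, here is a sketch of what reading a sitemap involves: parse the XML, collect every <loc> entry, and recurse if the document turns out to be a sitemap index pointing at further sitemaps. Again, this is my own illustration, not Xobaque's implementation.

    # A sketch of collecting page URLs from a sitemap; handles both a
    # plain urlset and a sitemap index that points at further sitemaps.
    import urllib.request
    import xml.etree.ElementTree as ET

    SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def page_urls(sitemap_url):
        """Yield every page URL listed in the sitemap."""
        with urllib.request.urlopen(sitemap_url) as response:
            root = ET.parse(response).getroot()
        if root.tag == SM + "sitemapindex":
            for sitemap in root.findall(SM + "sitemap"):
                yield from page_urls(sitemap.findtext(SM + "loc"))
        else:  # a urlset lists one <loc> per page on the site
            for url in root.findall(SM + "url"):
                yield url.findtext(SM + "loc")

Each of those URLs then costs one request, which is why a paginated feed delivering ten or more pages per request is the better deal.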
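
And finally a sketch of honoring /robots.txt together with Crawl-Delay, using nothing but the Python standard library; the "xobaque" user agent token is a guess for illustration, not the engine's real token.

    # A sketch of honoring /robots.txt and Crawl-Delay with the Python
    # standard library; "xobaque" as the user agent token is a guess.
    import time
    import urllib.robotparser

    robots = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
    robots.read()

    if robots.can_fetch("xobaque", "https://example.com/some/page"):
        delay = robots.crawl_delay("xobaque")  # None if no Crawl-Delay is set
        if delay:
            time.sleep(delay)  # wait between requests to the same host
        # ... fetch the page here ...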