2025-08-15 An opt-in search engine: Xobaque
===========================================

I've been writing a search engine that doesn't crawl the web. I called it Xobaque. On 2025-08-22 I set it up to search Planet Emacslife.

Xobaque is entirely opt-in and push-based: web admins have total control over what they want to have indexed. The drawback is that web admins have to upload their pages!

I see this as a solution for sites that don't have good local search, like Emacs Wiki, or for communities, like all the blogs on Planet Emacslife. If we want to rely less on big search engines, we have to provide our own.

Technically, Xobaque is written on top of the SQLite FTS5 Extension. This extension does all the heavy lifting: indexing, searching, ranking, highlighting, snippets, boolean operators. Boolean operators! Do you remember those? Good times. The extension is quite amazing, really.

What Xobaque provides is the command-line interface and the web interface. Right now, the command-line interface has a command to initialize an empty database and a command to upload a single page.

Here are some examples of how to use this in a script.

My homepage:

    #!/usr/bin/fish
    cd /home/alex/alexschroeder.ch/wiki
    for line in (/home/oddmu/oddmu list)
        # Each line holds the page name followed by its title
        set --local info (string split " " $line)
        # Strip the wiki directory prefix and the .md suffix to get the page name
        set --local name (string replace --regex '^/home/alex/alexschroeder.ch/wiki/' '' $info[1] | string replace --regex '\.md$' '')
        set --local title $info[2]
        /home/xobaque/xobaque upload \
            -db "/home/xobaque/index.db" \
            -base "https://alexschroeder.ch/view/" \
            -local "$name" \
            -title "$title" \
            -filename "$name.md"
    end

Emacs Wiki:

    #!/usr/bin/fish
    # Skip the pages that are uploaded files (they start with a #FILE line)
    for path in (grep --files-without-match '^#FILE ' /home/alex/emacswiki/git/* /home/alex/emacswiki/git/.*)
        set --local name (string replace '/home/alex/emacswiki/git/' '' $path)
        # Page names use underscores where the title has spaces
        set --local title (string replace --all '_' ' ' $name)
        /home/xobaque/xobaque upload \
            -db "/home/xobaque/index.db" \
            -base "https://www.emacswiki.org/emacs/" \
            -local "$name" \
            -title "$title" \
            -filename "/home/alex/emacswiki/git/$name"
    end

The alternative to uploading via the command line, I think, will be feed ingestion. Website owners can send their feeds to Xobaque and it will index the entries in the feed. If the entries have the full text, that's great. If they have an excerpt, that works, too. You opt in with whatever you provide.

I'm assuming that this will be a system for friends only, so the upload will be protected by a login. Otherwise, people will be able to upload other people's sites and that violates the opt-in idea.

#Search #Web #Xobaque

2025-08-15. I have a vague idea that it would process OPML files like the Planet Emacslife OPML or the three RPG Planet OPML files, Indie, OSR and Other. From the OPML files Xobaque would know a bunch of feeds which it could then use for updates.

I have written Planet Jupiter, so I know that getting from 80% to 90% is all about handling blogs that are down, and getting from 90% to 100% is all about fiddling with the markup…

There's also the question of consent. So perhaps one would have to start a new list of feeds for those people who explicitly consent.
And if by chance that covers all the feeds on Planet Emacslife or RPG Planet that'd be a reason to laugh and party! 🎉

2025-08-17. Adding feeds and collections of feeds (via OPML files) works now. 🎉 The feed import from the net should use ETags and If-Modified-Since headers if possible, but the convenience functions of both the feed and the OPML library don’t provide this, so I’ll have to roll my own. And then think of parallelising the requests. Go channels ahoy! 😬 And then a web UI for uploads and accounts? 🥺

2025-08-19. Xobaque now knows how to skip feeds that haven’t updated using If-Modified-Since and If-None-Match headers.

2025-08-19. The errors I'm finding in the feeds of Planet Emacslife OPML:

    Error importing feed https://lifeofpenguin.blogspot.com/feeds/posts/default?alt=rss: XML syntax error on line 1: illegal character code U+FFFF
    Error importing feed https://blog.laurentcharignon.com/index.xml: XML syntax error on line 231: illegal character code U+0008
    Error importing feed http://feeds.feedburner.com/emacslife: parsing time "Sat, 9 Aug 2025 18:00:43 GMT" as "Mon, 02 Jan 2006 15:04:05 MST": cannot parse "9 Aug 2025 18:00:43 GMT" as "02"
    Error importing feed https://www.unwoundstack.com/emacs-rss.xml: Get "https://www.unwoundstack.com/emacs-rss.xml": net/http: TLS handshake timeout
    Error importing feed http://emacsmovies.org/atom.xml: parsing time "Sun, 04 Dec 2022 08:19:18 8DecGMT" as "Mon, 02 Jan 2006 15:04:05 MST": cannot parse "8DecGMT" as "MST"

But I guess I have 248/253 feeds imported.

Other issues I have noticed: Endless repeats of news items that overwrite each other, only differing in title and description, and no GUID set:

    Update http://tromey.com/elpa/news.html

Then again, the items all have dates set from 2007 to 2010, so perhaps that's OK.

And now I find:

    # ./xobaque search alex schroeder
    Emacs: Take Two
    https://takeonrules.com/2025/06/10/emacs-take-two/emacs-take-two
    extend Emacs . Third, learning of Alex Schroeder, both involved in RPGs

It works! And I really need to strip HTML, here! 😂