# blackout.tilde.town

_published Fri Jul 14 04:52:12 UTC 2023_

I made a new thing: a website for making blackout poetry with over nine million chunks of text extracted from Project Gutenberg. It's here: blackout.tilde.town[0].

a screenshot of a blackout poem that reads: the picturesque decay remains an idea of the beautiful[8]

Ever since ~kc[1] posted this page[2], I've been inspired by blackout poetry. I wanted an interface not only for doing it, but also for giving me novel text to work with.

I used Project Gutenberg's robot access instructions[3] to get about 12 gigabytes of compressed plaintext English-language books. That came to about 35,000 books once duplicate encodings were ignored.

This code[4], gutchunk, decompressed the books and combed through them for what I'm calling "chunks." I was looking for meaty sections of text that would make for good blackout poetry fodder. My approach is fairly naive: I store text in a buffer until I see two newlines in a row, then check whether I have enough in the buffer. If I do, I cut a chunk; if I don't, I discard it. To my extreme pleasure, I ended up with over nine million chunks. They're all sitting in a sqlite3 database on the town; if you're reading this and are also a townie, let me know if you want access.

When I was working on prosaic[5] over the years, I got a lot of junk from my sloppy parsing of Gutenberg books. I was young and silly and not writing great code then. I was also afflicted with a perverse need to ingest ALL of the text into my cut-up corpora, so I got a lot of cruft: chapter headings, tables of contents, captions, and similar.

So far I've pulled up well over a hundred of my nine million chunks, and they all look quite good. My simple heuristic avoided a lot of the noise I get when running prosaic. Of course, I'm missing some text: short bits of dialogue, for example. This kind of thing would have haunted me in the past, but now, knowing that mystery remains in these books feels good.
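For the curious, here is a minimal Python sketch of the buffer-and-threshold heuristic described above. It is not gutchunk's actual code: the post doesn't say how much text counts as "enough," so `MIN_CHUNK_CHARS` is an assumed value, and joining a paragraph's lines with single spaces is also an assumption made for illustration.

```python
from typing import Iterable, Iterator

# Assumed threshold: "enough" text isn't specified in the post, so this
# number is hypothetical and exists only to make the sketch concrete.
MIN_CHUNK_CHARS = 500


def iter_chunks(lines: Iterable[str], min_chars: int = MIN_CHUNK_CHARS) -> Iterator[str]:
    """Yield paragraph-sized chunks of at least min_chars characters.

    Lines accumulate in a buffer until a blank line (two newlines in a row)
    ends the paragraph. A long-enough buffer becomes a chunk; a short one
    (a heading, a caption, a stray line of dialogue) is discarded.
    """
    buffer: list[str] = []
    for line in lines:
        if line.strip() == "":
            # Paragraph boundary: keep the buffer only if it is big enough.
            text = " ".join(buffer)
            if len(text) >= min_chars:
                yield text
            buffer = []
        else:
            buffer.append(line.strip())
    # End of input: apply the same rule to whatever is left over.
    text = " ".join(buffer)
    if len(text) >= min_chars:
        yield text


if __name__ == "__main__":
    sample = (
        "Chapter I\n\n"
        "It was a dark and stormy night; the rain fell in torrents,\n"
        "except at occasional intervals.\n\n"
        "THE END\n"
    )
    # With a small threshold, only the middle paragraph survives.
    for chunk in iter_chunks(sample.splitlines(), min_chars=40):
        print(chunk)
```

Because short paragraphs are simply dropped, headings like "Chapter I" never make it into the output, which is the same tradeoff the post describes: less cruft, at the cost of losing short bits of dialogue.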
I don't like finding the bottom of the swamp[6].

If you're interested, the code for blackout.tilde.town is also up on [our gitea](https://git.tilde.town/vilmibm/blackout).

There is no way to iterate over the chunks; you get a random one on every page load. Given the size of the ID space, this should mean a vanishingly small chance of repeats. I wanted an experience like the Library of Babel[7]: one of wandering and digging up scraps to scrawl upon.

I'm hosting this decidedly personal project on tilde.town because it felt like a nice fit for our community. It's also my house and I can do whatever, though I try not to have that mindset too often.

I may also make an SSH-hosted text-mode version. I haven't decided.

I've already been really pleased with making poems on the new site, and I hope you like it, too. Please let me know on Mastodon[8] or wherever if you're making stuff with it.

h[0]: blackout.tilde.town https://blackout.tilde.town
h[1]: ~kc https://tilde.town/~kc
h[2]: this page https://tilde.town/~kc/blackout
h[3]: robot access instructions https://gutenberg.org/policy/robot_access.html
h[4]: This code https://git.tilde.town/vilmibm/gutchunk
h[5]: prosaic https://github.com/vilmibm/prosaic
h[6]: the bottom of the swamp https://tilde.town/~vilmibm/swamp
h[7]: the library of babel https://en.wikipedia.org/wiki/The_Library_of_Babel
h[8]: mastodon https://tiny.tilde.website/@vilmibm
p[8]: a screenshot of a blackout poem that reads: the picturesque decay remains an idea of the beautiful https://tilde.town/~vilmibm/blackout.png
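As a footnote on the "random chunk on every page load" idea, here is one way it could be done against sqlite3, sketched in Python. This is an illustration under assumptions, not the blackout.tilde.town code: the `chunks(id, text)` schema and the `random_chunk` helper are hypothetical. The sketch picks a random point in the ID space and seeks to the first chunk at or above it.

```python
import random
import sqlite3
from typing import Optional

# Hypothetical schema: the post says chunks live in a sqlite3 database
# but doesn't describe its tables.
SCHEMA = "CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"


def random_chunk(conn: sqlite3.Connection, rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the text of one randomly chosen chunk, or None if the table is empty."""
    rng = rng or random.Random()
    # Separate MIN and MAX queries so each can be answered from the
    # primary-key index instead of scanning the table.
    (lo,) = conn.execute("SELECT MIN(id) FROM chunks").fetchone()
    (hi,) = conn.execute("SELECT MAX(id) FROM chunks").fetchone()
    if lo is None:
        return None  # no chunks stored yet
    # Pick a random point in the ID space; since target <= MAX(id), a row
    # with id >= target always exists.
    target = rng.randint(lo, hi)
    (text,) = conn.execute(
        "SELECT text FROM chunks WHERE id >= ? ORDER BY id LIMIT 1", (target,)
    ).fetchone()
    return text


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO chunks (text) VALUES (?)",
        [("the picturesque decay",), ("remains an idea",), ("of the beautiful",)],
    )
    print(random_chunk(conn))
```

Seeking by ID keeps each page load cheap even with nine million rows, whereas `ORDER BY RANDOM() LIMIT 1` has to visit every row. The tradeoff: if the IDs have gaps, the chunk just after a gap is slightly more likely to come up; with densely packed IDs the choice is uniform.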