# Announcing Trunkless, a New Cut-Up Poetry Interface

_published Sun Aug 18 20:26:50 UTC 2024_

Today I am releasing Trunkless[0], new web-based software for creating cut-up[1] poetry from 1,267,620,693 possible phrases extracted from large quantities of thematic text.

a screenshot of Trunkless showing a cut-up poem[16]

At launch, I'm offering five corpora to play with:

- `gutenberg`, the entire English text of Project Gutenberg[2]
- `geocities`, website plaintext extracted from the Archive Team geocities archive[3]
- `gamefaqs`, the text of 20 years' worth of plaintext video game walkthroughs from gamefaqs.com[4]
- `cyberpunk`, the text of 31 cyberpunk novels (as used in my collection Cyberpunk Prophecies[5])
- `lovecraft`, the text of a bunch of Lovecraft books[6]

For now you can only work within one corpus at a time. I'm open to adding new corpora; send me an email or open an issue on github[7] to chat about it. You can also, of course, get Trunkless's source[8] locally and add your own corpora.

## Background

I left high school in the 2000s with the fervent desire to be a writer. Fear of poverty and a sense of alienation meant I was thinking about how to support myself from the first moment I picked which college classes to take. Because I had expressed interest in some amount of computer science study, I had been assigned a computer science professor as my advisor. I remember going over the list of classes and nervously signing up for every computer class I could. Even my CS advisor questioned my logic. I brought this fear of eventual employability into every class registration meeting, and he brought a soft resistance that stood no chance. I contend that _some_ amount of this fear is healthy for anyone attending a liberal arts college. I, however, regret my excess.

For my senior year of college I did a thesis. Despite my nervous insistence on taking all the computer science classes I could, I still rebelled against the idea of being a programmer, or even a computer scientist. I did not want to do a "traditional" computer science thesis. I wanted to "make art," whatever that meant. I did not want my thesis to wear a tie (which was something I didn't yet know programmers didn't do).

William S. Burroughs was a great inspiration to me throughout high school. His cut-up technique lit up my brain and led me to appreciate aleatoric art in general. My brain didn't work so well[9] and applying chance to the creative process was a way to get somewhere with art when my mind wasn't cooperating.

Throughout college I was a performing noise musician. I did a lot more composition than might have been obvious at my shows, but within the structure of a performed work I did a lot of improvisation. I followed the lead of feedback and samples and the random slamming of effect pedals. I had been making bots on this early "twitter" thing that smashed search results for random words together, slicing up text without reason. I used text manipulation to generate flyers for my campus radio show, DEAD AIR.

All of this led me to my thesis proposal: what if William S. Burroughs had not scissors, paper, and paste--but a computer?

I remain proud of my thesis these many years later, arrogant and smug though it was. The code was clumsy and full of bad decisions. Still, I used it to create some pieces I liked. I presented my thesis (it's on youtube, sliced up into 10 minute chunks as all youtube videos were at the time; I will leave finding it as an exercise to the reader).
I kept coming back to this software throughout the years. This used to embarrass me. Only a loser keeps making the same work of art over and over, I told myself. Why not make new, radical things? As I matured I noticed several examples of artists I deeply respected seeming to iterate on the same basic piece over and over. William Gibson is a good example: each of his novels is different, but think about them enough and you'll find the same essential themes. In the 1980s Gibson took a big bite of some meaty ideas and today he is yet chewing on them. We are lucky to snuffle out the crumbs that fall from his wit-wet lips.

I transmuted my shame into pride and, thus, now present to you the latest iteration of the cut-up poetry software I have been chewing on since 2009: Trunkless. The stages of growth that got us here:

- Weltanschauung, 2009 (perl)
- node-prosaic, 2012 (coffeescript, though a custom dialect inspired by haskell. yes, it's completely terrible)
- hy-prosaic, 2014 (hy, with a code structure inspired by the cthulhu mythos. i can't explain this now)
- Prosaic, 2015 (python)
- prosaic.party, 2015 (a web based version of prosaic. it suffered from a terrible ux, memory leaks, and extremely slow performance)
- Trunkless, 2024 (go/javascript)

I did poetic work with each iteration. By Save Scum[10] I had a consistent workflow of generating a dozen or so lines, copying and pasting the lines that stood out to me, writing some original "glue" lines, then generating new lines and looking for the final pieces to whatever puzzle was taking shape.

Prior to Trunkless, I was fixated on the idea of "decaying templates" or "weakening rules," which I felt set my work apart from previous computer poetry tools. A user supplied prosaic with a line template like `{"keywords": ["sex", "love", "god"], "alliteration": true, "rhyme": "A"}`. Prosaic would search its database for phrases that exactly matched the template. If a match was not found, the template was "weakened" and re-run. To weaken a rule, prosaic would pick a random property and make it crappier: looser rhymes would be accepted or alliteration would be dropped, for example. Rules would be weakened until a line was found.
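If it helps to see the shape of that loop, here is a minimal sketch of the weakening idea. Prosaic itself was Python and its real templates had more knobs than this; the `Template` fields and the `findLine` helper below are made up for illustration, not prosaic's actual code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Template mirrors the kind of line template prosaic accepted. The field
// names here are illustrative only.
type Template struct {
	Keywords     []string
	Alliteration bool
	Rhyme        string
}

// weaken relaxes one randomly chosen constraint that is still active,
// returning false once there is nothing left to loosen.
func weaken(t *Template) bool {
	var relaxers []func()
	if len(t.Keywords) > 0 {
		relaxers = append(relaxers, func() { t.Keywords = t.Keywords[:len(t.Keywords)-1] })
	}
	if t.Alliteration {
		relaxers = append(relaxers, func() { t.Alliteration = false })
	}
	if t.Rhyme != "" {
		relaxers = append(relaxers, func() { t.Rhyme = "" })
	}
	if len(relaxers) == 0 {
		return false
	}
	relaxers[rand.Intn(len(relaxers))]()
	return true
}

// findLine re-runs a search with a progressively weaker template until a
// phrase turns up or every rule has decayed away.
func findLine(search func(Template) (string, bool), t Template) (string, bool) {
	for {
		if line, ok := search(t); ok {
			return line, true
		}
		if !weaken(&t) {
			return "", false
		}
	}
}

func main() {
	// A stand-in search that never matches, so the template decays to nothing.
	noMatch := func(Template) (string, bool) { return "", false }
	t := Template{Keywords: []string{"sex", "love", "god"}, Alliteration: true, Rhyme: "A"}
	_, ok := findLine(noMatch, t)
	fmt.Println("found a line:", ok)
}
```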
During Save Scum, however, I found myself just using the "blank template" to get a truly randomized phrase. I would see a connective thread between two random lines and build a poetic narrative or picture around it. This process felt like sculpting and I imagined myself at a pottery wheel, feeling the bumps on each random line and molding them into a coherent shape. A poetry wheel, if you will indulge my metaphor.

## Trunkless's interface

Trunkless is a radical departure from `prosaic` and an attempt to crystallize the approach at which I arrived after working in this mode for well over a decade. Trunkless is also an attempt to share my approach with others in a low-friction way. I have completely dropped the idea of templates: now you just get one random line out of a truly massive corpus. I've focused instead on the human-interactive parts of my workflow. Trunkless is the poetry wheel: sit, sculpt, share.

To get started with Trunkless, note what corpus you are working with. It's `gutenberg` by default. You can pick a different one and hit `go` to switch. See what lines you received from the æther. Each line's source identifier is on the right side under the row of buttons. If you see lines you like, you can click the padlock button to lock them in place; they will embolden.

You can regenerate unwanted lines either using the per-line regenerate button or by clicking "regenerate lines" at the top; any unlocked line will be replaced. If you want to trim a line's content, click its `edit` button. Once you are happy with your changes, hit your `enter` key or click the edit button again. Note that if the new version of the line is not a proper substring of the original line, the source attribution changes to _original_. Delete lines with the `delete` button. Add new lines at the end with `add new line`. Any unlocked line can be reordered by clicking and dragging it to a new position.

When you have a piece you want to share or take elsewhere to edit further, you have several options:

- copy to your clipboard as an image
- copy to your clipboard as plain text
- save as an image to your computer
- save as a `.txt` file to your computer

In any of those scenarios you can include a listing of source attribution with the `include sources` checkbox.

## Technical Details

I used Go for all of the text extraction and for the web backend. The frontend is vanilla Javascript using web components. The only library I included on the front end is the very useful html2canvas[11], a tool I also reached for in the blackout engine.

Trunkless is written as a Go CLI application that exposes a few subcommands:

- `cutup`: cut up raw plaintext files into files full of poetry-ready phrases
- `ingest`: use files full of poetry-ready phrases to create a corpus in the database
- `serve`: run the web frontend

I began this work with the requirement that a random line should always be retrievable in constant time. This was to allow me to embrace very large corpora and also make the "poetry wheel" approach to cut-up as pleasant as possible. There is no "get a random row" operation in PostgreSQL, so I evaluated two options:

- pre-generate an indexed, random value for every phrase row, then query for the row closest to a randomly generated value over the same range
- give every row a sequential integer ID, cache the maximum ID value, then generate a random integer between 0 and the maximum ID value

I went with the latter approach. I typically would never rely on the assumption of gapless IDs, but I was not building a traditional database schema. Corpora are synthesized; once generated, they are read-only. I would only ever be regenerating them from scratch. I felt comfortable relying on gapless, sequential IDs and the approach worked quite well.
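In sketch form, the lookup amounts to something like the following. This assumes a single `phrases` table for simplicity (the real tables are per corpus, which I'll get to below), a Postgres driver registered with `database/sql`, and helper names of my own invention.

```go
package corpus

import (
	"database/sql"
	"math/rand"
)

// maxPhraseID is computed once and cached by the caller, since a corpus never
// changes after ingestion; that cached value is what keeps every lookup
// constant time.
func maxPhraseID(db *sql.DB) (int64, error) {
	var m int64
	err := db.QueryRow("SELECT max(id) FROM phrases").Scan(&m)
	return m, err
}

// randomPhrase assumes ids run from 1 to maxID with no gaps, so a random
// integer in that range always lands on a real row.
func randomPhrase(db *sql.DB, maxID int64) (string, error) {
	id := rand.Int63n(maxID) + 1
	var phrase string
	err := db.QueryRow("SELECT phrase FROM phrases WHERE id = $1", id).Scan(&phrase)
	return phrase, err
}
```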
A major challenge was optimizing write speed when building a corpus. I wrote sufficiently fast Go code to cut up and produce phrases from various sources but was getting bottlenecked when trying to insert all of the phrases into a table. This challenge led me to abandon the database I started with, sqlite3, and switch to PostgreSQL in order to take advantage of the `COPY` operation, which can achieve extremely fast table write speeds.

The basic unit of input for Trunkless is a plaintext file. The `cutup` subcommand is pointed at a directory of plaintext files. By default, filenames are used as source identifiers. Every source's identifier is hashed to use as a primary key in the database. `cutup` does no database operations but instead outputs phrases cut from the source files to a new file of phrases per source. This allows the cut-up phrases to be QA'd manually or used for other purposes. Each phrase file is a tsv with its source identifier's hash and the resulting phrase. A supplemental file, `_title_index.tsv`, is also written for easy translation between the source identifier hash and the source's plaintext title while performing ingestion.

The `ingest` subcommand creates a corpus from the pile of phrase files output by `cutup`. Each phrase file is inserted via `COPY`, allowing PostgreSQL to write entire files at once into the corpus's phrase table.

It's worth noting here that I commit another grave database sin. In the "real" world I would have three tables: phrases, sources, and corpora. Phrase rows would have foreign keys to source rows and source rows would have foreign keys to corpora rows. In my extremely pathological use case, however, I can't afford joins or queries more complicated than `SELECT phrase FROM phrases WHERE id = 123`. So, how do I know which phrases belong to which corpora? I hash the corpus name as part of `ingest` and create a phrase table for it with the hash as part of the table name. Then, when querying, I _STRING INTERPOLATE_ the corpus name hash to select from its phrase table (yes, I defend against SQL injection). Judge me; I will cackle, just as I did after describing all of this to my software engineer wife one day. "It's fine," I insisted. "It's definitely not fine," she said. She's right, of course, but this is art. Art shouldn't be fine.
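For the curious, here is a minimal sketch of the shape of that crime. The hash choice, the table naming scheme, and the helper names are guesses for illustration; Trunkless's actual code may differ in the details.

```go
package corpus

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// tableFor derives a phrase table name from a corpus name. sha1 and the
// "phrases_" prefix are assumptions; the point is that the resulting
// identifier contains only safe characters, so interpolating it into SQL
// can't smuggle anything else in.
func tableFor(corpusName string) string {
	sum := sha1.Sum([]byte(corpusName))
	return "phrases_" + hex.EncodeToString(sum[:])
}

// phraseQuery builds the per-corpus SELECT. The table name is interpolated
// (it's pure hex, derived above) while the row id stays a bound parameter.
func phraseQuery(corpusName string) string {
	return fmt.Sprintf("SELECT phrase FROM %s WHERE id = $1", tableFor(corpusName))
}
```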
On the front end I have committed many more crimes which I will not enumerate here. It works well enough and I'm quite pleased with web components. I would have used vanilla JS either way, since I think React is worse than MUMPS[12]. I'm running PostgreSQL in a very sqlite3-flavored way thanks to this guide a friend contributed to another one of my projects: Run a Little Postgres Without sudo[13].

## Corpora

Initially, Trunkless was going to be based exclusively on Project Gutenberg, just like the blackout engine. While working on the blackout engine, I was not able to get the entirety of Project Gutenberg's English holdings and this gnawed at me (despite saying in the blackout engine's announcement post that I would not let such things gnaw at me). Part of getting Trunkless going, thus, was once again throwing myself at getting the full Gutenberg corpus.

I succeeded, though not without effort. I found Gutenberg's catalog file: a massive CSV that lists every book in their collection. Sadly, this CSV has corruption that makes many (~10,000) rows unreadable. I composed `bash` one-liners until I had cleaned up all the issues and uploaded a corrected CSV to the Internet Archive[14]. Using this corrected catalog, I was able to use a mirror of Gutenberg to fetch all of the English language books. This resulted in a pile of 57,000 books. I've compiled and uploaded those files[15] to the Archive as well. I hope this is valuable work for others to build on. There are other, similar compilations floating around out there, but the most recent ones I found were several years out of date.

I cut all of Gutenberg up into a sqlite3 database and let tilde.town users play with an early Trunkless. It was well received. I really wanted additional corpora, however, and that desire led to a lot of the changes described above. I'm glad I did that work. Adding new corpora is now fast and easy. Some source files need more handling than others, and the `cutup` subcommand accepts a `--flavor` flag to apply source-specific hacks like filtering out Gutenberg book headers and footers.

## So

Exactly one year ago[16] I mentioned via mastodon that I was breaking ground on a new poetry project. It's satisfying to deliver on that, given the significant things that happened in my life over the past year: departing github after five years, experiencing the unexpected death of my mother, moving across the country, starting a new job.

This work had a lot of false starts and dead ends. I was going to simultaneously release a CLI version. I was going to allow custom corpora uploads and a public feed of cut-up poems. I wanted people to be able to mix and match sources to fine-tune corpora and change the weighting of various sources. I wanted people to be able to pick a cute, custom frame and color scheme for their finished poems.

I didn't do most of what I wanted. Given the year I've had, I think that's okay. I used to not finish creative projects because of all the things I felt I couldn't get to. I realized that if you release a work of art to the world, the things you didn't get to are only known to you. People can't see what you didn't do; they see what you did do.

I hope you like what I did.

h[0]: Trunkless https://trunkless.tilde.town
h[1]: cut-up https://en.wikipedia.org/wiki/Cut-up_technique
h[2]: Project Gutenberg https://gutenberg.org
h[3]: Archive Team geocities archive https://wiki.archiveteam.org/index.php/GeoCities
h[4]: gamefaqs.com https://gamefaqs.com
h[5]: Cyberpunk Prophecies https://archive.org/details/cbprop
h[6]: lovecraft books https://github.com/vilmibm/lovecraftcorpus/
h[7]: on github https://github.com/vilmibm/trunkless/issues
h[8]: source https://github.com/vilmibm/trunkless
h[9]: didn't work so well https://tilde.town/~vilmibm/blog/#walk-in-dark
h[10]: Save Scum https://archive.org/details/savescum
h[11]: html2canvas https://github.com/niklasvh/html2canvas
h[12]: MUMPS https://en.wikipedia.org/wiki/MUMPS
h[13]: Run a Little Postgres Without sudo https://github.com/vilmibm/tildemush/blob/master/a_little_postgres.md
h[14]: the Internet Archive https://archive.org/details/pg_catalog
h[15]: those files https://archive.org/details/pg_eng_txt_2024
h[16]: one year ago https://tiny.tilde.website/@vilmibm/110912320531170491
p[16]: a screenshot of Trunkless showing a cut-up poem https://tilde.town/~vilmibm/images/trunkless.png