A Body of Five-Letter Words

I've lately been made aware of a fun little game by the name of ``Wordle'', a game whose simple goal is to guess a five-letter word within six tries. I get the impression five letters were chosen over six because most popular slurs use six letters. I fairly easily recognized this game to be another for which a machine will always best a man, trivially; a man will sit and think, using techniques to whittle down the word list based on common words, and following high-level patterns in the language. A machine needs no special knowledge of the language, perhaps not even of the words therein, but a dictionary holding all five-letter words would help nonetheless, and the set of such words is very small: with twenty-six letters in each of five places there are 26^5, or 11,881,376, possible such words, fewer than one dozen million, and the true set is much smaller still. The source code for this game has leaked, making it trivial to get the list of words, but I thought it would be fun to collect them myself regardless, since building such a set is the first step towards solving such a game.

I wrote a small Ada program for the purpose, as Ada's arrays made it extremely easy: such a small set needn't be stored sparsely and explicitly, but can be represented densely and implicitly as a five-dimensional array of Boolean values, each of whose indices ranges over the letters. A word is recorded by setting the component it indexes to True; once all words are given, the array can be scanned, and every True value yields the corresponding word.

It's so strange to see where Latin words end up, and the use of CORPUS to refer to any collection of data is one such oddity; using my own writing as a corpus would be insufficient, and so I used this one:

    https://www.anc.org/OANC/OANC_GrAF.zip

I was wholly uninterested in complex use of this corpus, and so extracted only the basic text files.
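As a rough illustration of the dense representation described above, here is a minimal Ada sketch; it is not the program from this article, and the names Letter, Word_Set, Include, and Put_All are hypothetical. It assumes words arrive already folded to lowercase, and it places the array on the heap on the assumption that roughly twelve million Boolean components are too many for a typical stack. Marking a word is a single assignment, and scanning is five nested loops, which visit the words in alphabetical order.

```ada
with Ada.Text_IO;

procedure Word_Set_Sketch is
   --  Each of the five indices ranges over the lowercase letters.
   subtype Letter is Character range 'a' .. 'z';

   --  One Boolean per possible word; the aspect makes every word start absent.
   type Word_Set is array (Letter, Letter, Letter, Letter, Letter) of Boolean
     with Default_Component_Value => False;

   --  26**5 = 11_881_376 components is likely too large for the stack,
   --  so the set is allocated on the heap instead.
   type Word_Set_Access is access Word_Set;

   Words : constant Word_Set_Access := new Word_Set;

   --  Mark one word as present; W must hold five lowercase letters,
   --  otherwise indexing raises Constraint_Error.
   procedure Include (Set : in out Word_Set; W : String) is
   begin
      Set (W (W'First), W (W'First + 1), W (W'First + 2),
           W (W'First + 3), W (W'First + 4)) := True;
   end Include;

   --  Scan the whole array, printing the word for every True component;
   --  the nesting order yields the words alphabetically.
   procedure Put_All (Set : Word_Set) is
   begin
      for A in Letter loop
         for B in Letter loop
            for C in Letter loop
               for D in Letter loop
                  for E in Letter loop
                     if Set (A, B, C, D, E) then
                        Ada.Text_IO.Put_Line ((A, B, C, D, E));
                     end if;
                  end loop;
               end loop;
            end loop;
         end loop;
      end loop;
   end Put_All;
begin
   Include (Words.all, "words");
   Include (Words.all, "hello");
   Include (Words.all, "words");  --  marking a word twice changes nothing
   Put_All (Words.all);
end Word_Set_Sketch;
```

Run as given, the sketch prints ``hello'' and then ``words'', each once, since a duplicate merely sets the same component again.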
Manipulating ASCII text is one of the few tasks for which modern UNIX is actually halfway useful, so I used the following ``pipeline'' to prepare the multitude of files for the Ada program:

    find ~/OANC-GrAF/ -type f -exec cat {} + | LC_ALL=C tr -c '[:alpha:]' '[\n*]' | ./corpus /dev/stdin five-letter-words

The ``find'' command, as odd as it looks, turns the multitude into a single stream, handing as many files to each ``cat'' as it can rather than running one ``cat'' per file; the ``tr'' command turns every non-alphabetic character into a newline before passing the result to the Ada program; and the Ada program fills in its data structure until the input is exhausted, after which it creates a file holding all words, stored in exactly five characters each. Setting LC_ALL=C for ``tr'' restricts [:alpha:] to the ASCII letters, upper and lower case. I separated the input words onto lines because the Ada.Text_IO.Get_Line function from Ada 2005 made it easiest to collect the input in this way. The translation done by ``tr'' ensures no garbage in the corpus can become an invalid word, although one glance showed me the corpus is of questionable quality in any case. Fortunately, a corpus can be improved with basic set operations.
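The consumer at the end of the pipeline might look something like the following sketch; it is a guess at the shape of such a program, not the source of ./corpus. Assumed details: it takes the input and output file names as its two arguments, mirroring ``./corpus /dev/stdin five-letter-words''; it folds case with Ada.Characters.Handling.To_Lower, since ``tr'' passes capital letters through unchanged; and it writes through Ada.Streams.Stream_IO so each word occupies exactly five bytes with nothing between them, where Ada.Text_IO would terminate the final line on closing. The names Corpus_Sketch and Add are hypothetical.

```ada
with Ada.Characters.Handling;
with Ada.Command_Line;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Corpus_Sketch is
   use Ada.Text_IO;
   package SIO renames Ada.Streams.Stream_IO;

   subtype Letter is Character range 'a' .. 'z';
   type Word_Set is array (Letter, Letter, Letter, Letter, Letter) of Boolean
     with Default_Component_Value => False;
   type Word_Set_Access is access Word_Set;

   --  Heap-allocated, as the array is large; all words start absent.
   Words : constant Word_Set_Access := new Word_Set;

   --  Record Line if it is a five-letter word after folding case; the
   --  empty lines left by runs of separators fail the length check.
   procedure Add (Set : in out Word_Set; Line : String) is
      Lower : constant String := Ada.Characters.Handling.To_Lower (Line);
   begin
      if Lower'Length = 5 and then (for all Ch of Lower => Ch in Letter) then
         Set (Lower (Lower'First), Lower (Lower'First + 1),
              Lower (Lower'First + 2), Lower (Lower'First + 3),
              Lower (Lower'First + 4)) := True;
      end if;
   end Add;

   Input  : File_Type;
   Output : SIO.File_Type;
begin
   if Ada.Command_Line.Argument_Count /= 2 then
      Put_Line (Standard_Error, "usage: corpus_sketch INPUT OUTPUT");
      Ada.Command_Line.Set_Exit_Status (Ada.Command_Line.Failure);
      return;
   end if;

   --  One candidate word per line, read with the Ada 2005 Get_Line function.
   Open (Input, In_File, Ada.Command_Line.Argument (1));
   while not End_Of_File (Input) loop
      Add (Words.all, Get_Line (Input));
   end loop;
   Close (Input);

   --  Scan in alphabetical order, writing each present word as exactly
   --  five bytes with no separators or line terminators.
   SIO.Create (Output, SIO.Out_File, Ada.Command_Line.Argument (2));
   for A in Letter loop
      for B in Letter loop
         for C in Letter loop
            for D in Letter loop
               for E in Letter loop
                  if Words (A, B, C, D, E) then
                     String'Write (SIO.Stream (Output), (A, B, C, D, E));
                  end if;
               end loop;
            end loop;
         end loop;
      end loop;
   end loop;
   SIO.Close (Output);
end Corpus_Sketch;
```

Because the words are written back to back, the nth word of the resulting file begins at byte 5 * (n - 1), counting from zero, which keeps later reading trivial.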