       
       # 2024-08-13 - EPUB2TXT, Convert EPUB To Plain Text On DOS
       
       Recently I wrote an XML parser in AWK.
       
       gopher://tilde.pink/1/~bencollver/log/2024-08-01-xml2tsv
       
       I let it loose into the black abyss of DOS.  It came back with books
       in a bag of holding.
       
       Now I am announcing epub2txt.awk, a script to convert EPUB files to
       plain text for reading on DOS.  It relies on a bunch of utilities
       to do this conversion.  The only one I modified was UTF8TOCP.COM from
       FreeDOS.  I changed it to write unknown Unicode codepoints as
       \uHHHH and \UHHHHHHHH escapes instead of just '?'.
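
       To illustrate the escaping idea, here is a minimal sketch in
       Python.  It is not the actual UTF8TOCP.COM change.  The function
       names are hypothetical, and the uppercase hex digits and the
       use of \uHHHH for codepoints up to U+FFFF are assumptions.

```python
# Sketch only: escape characters that CP437 cannot represent.
# escape_codepoint and to_cp437_with_escapes are hypothetical names.

def escape_codepoint(cp):
    # Assumption: 4 hex digits up to U+FFFF, 8 hex digits above that.
    if cp <= 0xFFFF:
        return "\\u%04X" % cp
    return "\\U%08X" % cp

def to_cp437_with_escapes(text):
    out = bytearray()
    for ch in text:
        try:
            # Characters with a CP437 equivalent become a single byte.
            out += ch.encode("cp437")
        except UnicodeEncodeError:
            # Everything else becomes an ASCII escape instead of '?'.
            out += escape_codepoint(ord(ch)).encode("ascii")
    return bytes(out)
```

       Python's built-in "backslashreplace" error handler does
       something similar, but with lowercase hex digits.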
       
       It takes patience because the script can run for a long time.
       Once completed, it produces several outputs.
       
       The plaintxt/ directory contains short 8.3 filenames for all of the
       images used in the EPUB; index.txt is UTF-8 encoded; index.dos is
       CP437 encoded, has hard wrapped paragraphs, and has ASCII-art
       transliterations for some Unicode codepoints.
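
       Hard wrapping can be sketched in a few lines of Python.  This
       is only an illustration, not how epub2txt.awk does it.  The
       hard_wrap name and the 72 column width are assumptions, and
       paragraphs are assumed to be separated by blank lines.

```python
import textwrap

# Sketch only: hard_wrap is a hypothetical helper, and width=72 is an
# assumed default, not a value taken from epub2txt.
def hard_wrap(text, width=72):
    # Treat blank lines as paragraph separators.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    wrapped = []
    for p in paragraphs:
        # Collapse existing line breaks and runs of spaces, then re-wrap.
        wrapped.append(textwrap.fill(" ".join(p.split()), width=width))
    return "\n\n".join(wrapped)
```

       The sketch emits bare LF line endings; a DOS viewer may want
       CRLF instead.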
       
       The plaintxt/ directory would be a good one to zip up for viewing
       on any DOS machine.
       
       index.html gives a table of contents to read the EPUB content in
       a web browser.  In FreeDOS I can view this with inline images by
       running:
       
           links -g -mode 1024x768x256 index.html
       
       I tested epub2txt with nawk on FreeDOS 1.3 and MS-DOS 6.22.  I tested
       it with gawk, mawk, and nawk on Slackware 15.  I advise running it on
       Unix if possible, then copying plaintxt/ over to DOS for viewing.
       
       To download the DOS package:
       
       gopher://tilde.pink/1/~bencollver/files/dos386/util/epub2txt/
       
       To view the source code:
       
       https://chiselapp.com/user/bencollver/repository/epub2txt
       
       See the directions in README.TXT and have fun!
       
       * * *
       
       I found a very similar project written in C.  It is very fast.  It
       does not include references to images and external links.  See the
       link below for code and details.
       
       gopher://tilde.club/1/~freet/gophhub/?https://github.com/kevinboone/epub2txt2
       
       tags: bencollver,retrocomputing,technical
       