       
       # 2024-11-02 - No YAML, No Recutils
       
       Back around 2011 i wrote a private database, and at some point
       migrated it to a file based format.  Each record is a YAML file.
       The data is presented in a vertical format with one line per field,
       plus some multi-line blocks defined by indentation.  Trivial to
       edit in any text editor.  I used Tcl and the yaml module from tcllib
       to process the data.
       
       Recently i found the No YAML web site and i decided it was time
       for a change.
       
  HTML No YAML
       
       I briefly considered CSV, TSV, and JSON.  They are all mature,
       standardized formats, but in my opinion they fall short when it
       comes to editing in a vertical format in a plain text editor.
       
       I looked at GNU Recutils, since several folks wrote about it in their
       phlogs.  Like my YAML files, recutils presents the data in a vertical
       format with one line per field, plus it can do multi-line blocks via
       line continuations.  The format is fine for my purposes.
       
       One of my requirements is that i want this to work on FreeDOS too. 
       Recutils requires filesystem support for ACL, which is too fancy for
       DOS.  The format is simple enough, but the source code is
       surprisingly complex.  It does a fraction of what sqlite3 does, and
       in a less portable, less robust way.
       
       I tried making my own format based on ASCII control codes.  I could
       use Control-^ (the RS character) as the record separator, Control-_
       (the US character) as the unit separator AKA the field separator, and
       Control-X (the CAN or Cancel character) to discard all text since the
       beginning of the field.  This format works in ed(1) and the calvin vi
       clone on DOS. However, the control characters are a little ridiculous
       to look at and type in.  I did not want to foist such an eyesore on
       my future self.
       
   DIR csvtofsv also uses ASCII control codes as delimiters
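       
       The control-code format described above can be sketched in a few
       lines.  This is a hypothetical Python illustration, not the code
       that was actually used; the function name parse_control_text is
       made up, and it assumes that CAN only discards text back to the
       start of the current field.

```python
# ASCII control codes named in the text above.
RS = "\x1e"   # Control-^, record separator
US = "\x1f"   # Control-_, unit (field) separator
CAN = "\x18"  # Control-X, cancel: discard the field text typed so far


def parse_control_text(text):
    """Return a list of records, each a list of field strings."""
    records = []
    for raw_record in text.split(RS):
        if raw_record == "":
            continue  # skip the empty piece after a trailing RS
        fields = []
        for raw_field in raw_record.split(US):
            # Assumption: only the text after the last CAN survives.
            fields.append(raw_field.rsplit(CAN, 1)[-1])
        records.append(fields)
    return records
```

       A field typed as "typo", Control-X, "fixed" comes back as just
       "fixed", which is what makes the format usable in a line editor.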
       
       I tried the format used by the gopher lawn.  Like my YAML files, the
       gopher lawn database has one file per record and one line per field,
       presented in a vertical format.  The record separator is an empty
       line.  The field separator is the EOL (end of the line).  Each field
       has a name, a colon character, a space, and optionally a value.  Any
       line that begins with whitespace is a line continuation from the
       previous field.
       
   DIR Gopher Lawn database
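       
       The rules above (empty line ends a record, "name: value" starts a
       field, leading whitespace continues the previous field) fit in a
       short loop.  The sketch below is a hypothetical Python
       illustration rather than the gopher lawn's own tooling.  The
       name parse_lawn is invented, and it assumes field names contain
       no colon and that only a truly empty line separates records.

```python
def parse_lawn(text):
    """Parse lawn-style text into a list of records.

    Each record maps a field name to a list of raw value lines:
    the initial value first, then any continuation lines.
    """
    records = []
    current = {}
    last_name = None
    for line in text.splitlines():
        if line == "":
            # Empty line: record separator.
            if current:
                records.append(current)
            current, last_name = {}, None
        elif line[0] in " \t":
            # Leading whitespace: continuation of the previous field.
            if last_name is not None:
                current[last_name].append(line.strip())
        else:
            # "name: value"; the value is optional.
            name, _, value = line.partition(":")
            last_name = name
            current[name] = [value.strip()]
    if current:
        records.append(current)
    return records
```

       The values are kept as lists of lines here so that the decision
       about how to join them can be made separately.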
       
       This format is trivial to process in AWK.  No special parser
       required.  I converted my private database to this format.  Exporting
       the whole AWK database to CSV took 0.3 seconds, compared to 3 seconds
       in the Tcl & YAML version.
       
       I added one feature: inline blocks of multi-line text. The block
       format is the same as the line continuation format, except the initial
       value is a backslash character.
       
       For example, here is a line continuation.
       
           fieldname: First sentence.
             Second sentence.
             Third sentence.
       
       When this value is read, the EOL and indentation are removed.
       That's why this can also be represented without continuation. 
       
           fieldname: First sentence. Second sentence. Third sentence.
       
       Here is an inline block of multi-line text.
       
           fieldname: \
             Line 1 of 3.
             Line 2 of 3.
             Line 3 of 3.
       
       When this value is read, the indentation is removed, but the
       EOL is preserved.  The value contains multiple lines.
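       
       The difference between the two forms comes down to how the lines
       of one field are joined.  Here is a hypothetical Python sketch
       (read_field is an invented name, not part of the AWK scripts).
       It assumes that continuation lines are joined with a single
       space, as the one-line equivalent above suggests, and that all
       leading whitespace counts as indentation.

```python
def read_field(lines):
    """Decode the raw lines of one field into (name, value).

    lines[0] is the "name: value" line; the rest are indented lines.
    """
    name, _, first = lines[0].partition(":")
    first = first.strip()
    rest = [line.strip() for line in lines[1:]]
    if first == "\\":
        # Inline block: drop the backslash marker, keep the EOLs.
        value = "\n".join(rest)
    else:
        # Line continuation: assumed to join with one space.
        value = " ".join(part for part in [first] + rest if part)
    return name, value
```

       With this reading, the two continuation examples above decode to
       the same value, while the block example decodes to three lines.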
       
       Time for me to shut up and show them the code.  Below are two
       small AWK scripts to convert from gopher lawn format to TSV
       and back.
       
  TEXT lawn2tsv.awk
       
  TEXT tsv2lawn.awk
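       
       One detail any such conversion has to settle is how multi-line
       values survive in TSV, where tab and newline are delimiters.
       The sketch below is a hypothetical Python illustration, not a
       description of what the linked AWK scripts do.  It assumes the
       common backslash convention (\t, \n, \\), and the function
       names are invented.

```python
def tsv_escape(value):
    """Make a value safe to store in a single TSV cell."""
    # Escape backslashes first so the other escapes stay unambiguous.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))


def tsv_unescape(cell):
    """Reverse tsv_escape."""
    decode = {"t": "\t", "n": "\n", "\\": "\\"}
    out = []
    i = 0
    while i < len(cell):
        if cell[i] == "\\" and i + 1 < len(cell):
            nxt = cell[i + 1]
            # Unknown escapes are passed through unchanged.
            out.append(decode.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(cell[i])
            i += 1
    return "".join(out)


def to_tsv_row(record, columns):
    """Render one record (a dict of decoded values) as a TSV line."""
    return "\t".join(tsv_escape(record.get(c, "")) for c in columns)
```

       Escaping this way keeps one record per TSV line, so the export
       stays easy to split on tabs and newlines.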
       
       p.s.
       
       In theory, if i wanted to migrate the data, i could use uncsv to
       convert between TSV and CSV.  GNU recutils can import and export
       CSV.
       
       I was told that the gopher lawn format resembles the Header Fields
       format in email standards.  See section 2.2 of RFC 5322.
       
  TEXT gopher://gopher.32kb.net/0/rfc/rfc5322.txt
       
       The vCard format is also similar.  See section 6.10 of RFC 6350 for
       Extended Properties and Parameters.  I could have abused this format
       but i think it is too complex for my purposes.
       
  TEXT gopher://gopher.32kb.net/0/rfc/rfc6350.txt
       
       tags: bencollver,technical,unix
       
       # Tags
       
   DIR bencollver
   DIR technical
   DIR unix