2024-11-02 - No YAML, No Recutils
=================================

Back around 2011 i wrote a private database, and at some point
migrated it to a file based format.  Each record is a YAML file.
The data is presented in a vertical format with one line per field,
plus some multi-line blocks defined by indentation.  Trivial to
edit in any text editor.  I used Tcl and the yaml module from tcllib
to process the data.

Recently i found the No YAML web site and i decided it was time
for a change.

No YAML
<https://noyaml.com/>

I briefly considered CSV, TSV, and JSON.  They are all mature,
standardized formats, but in my opinion they fall short when it
comes to editing in a vertical format in a plain text editor.

I looked at GNU Recutils, since several folks wrote about it in their
phlogs.  Like my YAML files, recutils presents the data in a vertical
format with one line per field, plus it can do multi-line blocks via
line continuations.  The format is fine for my purposes.

One of my requirements is that i want this to work on FreeDOS too. 
Recutils requires filesystem support for ACL, which is too fancy for
DOS.  The format is simple enough, but the source code is
surprisingly complex.  It does a fraction of what sqlite3 does, and
in a less portable, less robust way.

I tried making my own format based on ASCII control codes.  I could
use Control-^ (the RS character) as the record separator, Control-_
(the US character) as the unit separator AKA the field separator, and
Control-X (the CAN or Cancel character) to discard all text since the
beginning of the field.  This format works in ed(1) and the calvin vi
clone on DOS.  However, the control characters are a little ridiculous
to look at and type in.  I did not want to foist such an eyesore on
my future self.
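
For illustration, here is a minimal sketch of how a file in this
control-code format could be read.  It is written in Python and is
not taken from any tool mentioned in this post; the name
parse_records is made up.  It assumes that newlines at the end of
the file are not data, that empty records can be skipped, and that
Control-X cancels only text within the same field.

```python
RS = "\x1e"   # Control-^, record separator
US = "\x1f"   # Control-_, unit (field) separator
CAN = "\x18"  # Control-X, cancel


def parse_records(text):
    """Split text into records, each a list of field values."""
    records = []
    # Assumption: newlines at the very end of the file are not data.
    for raw_record in text.rstrip("\n").split(RS):
        if raw_record == "":
            continue  # assumption: skip empty records, e.g. after the last RS
        fields = []
        for raw_field in raw_record.split(US):
            # Keep only what follows the last CAN in this field.
            fields.append(raw_field.rsplit(CAN, 1)[-1])
        records.append(fields)
    return records
```

The sketch avoids Python's str.strip() and str.splitlines() without
arguments, because both treat the RS and US characters as whitespace
or line boundaries.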

csvtofsv also uses ASCII control codes as delimiters
<gopher://tccr.it/1/r/csvtofsv>

I tried another format based on something i found online.  Like my
YAML files, this format has one file per record and one line per
field, presented in a vertical format.  The record separator is an
empty line.  The field separator is the EOL (end of the line).  Each
field has a name, a colon character, a space, and optionally a value.
Any line that begins with whitespace is a line continuation from the
previous field.
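
The rules above can be sketched as a small reader.  This is a
Python illustration, not the AWK code used for the actual database,
and the name read_records is made up.  It assumes that a
continuation is joined to the previous value with a single space,
that a line containing only whitespace counts as an empty line, and
that a continuation with no field before it is ignored.

```python
def read_records(text):
    """Return a list of records; each record is a list of (name, value)."""
    records = []
    fields = []
    for line in text.split("\n"):
        if line.strip() == "":
            # An empty line ends the current record.
            if fields:
                records.append(fields)
                fields = []
        elif line[0] in " \t":
            # Continuation: extend the value of the previous field.
            # Assumption: the pieces are joined with one space.
            if fields:
                name, value = fields[-1]
                fields[-1] = (name, (value + " " + line.strip()).strip())
        else:
            # "name: value", where the value may be empty.
            name, _, value = line.partition(":")
            fields.append((name, value.strip()))
    if fields:
        records.append(fields)  # last record without a trailing empty line
    return records
```

A record is kept as a list of pairs rather than a dictionary so that
field order and repeated field names survive.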

This format is trivial to process in AWK.  No special parser
required.  I converted my private database to this format.  Exporting
the whole AWK database to CSV took 0.3 seconds, compared to 3 seconds
in the Tcl & YAML version.

I added one feature: inline blocks of multi-line text.  The block
format is the same as the line continuation format, except the initial
value is a backslash character.

For example, here is a line continuation.

    fieldname: First sentence.
      Second sentence.
      Third sentence.

When this value is read, each EOL and the indentation after it are
replaced by a single space.  That's why this can also be represented
without continuation.

    fieldname: First sentence. Second sentence. Third sentence.

Here is an inline block of multi-line text.

    fieldname: \
      Line 1 of 3.
      Line 2 of 3.
      Line 3 of 3.

When this value is read, the indentation is removed, but the
EOL is preserved.  The value contains multiple lines.
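
The difference between the two cases can be sketched as one small
function.  This is a Python illustration rather than the AWK from
the scripts linked below, and the name read_field is made up.  It
takes the physical lines of a single field.  It assumes that all
leading and trailing whitespace on each line is dropped, and that
folded lines are joined with a single space.

```python
def read_field(lines):
    """Decode one field from its first line plus its indented lines."""
    name, _, first = lines[0].partition(":")
    first = first.strip()
    # Assumption: indentation means all leading whitespace on a line.
    rest = [line.strip() for line in lines[1:]]
    if first == "\\":
        # Inline block: drop the backslash marker, keep the line breaks.
        return name, "\n".join(rest)
    # Line continuation: fold everything into a single line.
    return name, " ".join([first] + rest).strip()
```

Given the two examples above, the first call would return one long
line and the second a three-line value.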

Time for me to shut up and show them the code.  Below are two
small AWK scripts to convert from gopher lawn format to TSV
and back.

lawn2tsv.awk
<gopher://tilde.pink/0/~bencollver/log/
2024-11-02-no-yaml-no-recutils/lawn2tsv.awk>

tsv2lawn.awk
<gopher://tilde.pink/0/~bencollver/log/
2024-11-02-no-yaml-no-recutils/tsv2lawn.awk>

p.s.

In theory, if i wanted to migrate the data, i could use uncsv to
convert between TSV and CSV.  GNU recutils can import and export
CSV.

I was told that the gopher lawn format resembles the Header Fields
format in email standards.  See section 2.2 of RFC 5322.

<gopher://gopher.32kb.net/0/rfc/rfc5322.txt>

The VCARD format is also similar.  See section 6.10 of RFC 6350 for
Extended Properties and Parameters.  I could have abused this format
but i think it is too complex for my purposes.

<gopher://gopher.32kb.net/0/rfc/rfc6350.txt>

tags: bencollver,technical,unix

Tags
====

bencollver
<gopher://tilde.pink/1/~bencollver/log/tag/bencollver/>
technical
<gopher://tilde.pink/1/~bencollver/log/tag/technical/>
unix
<gopher://tilde.pink/1/~bencollver/log/tag/unix/>