NOTE: This is a work in progress. Until it's finalized, this is NOT the
official specification.
# Project Gemini
## Speculative specification
V0.15.0, March 5th, 2021
# 5 The text/gemini media type
## 5.1 Overview
In the same sense that HTML is the "native" response format of HTTP and
plain text is the native response format of gopher, Gemini defines its own
native response format - though of course, thanks to the inclusion of a MIME
type in the response header Gemini can be used to serve plain text, rich
text, HTML, Markdown, LaTeX, etc.
Response bodies of type "text/gemini" are a kind of lightweight hypertext
format, which takes inspiration from gophermaps and from Markdown. The
format permits richer typographic possibilities than the plain text of
Gopher, but remains extremely easy to parse. The format is line-oriented,
and a satisfactory rendering can be achieved with a single pass of a
document, processing each line independently. As per gopher, links can only
be displayed one per line, encouraging neat, list-like structure.
Similar to how the two-digit Gemini status codes were designed so that
simple clients can function correctly while ignoring the second digit, the
text/gemini format has been designed so that simple clients can ignore the
more advanced features and still remain very usable.
## 5.2 Parameters
As a subtype of the top-level media type "text", "text/gemini" inherits the
"charset" parameter defined in RFC 2046. However, as noted in 3.3, the
default value of "charset" is "UTF-8" for "text" content transferred via
Gemini.
A single additional parameter specific to the "text/gemini" subtype is
defined: the "lang" parameter. The value of "lang" denotes the natural
language or language(s) in which the textual content of a "text/gemini"
document is written. The presence of the "lang" parameter is optional.
When the "lang" parameter is present, its interpretation is defined entirely
by the client. For example, clients which use text-to-speech technology to
make Gemini content accessible to visually impaired users may use the value
of "lang" to improve pronunciation of content. When the "lang" parameter is
not present, no default value should be assumed and clients which require
some notion of a language in order to process the content (such as
text-to-speech screen readers) should rely on user-input to determine how to
proceed in the absence of a "lang" parameter.
Valid values for the "lang" parameter are comma-separated lists of one or
more language tags as defined in RFC4646. For example:
* "text/gemini; lang=en" Denotes a text/gemini document written in English
* "text/gemini; lang=fr" Denotes a text/gemini document written in French
* "text/gemini; lang=en,fr" Denotes a text/gemini document written in a
mixture of English and French
* "text/gemini; lang=de-CH" Denotes a text/gemini document written in Swiss
German
* "text/gemini; lang=sr-Cyrl" Denotes a text/gemini document written in
Serbian using the Cyrllic script
* "text/gemini; lang=zh-Hans-CN" Denotes a text/gemini document written in
Chinese using the Simplified script as used in mainland China
## 5.3 Line-orientation
As mentioned, the text/gemini format is line-oriented. Each line of a
text/gemini document has a single "line type". It is possible to
unambiguously determine a line's type purely by inspecting its first three
characters. A line's type determines the manner in which it should be
presented to the user. Any details of presentation or rendering associated
with a particular line type are strictly limited in scope to that individual
line.
There are 7 different line types in total. However, a fully functional and
specification compliant Gemini client need only recognise and handle 4 of
them - these are the "core line types", (see 5.4). Advanced clients can
also handle the additional "advanced line types" (see 5.5). Simple clients
can treat all advanced line types as equivalent to one of the core line
types and still offer an adequate user experience.
## 5.4 Core line types
The four core line types are:
### 5.4.1 Text lines
Text lines are the most fundamental line type - any line which does not
match the definition of another line type defined below defaults to being a
text line. The majority of lines in a typical text/gemini document will be
text lines.
Text lines should be presented to the user, after being wrapped to the
appropriate width for the client's viewport (see below). Text lines may be
presented to the user in a visually pleasing manner for general reading, the
precise meaning of which is at the client's discretion. For example,
variable width fonts may be used, spacing may be normalised, with spaces
between sentences being made wider than spacing between words, and other
such typographical niceties may be applied. Clients may permit users to
customise the appearance of text lines by altering the font, font size, text
and background colour, etc. Authors should not expect to exercise any
control over the precise rendering of their text lines, only of their actual
textual content. Content such as ASCII art, computer source code, etc.
which may appear incorrectly when treated as such should be enclosed between
preformatting toggle lines (see 5.4.3).
Blank lines are instances of text lines and have no special meaning. They
should be rendered individually as vertical blank space each time they
occur. In this way they are analogous to
tags in HTML. Consecutive
blank lines should NOT be collapsed into a fewer blank lines. Note also
that consecutive non-blank text lines do not form any kind of coherent unit
or block such as a "paragraph": all text lines are independent entities.
Text lines which are longer than can fit on a client's display device SHOULD
be "wrapped" to fit, i.e. long lines should be split (ideally at whitespace
or at hyphens) into multiple consecutive lines of a device-appropriate
width. This wrapping is applied to each line of text independently.
Multiple consecutive lines which are shorter than the client's display
device MUST NOT be combined into fewer, longer lines.
In order to take full advantage of this method of text formatting, authors
of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width,
in contrast to the convention in Gopherspace where text is typically wrapped
at 80 characters or fewer. Instead, text which should be displayed as a
contiguous block should be written as a single long line. Most text editors
can be configured to "soft-wrap", i.e. to write this kind of file while
displaying the long lines wrapped at word boundaries to fit the author's
display device.
Authors who insist on hard-wrapping their content MUST be aware that the
content will display neatly on clients whose display device is as wide as
the hard-wrapped length or wider, but will appear with irregular line widths
on narrower clients.
### 5.4.2 Link lines
Lines beginning with the two characters "=>" are link lines, which have the
following syntax:
```
=>[][]
```
where:
* is any non-zero number of consecutive spaces or tabs
* Square brackets indicate that the enclosed content is optional.
* is a URL, which may be absolute or relative.
All the following examples are valid link lines:
```
=> gemini://example.org/
=> gemini://example.org/ An example link
=> gemini://example.org/foo Another example link at the same host
=> foo/bar/baz.txt A relative link
=> gopher://example.org:70/1 A gopher link
```
URLs in link lines must have reserved characters and spaces percent-encoded
as per RFC 3986.
Note that link URLs may have schemes other than gemini. This means that
Gemini documents can simply and elegantly link to documents hosted via other
protocols, unlike gophermaps which can only link to non-gopher content via a
non-standard adaptation of the `h` item-type.
Clients can present links to users in whatever fashion the client author
wishes, however clients MUST NOT automatically make any network connections
as part of displaying links whose scheme corresponds to a network protocol
(e.g. links beginning with gemini://, gopher://, https://, ftp:// , etc.).
### 5.4.3 Preformatting toggle lines
Any line whose first three characters are "```" (i.e. three consecutive
back ticks with no leading whitespace) are preformatted toggle lines. These
lines should NOT be included in the rendered output shown to the user.
Instead, these lines toggle the parser between preformatted mode being "on"
or "off". Preformatted mode should be "off" at the beginning of a document.
The current status of preformatted mode is the only internal state a parser
is required to maintain. When preformatted mode is "on", the usual rules
for identifying line types are suspended, and all lines should be identified
as preformatted text lines (see 5.4.4).
Preformatting toggle lines can be thought of as analogous to and
tags in HTML.
Any text following the leading "```" of a preformat toggle line which
toggles preformatted mode on MAY be interpreted by the client as "alt text"
pertaining to the preformatted text lines which follow the toggle line. Use
of alt text is at the client's discretion, and simple clients may ignore it.
Alt text is recommended for ASCII art or similar non-textual content which,
for example, cannot be meaningfully understood when rendered through a
screen reader or usefully indexed by a search engine. Alt text may also be
used for computer source code to identify the programming language which
advanced clients may use for syntax highlighting.
Any text following the leading "```" of a preformat toggle line which
toggles preformatted mode off MUST be ignored by clients.
### 5.4.4 Preformatted text lines
Preformatted text lines should be presented to the user in a "neutral",
monowidth font without any alteration to whitespace or stylistic
enhancements. Graphical clients should use scrolling mechanisms to present
preformatted text lines which are longer than the client viewport, in
preference to wrapping. In displaying preformatted text lines, clients
should keep in mind applications like ASCII art and computer source code: in
particular, source code in languages with significant whitespace (e.g.
Python) should be able to be copied and pasted from the client into a file
and interpreted/compiled without any problems arising from the client's
manner of displaying them.
## 5.5 Advanced line types
The following advanced line types MAY be recognised by advanced clients.
Simple clients may treat them all as text lines as per 5.4.1 without any
loss of essential function.
### 5.5.1 Heading lines
Lines beginning with "#" are heading lines. Heading lines consist of one,
two or three consecutive "#" characters, followed by optional whitespace,
followed by heading text. The number of # characters indicates the "level"
of header; #, ## and ### can be thought of as analogous to , and
in HTML.
Heading text should be presented to the user, and clients MAY use special
formatting, e.g. a larger or bold font, to indicate its status as a header
(simple clients may simply print the line, including its leading #s, without
any styling at all). However, the main motivation for the definition of
heading lines is not stylistic but to provide a machine-readable
representation of the internal structure of the document. Advanced clients
can use this information to, e.g. display an automatically generated and
hierarchically formatted "table of contents" for a long document in a
side-pane, allowing users to easily jump to specific sections without
excessive scrolling. CMS-style tools automatically generating menus or
Atom/RSS feeds for a directory of text/gemini files can use first heading in
the file as a human-friendly title.
### 5.5.2 Unordered list items
Lines beginning with "* " are unordered list items. This line type exists
purely for stylistic reasons. The * may be replaced in advanced clients by
a bullet symbol. Any text after the "* " should be presented to the user as
if it were a text line, i.e. wrapped to fit the viewport and formatted
"nicely". Advanced clients can take the space of the bullet symbol into
account when wrapping long list items to ensure that all lines of text
corresponding to the item are offset an equal distance from the edge of
the screen.
### 5.5.3 Quote lines
Lines beginning with ">" are quote lines. This line type exists so that
advanced clients may use distinct styling to convey to readers the important
semantic information that certain text is being quoted from an external
source. For example, when wrapping long lines to the the viewport, each
resultant line may have a ">" symbol placed at the front.