Living in Plain Text#

What is Plain Text?#

Plain text is data that represents only readable characters, without any graphical representation, styling or other objects such as images. It includes some whitespace characters that affect the simple arrangement of text, such as spaces, line breaks and tab characters.

Plain text files differ significantly from the well-known “Word” files, in which style information is embedded as binary objects that are not human-readable. Plain text files contain only text, not its graphical representation or other objects (formatting such as different fonts, font sizes, bold or italics, images, etc.). In principle, they are similar to texts written on a typewriter.

Plain text is therefore the most basic way of representing textual information in a digital system: the “lowest common denominator”, if you will. Since the introduction of the Unicode UTF-8 encoding, it can be processed seamlessly by any modern UTF-8-enabled system (which by now means almost any system).
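To make the role of the encoding concrete, here is a minimal Python sketch. The sample string is an assumption chosen for illustration; it only shows that plain text is a sequence of characters that UTF-8 turns into bytes and back without loss.

```python
# A plain text string containing ASCII and non-ASCII characters
# (the sample text is made up for this illustration).
text = "Plain text: naïve café ✓"

# Encoding turns the characters into bytes that can be stored in a file.
data = text.encode("utf-8")

# Decoding the same bytes with the same encoding restores the characters.
restored = data.decode("utf-8")

# Non-ASCII characters take more than one byte in UTF-8,
# so the byte count is larger than the character count here.
print(len(text), len(data))
print(restored == text)
```

Any UTF-8-enabled system that reads these bytes recovers the same characters, which is what makes plain text so portable.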

Note

Plain text is a (technically) simple representation of textual and other forms of content.

Plain Text becomes rich: Adding Semantics#

Text doesn’t exist in a vacuum. While plain text is just a way to store characters, what those characters actually mean needs some kind of definition. Depending on your requirements, you might want to add some context to your text files. If you are a programmer, you may want to write a computer program. If you are a writer, you want to define headings, emphasize words and so on. If you want to manage data, you want this data to be formalized so that another human being or a computer can recognize its actual meaning.

While plain text only contains a certain set of human-readable characters, extra meaning can be added to the text while still maintaining the principal properties and characteristics of plain text. This can be done with a programming language or some kind of defined markup language. A markup language specifies the structure and formatting of a document, can define relationships between its parts and adds meaning to its characters (semantics). It is often used to enrich the content or to facilitate automated processing, including the way the content is displayed.

For text and text-related documents, a markup language is a set of rules defining how the meaning of content should be transcribed, that is, the relationship between semantics and syntax. The term “markup” evolved from the “marking up” of paper manuscripts with red ink to give formatting instructions for creating production-ready materials. When typewriters were widely used, a similar technique was widely adopted: to emphasize a word, you simply added an asterisk (*) at its beginning and end (like This is *emphasized*!). A markup language takes this to the next level by defining strictly and unambiguously how markup has to be done, thus enabling automated processing. The more extensive the definitions, the more features, such as hyperlinks or links to image files, can be added to an otherwise plain text.

Markup

A markup language defines how to add semantics and features to an otherwise unformatted plain text by simply using ordinary characters.

One of the most popular markup languages is Markdown which exists in a variety of dialects.
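The asterisk convention described above can be turned into a strict rule that a program applies. The following Python sketch is only an illustration: the function name `render_emphasis` and the single rule it implements are assumptions made here, not part of Markdown or any other real markup language.

```python
import re

def render_emphasis(text: str) -> str:
    """Replace *word* with <em>word</em>.

    Hypothetical one-rule markup: any run of characters between two
    asterisks on the same line is treated as emphasized.
    """
    return re.sub(r"\*([^*\n]+)\*", r"<em>\1</em>", text)

print(render_emphasis("This is *emphasized*!"))
```

A real markup language defines many such rules and also states what happens in ambiguous cases, for example when an asterisk is meant literally.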

Plain Text and Textual Content: Markup#

Plain text allows the focus to be placed on the actual text content. As mentioned before, you can add semantics (emphasis, headings) and features (links to images, hyperlinks, …) to a plain text by using a markup language.

In order to put the text into a presentation-ready form with an appropriate layout, you need

  • an interpreter that translates the plain text to the desired format and

  • style definitions that tell the interpreter what an item (like a heading or emphasized text) should actually look like (which font, size, shape, color, …).

The advantage of the separation of content, meaning and layout is obvious:

  • The layout can be changed centrally.

  • Style definitions may vary depending on the desired output type or format. For example, one font may be suitable for printing, but a different one for a website.

  • The interpreter can support different output formats, e.g. PDF for printing or HTML for use as a web page. The same text can therefore be used for different output formats without change.

digraph {

"Plain Text File"
        [
                shape = note,
                color=crimson,
                style=filled,
                fillcolor=white,
                fontcolor=crimson,
                fontname="Latin Modern Mono",
                label="Plain Text File",
        ];
"Style"
        [
                shape = component,
                color=fuchsia,
                style=filled,
                fillcolor=white,
                fontcolor=fuchsia,
                fontname="Lexend",
                label="Style Definition",
        ];
"Compiler"
        [
                shape = doublecircle,
                color=grey,
                style=filled,
                fillcolor=black,
                fontcolor=white,
                fontname="Lexend",
                label="Interpreter",
        ];



"File.pdf"
        [
                shape = note,
                color=black,
                style=filled,
                fillcolor=white,
                fontcolor=black,
                fontname="Lexend",
                label="PDF File",
        ];

"File.html"
        [
                shape = note,
                color=blue,
                style=filled,
                fillcolor=white,
                fontcolor=blue,
                fontname="Lexend",
                label="Webpage",
        ];
"File.epub"
        [
                shape = note,
                color=darkgreen,
                style=filled,
                fillcolor=white,
                fontcolor=darkgreen,
                fontname="Lexend",
                label="E-Book",
        ];

rankdir = TB;

subgraph {
        "Plain Text File" -> "Compiler" -> "File.pdf";
        "Style" -> "Compiler" [color=fuchsia];
        "Compiler" -> "File.html";
        "Compiler" -> "File.epub";

        {rank = same; "Compiler" ; "Style";}
        }
}

Fig. 83 An interpreter translates the plain text file into different output files and formats#
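The separation of content, style definitions and interpreter can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the toy markup (a line starting with "# " is a heading), the names `parse` and `render`, and the two style tables. Real tools work on the same principle but are far more elaborate.

```python
# The plain text source; a leading "# " marks a heading (toy markup).
SOURCE = "# Living in Plain Text\nPlain text separates content from layout."

# Hypothetical style definitions, one table per output format.
HTML_STYLE = {"heading": "<h1>{}</h1>", "paragraph": "<p>{}</p>"}
TEXT_STYLE = {"heading": "{}\n====", "paragraph": "{}"}

def parse(source):
    """Turn the plain text into (role, text) pairs, i.e. content plus meaning."""
    items = []
    for line in source.splitlines():
        if line.startswith("# "):
            items.append(("heading", line[2:]))
        elif line.strip():
            items.append(("paragraph", line))
    return items

def render(items, style):
    """Apply a style table to the parsed content to produce one output format."""
    return "\n".join(style[role].format(text) for role, text in items)

items = parse(SOURCE)
html_output = render(items, HTML_STYLE)
text_output = render(items, TEXT_STYLE)
print(html_output)
print(text_output)
```

The source text is never modified: changing the layout or adding an output format only means changing or adding a style table.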

Plain Text and Data: XML et al.#

The representation of textual content is by far not the only domain of plain text. Special markup languages designed for data storage enable us to store data in a human-readable format. One of the most common formats is XML, the Extensible Markup Language. The main purpose of XML is to store arbitrary data in a standardized and structured way so that it can be easily processed.
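As a small illustration of how such data can be processed, the following Python sketch reads an XML snippet with the standard library module `xml.etree.ElementTree`. The element and attribute names (`library`, `book`, `title`, `year`) and the values are invented for this example.

```python
import xml.etree.ElementTree as ET

# A made-up XML document; it is plain text and readable by humans.
XML_DATA = """<library>
  <book year="1999"><title>First Example</title></book>
  <book year="2005"><title>Second Example</title></book>
</library>"""

# Parse the text into a tree of elements.
root = ET.fromstring(XML_DATA)

# Collect the title and year of every <book> element.
books = [(book.findtext("title"), int(book.get("year")))
         for book in root.findall("book")]
print(books)
```

Because the structure is standardized, the program does not need to know anything about how the text was laid out, only the names of the elements it is looking for.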

Another well-known format is CSV, which stands for Comma-Separated Values. With CSV you can export the values of a spreadsheet, with the columns separated by commas (or another delimiter) and the rows by line breaks. It’s simple, quick and supported by almost every software designed to process tabular data.
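A minimal Python sketch using the standard library module `csv` shows how such a file can be read. The column names and values are invented for this example, and the data is held in a string instead of a file to keep the sketch self-contained.

```python
import csv
import io

# Made-up CSV content: the first line holds the column names.
CSV_DATA = "name,quantity\napples,3\npears,5\n"

# DictReader maps every row to the column names from the first line.
rows = list(csv.DictReader(io.StringIO(CSV_DATA)))

# All values are read as text and have to be converted when needed.
total = sum(int(row["quantity"]) for row in rows)
print(rows[0]["name"], total)

# The same data with a semicolon as delimiter.
alt_rows = list(csv.DictReader(io.StringIO(CSV_DATA.replace(",", ";")),
                               delimiter=";"))
```

Note that CSV itself carries no type information, so the meaning of each column has to be agreed on outside of the file.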