This version: https://record-jar.github.io/record-jar/
Participate: https://github.com/record-jar/record-jar
Issue tracking: GitHub
Editors: Jacob Hummer
Record-jar is a data format for storing a list of groups of key-value string pairs. This document describes the format in detail.
The original description of the "Record-Jar" file format is in The Art of Unix Programming by Eric Steven Raymond.
Record-Jar Format
Cookie-jar record separators combine well with the RFC 822 metaformat for records, yielding a format we'll call "record-jar". If you need a textual format that will support multiple records with a variable repertoire of explicit fieldnames, one of the least surprising and human-friendliest ways to do it would look like Example 5.4.
Example 5.4. Basic data for three planets in a record-jar format.
Planet: Mercury
Orbital-Radius: 57,910,000 km
Diameter: 4,880 km
Mass: 3.30e23 kg
%%
Planet: Venus
Orbital-Radius: 108,200,000 km
Diameter: 12,103.6 km
Mass: 4.869e24 kg
%%
Planet: Earth
Orbital-Radius: 149,600,000 km
Diameter: 12,756.3 km
Mass: 5.972e24 kg
Moons: Luna
Of course, the record delimiter could be a blank line, but a line consisting of "%%\n" is more explicit and less likely to be introduced by accident during editing (two printable characters are better than one because it can't be generated by a single-character typo). In a format like this it is good practice to simply ignore blank lines.
If your records have an unstructured text part, your record-jar format is closely approaching a mailbox format. In this case, it's important that you have a well-defined way to escape the record delimiter so it can appear in text; otherwise, your record reader is going to choke on an ill-formed text part someday. Some technique analogous to byte-stuffing (described later in this chapter) is indicated.
Record-jar format is appropriate for sets of field-attribute associations that are like DSV files, but have a variable repertoire of fields, and possibly unstructured text associated with them.
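The escaping concern raised in the passage above can be illustrated with a small sketch. The passage does not say how the delimiter should be escaped, so the convention here is purely hypothetical: any text line that starts with "%%" or with the escape character gets one extra leading backslash, and the reader strips one leading backslash again. The names `ESCAPE`, `stuff`, and `unstuff` are made up for this illustration and are not part of any record-jar specification.

```python
ESCAPE = "\\"  # hypothetical escape character; not defined by the format


def stuff(text):
    """Protect free text so that no line can be mistaken for a '%%' delimiter."""
    out = []
    for line in text.split("\n"):
        # Lines that already start with the escape character are escaped too,
        # which keeps the transformation reversible.
        if line.startswith("%%") or line.startswith(ESCAPE):
            line = ESCAPE + line
        out.append(line)
    return "\n".join(out)


def unstuff(text):
    """Reverse stuff(): drop one leading escape character from each line."""
    out = []
    for line in text.split("\n"):
        if line.startswith(ESCAPE):
            line = line[1:]
        out.append(line)
    return "\n".join(out)
```

After `stuff`, no line of the text is exactly "%%", so a reader that splits records on delimiter lines cannot be confused by the text part, and `unstuff(stuff(text))` returns the original text.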
Raymond's description of the format is not precise enough for every reader to interpret it the same way, and the ambiguities lead to implementations that are inconsistent with each other. For example, is \n the only valid line terminator, or should \r\n be treated the same as \n? What about extra whitespace after the colon in Field Name: Field Value pairs: is it part of the value, or is it ignorable?
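To make the two ambiguities concrete, the following sketch reads the same input under two different interpretations. Neither interpretation is claimed to be the correct one; the helper `split_field` and its `trim_value` flag are hypothetical names used only for this comparison.

```python
def split_field(line, trim_value):
    """Split 'Name: value' on the first colon.

    trim_value=True treats whitespace around the value as ignorable;
    trim_value=False keeps everything after the colon verbatim.
    """
    name, _, value = line.partition(":")
    return name, (value.strip() if trim_value else value)


raw = "Planet:  Mercury\r\nMass: 3.30e23 kg\r\n"

# Reading 1: only "\n" ends a line, and the value is kept verbatim.
strict = [
    split_field(line, trim_value=False)
    for line in raw.split("\n")
    if line
]

# Reading 2: "\r\n" is treated like "\n", and surrounding whitespace is ignorable.
lenient = [
    split_field(line, trim_value=True)
    for line in raw.replace("\r\n", "\n").split("\n")
    if line
]
```

Under the first reading the value of Planet is `"  Mercury\r"`, with the leading spaces and the carriage return included; under the second it is `"Mercury"`. Two implementations that disagree on these points return different data for the same file.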
There has been an earlier attempt to standardize the record-jar format. Addison Phillips wrote the Internet-Draft "The record-jar Format" (draft-phillips-record-jar-02), which codifies the grammar, parsing behaviour, conventions, and more of the record-jar format. It remained a draft and never gained enough traction to become an official standard.
This document attempts to revive the IETF draft by Addison Phillips, accompanied by a suite of SDKs for multiple programming languages that handle the basic parsing and serialization of record-jar documents.
The canonical example from The Art of Unix Programming
Planet: Mercury
Orbital-Radius: 57,910,000 km
Diameter: 4,880 km
Mass: 3.30e23 kg
%%
Planet: Venus
Orbital-Radius: 108,200,000 km
Diameter: 12,103.6 km
Mass: 4.869e24 kg
%%
Planet: Earth
Orbital-Radius: 149,600,000 km
Diameter: 12,756.3 km
Mass: 5.972e24 kg
Moons: Luna
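A minimal reader for an example like the one above might look as follows. This is a sketch under stated assumptions, not a reference implementation: it treats only "\n" as a line terminator, strips whitespace around names and values, skips blank lines, drops empty records, and lets a repeated field name overwrite the earlier value. The function name `parse_record_jar` is made up for this illustration.

```python
def parse_record_jar(text):
    """Parse record-jar text into a list of dicts, one dict per record."""
    records = []
    current = {}
    for line in text.split("\n"):
        if line.strip() == "":
            continue  # blank lines are ignored
        if line == "%%":
            # A separator line closes the current record if it has any fields.
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Name: value', got {line!r}")
        # Assumption: whitespace around the name and value is not significant.
        current[name.strip()] = value.strip()
    if current:
        records.append(current)  # the last record needs no trailing separator
    return records
```

Applied to the planet data, this returns three dictionaries, and only the last one has a `Moons` key. Because the split happens on the first colon only, values that contain further colons are kept intact.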
Excerpt from the Language Subtag Registry
%%
Type: language
Subtag: ia
Description: Interlingua (International Auxiliary Language \
Association)
Added: 2005-08-16
%%
Type: language
Subtag: id
Description: Indonesian
Added: 2005-08-16
Suppress-Script: Latn
%%
Type: language
Subtag: nb
Description: Norwegian Bokmål
Added: 2005-08-16
Suppress-Script: Latn
%%
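The first record in the excerpt above continues its Description value onto a second line by ending the first line with a backslash. A sketch of how a reader might join such lines before splitting fields is shown below. It assumes that a trailing backslash marks a continuation and that leading whitespace on the continuation line is dropped; both assumptions should be checked against the grammar an implementation actually targets. The function name `unfold_lines` is hypothetical.

```python
def unfold_lines(lines):
    """Join each line that ends in a backslash with the line that follows it."""
    logical = []
    pending = None  # text carried over from a line that ended in a backslash
    for line in lines:
        if pending is not None:
            # Assumption: leading whitespace on a continuation line is dropped.
            line = pending + line.lstrip()
            pending = None
        if line.endswith("\\"):
            pending = line[:-1]  # remove the backslash and keep collecting
        else:
            logical.append(line)
    if pending is not None:
        logical.append(pending)  # dangling backslash at end of input
    return logical
```

With this preprocessing step the two physical Description lines become one logical line, which can then be split into a field name and value like any other.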
TODO