Record-Jar Specification

This version: https://record-jar.github.io/record-jar/
Participate: https://github.com/record-jar/record-jar
Issue tracking: GitHub
Editors: Jacob Hummer

Abstract

Record-jar is a data format for storing a list of groups of key-value string pairs. This document describes the format in detail.

Introduction

The original description of the "Record-Jar" file format is in The Art of Unix Programming by Eric Steven Raymond.

Original Record-Jar Format section from The Art of Unix Programming

Record-Jar Format

Cookie-jar record separators combine well with the RFC 822 metaformat for records, yielding a format we'll call ‘record-jar’. If you need a textual format that will support multiple records with a variable repertoire of explicit fieldnames, one of the least surprising and human-friendliest ways to do it would look like Example 5.4.

Example 5.4. Basic data for three planets in a record-jar format.

Planet: Mercury
Orbital-Radius: 57,910,000 km
Diameter: 4,880 km
Mass: 3.30e23 kg
%%
Planet: Venus
Orbital-Radius: 108,200,000 km
Diameter: 12,103.6 km
Mass: 4.869e24 kg
%%
Planet: Earth
Orbital-Radius: 149,600,000 km
Diameter: 12,756.3 km
Mass: 5.972e24 kg
Moons: Luna

Of course, the record delimiter could be a blank line, but a line consisting of "%%\n" is more explicit and less likely to be introduced by accident during editing (two printable characters are better than one because it can't be generated by a single-character typo). In a format like this it is good practice to simply ignore blank lines.

If your records have an unstructured text part, your record-jar format is closely approaching a mailbox format. In this case, it's important that you have a well-defined way to escape the record delimiter so it can appear in text; otherwise, your record reader is going to choke on an ill-formed text part someday. Some technique analogous to byte-stuffing (described later in this chapter) is indicated.

Record-jar format is appropriate for sets of field-attribute associations that are like DSV files, but have a variable repertoire of fields, and possibly unstructured text associated with them.

-- The Art of Unix Programming

The description above is not precise enough for every reader to interpret it the same way, and the resulting ambiguities make implementations inconsistent with one another. For example, is \n the only valid line terminator, or should \r\n be treated the same as \n? Is extra whitespace after the colon in a Field Name: Field Value pair part of the value, or is it ignorable?
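
To make these two ambiguities concrete, the following sketch reads the same input under two different interpretations. The function names parse_field_strict and parse_field_lenient are hypothetical and exist only for this illustration; neither reading is endorsed by the original text.

```python
# Two plausible readings of the same input. Both helpers are hypothetical
# illustrations, not behaviour defined by The Art of Unix Programming.

def parse_field_strict(line):
    """Treat everything after the first colon as the value, verbatim."""
    name, _, value = line.partition(":")
    return name, value


def parse_field_lenient(line):
    """Treat whitespace around the name and value as ignorable."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


raw = "Planet: Mercury\r\nMass:   3.30e23 kg\r\n"

# Reading 1: only "\n" terminates a line, so "\r" remains part of the data.
lines_lf_only = raw.split("\n")[:-1]  # drop the empty string after the final "\n"

# Reading 2: "\r\n" and "\n" are equivalent terminators.
# (str.splitlines also recognises a few other line boundaries.)
lines_any = raw.splitlines()

strict = [parse_field_strict(line) for line in lines_lf_only]
lenient = [parse_field_lenient(line) for line in lines_any]

print(strict)
print(lenient)
```

Under the first reading the value of Planet is " Mercury\r", with a leading space and a trailing carriage return; under the second it is "Mercury". Two implementations that differ in this way cannot reliably exchange data.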

There has been a previous attempt to standardize the record-jar format. Addison Phillips wrote the IETF Internet-Draft "The record-jar Format" (draft-phillips-record-jar-02), which codifies the grammar, parsing behaviour, conventions, and more of the record-jar format. It remained a draft and never gained enough traction to become an official standard.

This document attempts to revive the IETF draft by Addison Phillips, accompanied by a suite of SDKs in multiple programming languages for basic parsing and serialization of record-jar documents.

Examples

The canonical example from The Art of Unix Programming

Planet: Mercury
Orbital-Radius: 57,910,000 km
Diameter: 4,880 km
Mass: 3.30e23 kg
%%
Planet: Venus
Orbital-Radius: 108,200,000 km
Diameter: 12,103.6 km
Mass: 4.869e24 kg
%%
Planet: Earth
Orbital-Radius: 149,600,000 km
Diameter: 12,756.3 km
Mass: 5.972e24 kg
Moons: Luna
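
A minimal sketch of how the example above might be read, assuming that a line containing only %% separates records, that blank lines are ignored (as the quoted passage recommends), and that whitespace around names and values is not significant. The function name parse_record_jar is hypothetical, and these choices are assumptions rather than rules defined by this document.

```python
def parse_record_jar(text):
    """Parse text into a list of records.

    Each record is a list of (name, value) pairs, which preserves field
    order and allows a field name to appear more than once.
    """
    records = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "":
            continue  # assumption: blank lines are ignored
        if stripped == "%%":
            if current:  # assumption: empty records are dropped
                records.append(current)
                current = []
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a field line: {line!r}")
        current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


PLANETS = """\
Planet: Mercury
Orbital-Radius: 57,910,000 km
Diameter: 4,880 km
Mass: 3.30e23 kg
%%
Planet: Venus
Orbital-Radius: 108,200,000 km
Diameter: 12,103.6 km
Mass: 4.869e24 kg
%%
Planet: Earth
Orbital-Radius: 149,600,000 km
Diameter: 12,756.3 km
Mass: 5.972e24 kg
Moons: Luna
"""

planets = parse_record_jar(PLANETS)
for record in planets:
    print(dict(record)["Planet"], len(record))
```

Records are kept as lists of pairs rather than dictionaries so that the sketch does not have to decide what a repeated field name means.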

Excerpt from the Language Subtag Registry

%%
Type: language
Subtag: ia
Description: Interlingua (International Auxiliary Language \
 Association)
Added: 2005-08-16
%%
Type: language
Subtag: id
Description: Indonesian
Added: 2005-08-16
Suppress-Script: Latn
%%
Type: language
Subtag: nb
Description: Norwegian Bokmål
Added: 2005-08-16
Suppress-Script: Latn
%%
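
The Description field of the first record in the excerpt above spans two physical lines, ending the first with a backslash. The sketch below shows one way such lines might be joined before fields are parsed. It assumes that the trailing backslash is removed and that leading whitespace on the continuation line is discarded; the function name unfold and these exact rules are assumptions made for illustration, since this document does not yet define the grammar.

```python
def unfold(text):
    """Join physical lines that end in a backslash with the following line.

    Assumptions (not defined by this document): the trailing backslash is
    removed, and leading whitespace on the continuation line is discarded.
    """
    logical = []
    pending = None
    for line in text.splitlines():
        if pending is not None:
            line = pending + line.lstrip()
            pending = None
        if line.endswith("\\"):
            pending = line[:-1]  # keep everything before the backslash
        else:
            logical.append(line)
    if pending is not None:
        logical.append(pending)  # dangling backslash at end of input
    return logical


# "\\" inside this string literal is a single backslash in the data.
EXCERPT = """\
%%
Type: language
Subtag: ia
Description: Interlingua (International Auxiliary Language \\
 Association)
Added: 2005-08-16
%%
"""

logical = unfold(EXCERPT)
for line in logical:
    print(line)
```

With these assumptions the space before the backslash is preserved, so the two physical lines become a single Description line whose words are separated by exactly one space.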

Format and Grammar

TODO