TAO

universal syntax

TAO is the first truly universal syntax for structured communication at any scale, born of a vision of total intercommunication of all software systems.

Its unparalleled simplicity makes it the perfect foundation for future mutually-compatible data, markup, and source code notations – next steps after JSON, XML, and S-expressions.

Data TAO

Data TAO is a notation for representing data with TAO. Its development is the current focus.

The same piece of data¹ is encoded below in XML, JSON, and Data TAO.

<person 
  first-name="John" 
  last-name="Smith"
  is-alive="true"
  age="27"
>
  <address 
    street-address="21 2nd Street" 
    city="New York" 
    state="NY" 
    postal-code="10021-3100" 
  />
  <phone-numbers>
    <phone-number
      type="home"
      number="212 555-1234"
    />
    <phone-number
      type="office"
      number="646 555-4567"
    />
  </phone-numbers>
  <children />
  <spouse xsi:nil="true" />
</person>
{ 
  "first name": "John",
  "last name": "Smith",
  "is alive": true,
  "age": 27,
  "address": {
    "street address": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postal code": "10021-3100"
  },
  "phone numbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null 
}
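Arriving at the Data TAO version below from the JSON version requires deleting 26 % of the JSON characters (107 of 412, on average 4.5 per line) and changing 8 % (34 of 412, on average 1.4 per line).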
first name [John]
last name [Smith]
is alive [true]
age [27]
address [
  street address [21 2nd Street]
  city [New York]
  state [NY]
  postal code [10021-3100]
]
phone numbers [
  [
    type [home]
    number [212 555-1234]
  ]
  [
    type [office]
    number [646 555-4567]
  ]
]
children []
spouse []

JSON has mostly superseded XML in the domain of data representation, largely thanks to its relative simplicity and terseness.

Data TAO is very close to JSON, but even simpler and more compact. Rather than adding more elements, it takes away all of JSON’s syntactic noise while keeping its essential expressive power. This produces many general advantages, such as:

More specific advantages include:

The interactive Data TAO highlighter can be used to examine more examples and to try out your own.
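The JSON-to-Data TAO correspondence in the example can be sketched as a small serializer from JSON-like Python values. This is a hypothetical illustration inferred from the example alone, not the API of the Data TAO libraries: the function names are made up, and it assumes that scalars are written as plain text, that null, [] and {} all become an empty tree [] (as spouse and children do above), and that a literal [, ] or ` inside text is escaped with a backtick, using TAO's operator rule (a backtick followed by any character).

```python
from typing import Any, List

META = "[]`"  # the three special symbols of TAO


def escape(text: str) -> str:
    # Assumption: a backtick operator escapes a literal meta character.
    return "".join("`" + ch if ch in META else ch for ch in text)


def scalar(value: Any) -> str:
    # JSON true/false and numbers are written as plain text, like strings.
    if value is True:
        return "true"
    if value is False:
        return "false"
    return escape(str(value))


def is_empty(value: Any) -> bool:
    # Assumption from the example: null, [] and {} all become an empty tree.
    return value is None or (isinstance(value, (list, dict)) and not value)


def entry(prefix: str, item: Any, depth: int) -> List[str]:
    """Render one value as 'prefix[...]', spanning several lines if nested."""
    if is_empty(item):
        return [prefix + "[]"]
    if isinstance(item, (dict, list)):
        return [prefix + "[", *contents(item, depth + 1), "  " * depth + "]"]
    return [prefix + "[" + scalar(item) + "]"]


def contents(value: Any, depth: int) -> List[str]:
    """Render the members of an object as 'key [value]', of an array as '[value]'."""
    pad = "  " * depth
    lines: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            lines.extend(entry(pad + escape(key) + " ", item, depth))
    else:
        for item in value:
            lines.extend(entry(pad, item, depth))
    return lines


def to_data_tao(document: dict) -> str:
    # The top-level object is written without enclosing brackets.
    return "\n".join(contents(document, 0))


if __name__ == "__main__":
    print(to_data_tao({"age": 27, "children": [], "spouse": None}))
```

Applied to the JSON example above (written as a Python dict), to_data_tao reproduces the Data TAO version line for line. Mapping null and [] to the same [] drops a distinction that JSON makes; the example above behaves the same way.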


Notations on top of TAO

Data TAO is a domain-specific notation on top of TAO.

Such notations are created by introducing restrictions to the grammar: every document in such a notation should be valid TAO, but not vice versa. For example, only a subset of TAO is valid Data TAO.

The grammar of TAO

The basic advantage of TAO lies in the simplicity and generality of its grammar.

The syntax requires only three arbitrary special symbols and in general can be defined for any encoding that can represent at least these.

The canonical definition here is based on Unicode and purposefully selects three concrete characters: the left square bracket [, the right square bracket ], and the backtick `.

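For brevity, the following rules are defined only descriptively: any is any single Unicode character, and any-except-meta is any such character other than the three special symbols.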

The remaining rules follow below, repeated in two formal notations, as a diagram, and descriptively. All these definitions are equivalent.

Augmented Backus-Naur Form (ABNF)

operator   = "`" any
annotation = 1*any-except-meta
tree       = "[" tao "]"
tao        = *(tree / operator / annotation)

Backus-Naur Form (BNF)

<operator>   ::= "`" <any>
<annotation> ::= <any-except-meta> | <annotation> <annotation>
<tree>       ::= "[" <tao> "]"
<tao>        ::= "" | <tree> | <operator> | <annotation> | <tao> <tao>
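The grammar is small enough to transcribe almost rule by rule into a recursive-descent parser. The Python sketch below is only an illustration, not one of the official reference parsers: the names Tree, Operator, Annotation, TaoSyntaxError, parse, and unparse are hypothetical, and because the grammar does not say where one annotation ends and the next begins, it assumes each annotation is a maximal run of characters other than [, ], and `.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Tree:
    """tree = "[" tao "]" """
    tao: List["Part"]


@dataclass
class Operator:
    """operator = "`" any: the backtick plus exactly one following character."""
    symbol: str


@dataclass
class Annotation:
    """annotation = 1*any-except-meta, kept here as one maximal run."""
    text: str


Part = Union[Tree, Operator, Annotation]
META = "[]`"


class TaoSyntaxError(ValueError):
    pass


def parse(text: str) -> List[Part]:
    """Parse a complete TAO document (a top-level tao) into a list of parts."""
    parts, _ = _parse_tao(text, 0, nested=False)
    return parts


def _parse_tao(text: str, pos: int, nested: bool) -> Tuple[List[Part], int]:
    # tao = *(tree / operator / annotation), read until ']' or end of input.
    parts: List[Part] = []
    while pos < len(text):
        ch = text[pos]
        if ch == "[":
            inner, pos = _parse_tao(text, pos + 1, nested=True)
            parts.append(Tree(inner))
        elif ch == "]":
            if not nested:
                raise TaoSyntaxError(f"unmatched ']' at index {pos}")
            return parts, pos + 1  # consume the closing bracket
        elif ch == "`":
            if pos + 1 == len(text):
                raise TaoSyntaxError("backtick at end of input")
            parts.append(Operator(text[pos + 1]))
            pos += 2
        else:
            start = pos
            while pos < len(text) and text[pos] not in META:
                pos += 1
            parts.append(Annotation(text[start:pos]))
    if nested:
        raise TaoSyntaxError("missing ']' at end of input")
    return parts, pos


def unparse(parts: List[Part]) -> str:
    """Inverse of parse: every character of the input is reproduced."""
    out = []
    for part in parts:
        if isinstance(part, Tree):
            out.append("[" + unparse(part.tao) + "]")
        elif isinstance(part, Operator):
            out.append("`" + part.symbol)
        else:
            out.append(part.text)
    return "".join(out)
```

For instance, parse("x `= 0") yields Annotation("x "), Operator("="), and Annotation(" 0"). Since no character is discarded, unparse(parse(s)) == s for any valid input, which makes the round trip a convenient check. Very deep nesting will hit Python's recursion limit; an explicit stack would avoid that.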

Syntax Diagram

Descriptive

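The above formal definitions can be translated as: an operator is a backtick followed by any single character; an annotation is a nonempty run of characters other than the three special symbols; a tree is a tao enclosed in square brackets; and a tao is a sequence of zero or more trees, operators, and annotations.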

The three parts of a tao are captured mnemonically in the TAO acronym, which expands to Tree Annotation Operator. These basic constructs can also be described as:

For a comparison of grammar complexity, see the JSON grammar on the official JSON website.

Future notations

The TAO-based domain-specific notations that are next in line to be introduced after Data TAO are briefly presented below.

Markup TAO

Markup TAO is a notation for representing text markup with TAO.

It is meant to have a simple mapping to HTML. A demonstration:


Code TAO

Code TAO is a notation for representing code with TAO, as an alternative to S-expressions. For example, a hypothetical TAO-based Lisp-like language might use:

def [factorial[x]] [
  if [x `= 0] [1]
  else [x `* factorial[x `- 1]]
]

in place of:

(defun factorial (x)
   (if (zerop x)
       1
       (* x (factorial (- x 1)))))

Vision

Syntax is so ubiquitous in software that it is taken for granted and often overlooked. Optimizing it is, for many reasons, not a very popular topic.

To give a feeling of the ubiquity:

Given this ubiquity, even a small inefficiency or unnecessary complexity in a syntax, together with the processing and cognitive burdens that accompany it, translates into a massive loss of productivity.

TAO is based on the thesis that a simpler and easier syntax will bring incalculable returns across the software world. The greater the adoption, the greater and more compounded the benefits.

It is a logical step towards simplicity in software evolution.

Historical precedents

The modern Web began with HTML, which was designed as a markup language for static websites. As the Web became more dynamic, HTML’s generalized counterpart, XML, was pragmatically adopted as the universal notation for representing the trees of data that were now sent between nodes.

These markup languages were, however, not well suited for generic data representation. Another pragmatic step was taken to alleviate that: JSON was extracted from JavaScript, the dynamic language of the Web, as a simpler and more suitable data notation.

Today HTML, JSON, and JavaScript are becoming low-level languages, often machine-generated rather than written directly.

JavaScript, however, was not designed as a low-level language, so it is an imperfect assembly language for the Web.

WebAssembly is an emerging attempt to fix that. Its human-readable text format adopts a generic syntax: S-expressions.

Interestingly, before CSS became established as the stylesheet language of the Web, there was an S-expression-based alternative to it called DSSSL.

S-expressions were invented before the Web and serve as the basic syntax for code and data in the LISP family, the oldest syntactically preserved family of languages and one still widely used today. Many popular languages, including JavaScript, have roots in this family.

S-expressions are a minimal generic syntax, like TAO. Because of that they can be used to encode different kinds of trees: data, markup, source code.

They were, however, not explicitly designed as a universal syntax and have some characteristics that prevented them from becoming one.

TAO, on the other hand, is designed from the ground up for precisely this purpose.

Future

Simple and human-friendly notations are being built on top of TAO along with tools to enable their use.

These can initially be adopted for configuration and as standard input and output formats of offline tools, much as JSON is used today.

TAO-based counterparts of JSON, HTML, CSS, S-expressions, and other notations can be gradually adopted and become standard.

The goal of this evolution is the achievement of low-level unification of the basic data, markup, style, configuration, and source code notations of the Web.

Such removal of fundamental incompatibilities obviates the need for an incalculable number of unnecessary translations, freeing enormous computational and cognitive resources.

From that emerges a new and more efficient Web with TAO as its syntactical backbone, accelerating further development and opening up exciting possibilities.

This vision can only be achieved in small steps over a long time, with the collaboration of everyone who is willing to make it a reality. All support is welcome.

Distant future

The hope is that TAO could facilitate future interplanetary, interstellar, cosmic-scale communication.

Software

The TAO organization on GitHub contains all of the official repositories.

Reference parsers for the TAO grammar are available in C, C#, Java, Scala, TypeScript, and JavaScript.

Also in development are libraries for handling Data TAO (JavaScript, C#) and a prototype structural TAO editor.

Support TAO

TAO is an independent, open, and free public service project. Its mission is to simplify and interconnect the software world. If you find it valuable, you can help achieve that in the following ways:


  1. Based on a JSON example from Wikipedia.