universal syntax

structural comms
at cosmic scale

TAO is the first truly universal syntax for structural communication at any scale, born of a vision of total intercommunication of all software systems.

Its unparalleled simplicity makes it the perfect foundation for future mutually-compatible data, markup, and source code notations – next steps after JSON, XML, and S-expressions.

Vision

Syntax is so ubiquitous in software that it is taken for granted and often overlooked. Optimizing it is, for many reasons, not a very popular topic.

To give a feeling of the ubiquity:

Considering this, even a small inefficiency or unnecessary complexity in a syntax, along with the accompanying processing and cognitive burdens, translates to a massive loss of productivity.

TAO is based on the thesis that a simpler and easier syntax shall bring incalculable returns across the software world. The greater the adoption, the greater and more compounded the benefits.

It is a logical step towards simplicity in software evolution.

Historical precedents

The modern Web began with HTML, which was designed as a markup language for static websites. As the Web became more dynamic, HTML’s generalized counterpart, XML, was pragmatically adopted as the universal notation for representing the trees of data that were now sent between nodes.

These markup languages were, however, not well suited to generic data representation. Another pragmatic step was taken to alleviate that. From JavaScript, the dynamic language of the Web, JSON was extracted as a simpler and more suitable data notation.

Today HTML, JSON, and JavaScript are becoming low-level languages, often machine-generated rather than written directly.

JavaScript was not meant as a low-level language, however, so it is not ideal as the assembly language of the Web.

WebAssembly is an emerging attempt to fix that. It adopts a generic human-readable syntax: S-expressions.

Interestingly, before CSS became established as the stylesheet language of the Web, there was an S-expression-based alternative to it called DSSSL.

S-expressions were invented before the Web and serve as the basic syntax for code and data in the oldest syntactically-preserved family of languages, widely used to this day: the LISP family. Many popular languages, including JavaScript, have roots in this family.

S-expressions are a minimal generic syntax, like TAO. Because of that they can be used to encode different kinds of trees: data, markup, source code.

They were, however, not explicitly designed as a universal syntax and have some characteristics that prevented them from becoming one.

TAO, on the other hand, is designed from the ground up to suit precisely this purpose.

Future

Simple and human-friendly notations are being built on top of TAO along with tools to enable their use.

These can initially be adopted for configuration and as standard input and output formats of offline tools, much as JSON is used today.

TAO-based counterparts of JSON, HTML, CSS, S-expressions, and other notations can be gradually adopted and become standard.

The goal of this evolution is the low-level unification of the basic data, markup, style, configuration, and source code notations of the Web.

Such removal of fundamental incompatibilities obviates the need for an incalculable number of unnecessary translations, freeing enormous computational and cognitive resources.

From that emerges a new and more efficient Web with TAO as its syntactical backbone, accelerating further development and opening up exciting possibilities.

This vision can only be achieved in small steps over a long time, with the collaboration of everyone who is willing to make it a reality. All support is welcome.

Distant future

To give something to imagine and to illustrate the scale that this project could achieve: the hope is that eventually TAO-based or TAO-derived formats will turn out to be suitable for interplanetary, interstellar, cosmic-scale communication.

Data TAO

Data TAO is a notation for representing data with TAO.

The same piece of data [1] is encoded below in XML, JSON, and Data TAO.

XML:

<person
  first-name="John" 
  last-name="Smith"
  is-alive="true"
  age="27"
>
  <address 
    street-address="21 2nd Street" 
    city="New York" 
    state="NY" 
    postal-code="10021-3100" 
  />
  <phone-numbers>
    <phone-number
      type="home"
      number="212 555-1234"
    />
    <phone-number
      type="office"
      number="646 555-4567"
    />
  </phone-numbers>
  <children />
  <spouse xsi:nil="true" />
</person>

JSON:

{
  "first name": "John",
  "last name": "Smith",
  "is alive": true,
  "age": 27,
  "address": {
    "street address": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postal code": "10021-3100"
  },
  "phone numbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null 
}

From JSON to Data TAO:
deleted: 26 % (107/412), 4.5 characters per line
changed: 8 % (34/412), 1.4 characters per line

Data TAO:

first name [John]
last name [Smith]
is alive [true]
age [27]
address [
  street address [21 2nd Street]
  city [New York]
  state [NY]
  postal code [10021-3100]
]
phone numbers [
  [
    type [home]
    number [212 555-1234]
  ]
  [
    type [office]
    number [646 555-4567]
  ]
]
children []
spouse []

The percentages above count the characters of the JSON representation that would have to be deleted or changed to arrive at the Data TAO version.

JSON has mostly superseded XML in the domain of data representation, largely thanks to its relative simplicity and terseness.

Data TAO is very close to JSON, but even simpler and more compact. Rather than adding more elements, it takes away all of JSON’s syntactic noise while keeping its essential expressive power. This produces many general advantages, such as:

More specific advantages include:

The syntax of Data TAO above was highlighted with the interactive Data TAO highlighter. You can use it to examine more examples and try out your own.
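
As a rough illustration of how close the two notations are, the sketch below renders a JSON-like Python value as Data TAO text, following only the conventions visible in the example above. It is not one of the official Data TAO libraries. The function names (`escape`, `to_data_tao`) are hypothetical, and the escaping rule (prefixing "[", "]", and "`" with a backtick operator) is an assumption inferred from the grammar rather than a documented Data TAO rule. Note that, as in the example, both `null` and an empty list come out as `[]`.

```python
META = "[`]"  # assumed meta characters of the TAO grammar


def escape(text: str) -> str:
    """Prefix each meta character with a backtick (assumed escaping rule)."""
    return "".join("`" + ch if ch in META else ch for ch in text)


def _scalar(value) -> str:
    """Render a leaf value the way the example above does."""
    if value is None:
        return ""  # the example writes null as an empty tree
    if isinstance(value, bool):
        return "true" if value else "false"
    return escape(str(value))


def _tree(value, indent: int) -> str:
    """Render any value as a bracketed tree."""
    if isinstance(value, (dict, list)):
        if not value:
            return "[]"
        pad = "  " * indent
        return "[\n" + _body(value, indent + 1) + "\n" + pad + "]"
    return "[" + _scalar(value) + "]"


def _body(value, indent: int) -> str:
    """Render the entries of a dict (key + tree) or a list (tree only)."""
    pad = "  " * indent
    if isinstance(value, dict):
        lines = [pad + escape(str(k)) + " " + _tree(v, indent)
                 for k, v in value.items()]
    else:
        lines = [pad + _tree(v, indent) for v in value]
    return "\n".join(lines)


def to_data_tao(value: dict) -> str:
    """Render a top-level dict as Data TAO text (hypothetical helper)."""
    return _body(value, 0)


print(to_data_tao({"age": 27, "children": [], "spouse": None}))
```

Because `null` and an empty list are indistinguishable in this output, a reader for such text would need extra knowledge (for example a schema) to tell them apart.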


Markup TAO

Markup TAO is a notation for representing text markup with TAO.

It is meant to have a simple mapping to HTML. A demonstration:

[Figures: an HTML example (html-hn) and its Markup TAO counterpart (tao-mark-hn)]

Further details to be announced.

Code TAO

Code TAO is a notation for representing code with TAO.

It is comparable to S-expressions.

Details to be announced.

The grammar of TAO

The basic advantage of TAO lies in the simplicity of its grammar.

It is defined below in two formal notations, as a diagram, and descriptively.

All definitions are equivalent.

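For brevity the following rules are defined only descriptively: any (any single character) and any-except-meta (any single character other than the meta characters "[", "]", and "`").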

Augmented Backus-Naur Form (ABNF)

operator   = "`" any
annotation = 1*any-except-meta
tree       = "[" tao "]"
tao        = *(tree / operator / annotation)

Backus-Naur Form (BNF)

<operator>   ::= "`" <any>
<annotation> ::= <any-except-meta> | <annotation> <annotation>
<tree>       ::= "[" <tao> "]"
<tao>        ::= "" | <tree> | <operator> | <annotation> | <tao> <tao>
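
The grammar is small enough that a complete recursive-descent parser fits in a few dozen lines. The following is a minimal sketch, not one of the reference parsers. The class and function names (`Tao`, `Tree`, `Operator`, `Annotation`, `parse`) are hypothetical, and it assumes that the meta characters are exactly "[", "]", and "`" and that `any` means any single character.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

META = "[`]"  # assumed meta characters: "[", "`", "]"


@dataclass
class Annotation:
    text: str  # annotation = 1*any-except-meta


@dataclass
class Operator:
    char: str  # operator = "`" any


@dataclass
class Tree:
    tao: "Tao"  # tree = "[" tao "]"


@dataclass
class Tao:
    # tao = *(tree / operator / annotation)
    parts: List[Union[Tree, Operator, Annotation]] = field(default_factory=list)


def _parse_tao(source: str, pos: int) -> Tuple[Tao, int]:
    """Parse a tao starting at pos; stop at an unmatched "]" or end of input."""
    parts: List[Union[Tree, Operator, Annotation]] = []
    while pos < len(source):
        ch = source[pos]
        if ch == "[":
            inner, pos = _parse_tao(source, pos + 1)
            if pos >= len(source):  # input ended before the closing "]"
                raise ValueError("unclosed '['")
            pos += 1  # consume the closing "]"
            parts.append(Tree(inner))
        elif ch == "]":
            break  # leave the "]" for the caller to consume or reject
        elif ch == "`":
            if pos + 1 >= len(source):
                raise ValueError("operator '`' at end of input")
            parts.append(Operator(source[pos + 1]))
            pos += 2
        else:
            start = pos
            while pos < len(source) and source[pos] not in META:
                pos += 1
            parts.append(Annotation(source[start:pos]))
    return Tao(parts), pos


def parse(source: str) -> Tao:
    """Parse a whole string as a tao, rejecting unbalanced brackets."""
    tao, pos = _parse_tao(source, 0)
    if pos != len(source):
        raise ValueError(f"unexpected ']' at offset {pos}")
    return tao


# Example: an annotation followed by a tree that holds one annotation.
doc = parse("first name [John]")
```

Here `doc.parts` holds `Annotation("first name ")` followed by a `Tree` whose inner tao contains `Annotation("John")`. Whitespace is kept as part of annotations because the grammar gives it no special role; any trimming would be up to a notation built on top, such as Data TAO.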

Syntax Diagram

Descriptive

The above formal definitions can be translated as:

The three parts of a tao are captured mnemonically in the TAO acronym which expands to Tree Annotation Operator. These basic constructs can also be described as:

Look at the official website of JSON for a comparison of complexity.

Software

Official

Reference parsers for the TAO grammar are available in C, C#, Java, Scala, TypeScript, and JavaScript.

Also in development are libraries to handle Data TAO (JavaScript, C#) and a prototype structural TAO editor.

The Tree Annotation Organization on GitHub contains all of the official repositories.

Support TAO

TAO is an independent, open, and free public-service project. Its mission is to simplify and interconnect the software world. If you find it valuable, you can help achieve that in the following ways:

To suggest a TAO-related project to link here or for other matters related to this website, please create an issue on GitHub.

For information about recent changes, visit this GitHub page.

For other matters, please contact the maintainers.


  1. Based on a JSON example from Wikipedia.