the universal syntax

TAO is the first truly universal syntax for structural communication at any scale, born of a vision of total intercommunication of all software systems.

Its unparalleled simplicity makes it the perfect foundation for future mutually-compatible data, markup, and source code notations – next steps after JSON, XML, and S-expressions.

Vision

The vision behind TAO is simple.

Syntax is something that is used in software all the time, everywhere. It's so ubiquitous that it is taken for granted and often overlooked. Optimizing it is, for many reasons, not a very popular topic.

To give a feeling of the ubiquity:

Considering this, even a small inefficiency or unnecessary complexity in a syntax, along with the accompanying processing and cognitive burdens, translates to a massive loss of productivity.

TAO is based on the thesis that a simpler and easier syntax shall bring incalculable returns across the software world. The greater the adoption, the greater and more compounded the benefits.

It is a logical step towards simplicity in software evolution.

Historical precedents

The modern Web began with HTML, which was designed as a markup language for static websites. As the Web became more dynamic, HTML’s generalized counterpart, XML, was pragmatically adopted as the universal notation for representing the trees of data that were now sent between nodes.

These markup languages were, however, not well suited for generic data representation. Another pragmatic step was taken to alleviate that. From JavaScript, the dynamic language of the Web, JSON was extracted as a simpler and more suitable data notation.

Today HTML, JSON, and JavaScript are becoming low-level languages, often machine-generated rather than written directly.

JavaScript was, however, not meant as a low-level language, so it is not perfect as the assembly language of the Web.

WebAssembly is an emerging attempt to fix that. It adopts a generic human-readable syntax: S-expressions.

Interestingly, before CSS became established as the stylesheet language of the Web, there was an S-expression-based alternative to it called DSSSL.

S-expressions were invented before the Web and serve as the basic syntax for code and data in the oldest syntactically-preserved family of languages, widely used to this day: the LISP family. Many popular languages, including JavaScript, have roots in this family.

S-expressions are a minimal generic syntax, like TAO. Because of that they can be used to encode different kinds of trees: data, markup, source code.

They were, however, not explicitly designed as a universal syntax and have some characteristics that prevented them from becoming one.

TAO, on the other hand, is designed from the ground up to be suited for precisely this purpose.

Future

Simple and human-friendly notations are being built on top of TAO along with tools to enable their use.

These can be initially adopted for configuration and as standard input and output formats of offline tools, similar to how JSON is used today.

TAO-based counterparts of JSON, HTML, CSS, S-expressions, and other notations can be gradually adopted and become standard.

The goal of this evolution is the achievement of low-level unification of the basic data, markup, style, configuration, and source code notations of the Web.

Such removal of fundamental incompatibilities obviates the need for an incalculable number of unnecessary translations, freeing enormous computational and cognitive resources.

From that emerges a new and more efficient Web with TAO as its syntactical backbone, accelerating further development and opening up exciting possibilities.

This vision can only be achieved in small steps over a long time with collaboration of everyone who is willing to make it a reality. All support is welcome.

TAO-D – a notation for data

The same piece of data [1] is encoded below in XML, JSON, and TAO-D.

<person 
  first-name="John" 
  last-name="Smith"
  is-alive="true"
  age="27"
>
  <address 
    street-address="21 2nd Street" 
    city="New York" 
    state="NY" 
    postal-code="10021-3100" 
  />
  <phone-numbers>
    <phone-number
      type="home"
      number="212 555-1234"
    />
    <phone-number
      type="office"
      number="646 555-4567"
    />
  </phone-numbers>
  <children />
  <spouse xsi:nil="true" />
</person>
{ 
  "first name": "John",
  "last name": "Smith",
  "is alive": true,
  "age": 27,
  "address": {
    "street address": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postal code": "10021-3100"
  },
  "phone numbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null 
}
deleted: 26 % (107/412), 4.5 characters per line
changed: 8 % (34/412), 1.4 characters per line
first name [John]
last name [Smith]
is alive [true]
age [27]
address [
  street address [21 2nd Street]
  city [New York]
  state [NY]
  postal code [10021-3100]
]
phone numbers [
  [
    type [home]
    number [212 555-1234]
  ]
  [
    type [office]
    number [646 555-4567]
  ]
]
children []
spouse []
The figures shown between the JSON and TAO-D versions count the JSON characters that would have to be deleted or changed to arrive at the TAO-D version.

JSON has mostly superseded XML in the domain of data representation, largely thanks to its relative simplicity and terseness.

TAO-D is very close to JSON, but it is even simpler and more compact. Rather than adding more elements, it takes away all of JSON’s syntactic noise while keeping its essential expressive power. This produces many general advantages, such as:

More specific advantages include:

The syntax of TAO-D above was highlighted with the interactive TAO-D highlighter. You can use it to examine more examples and try out your own.

Tools to convert between JSON and TAO-D are in development.
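
Until those tools are available, the mapping visible in the example above can be approximated by hand. The following is a minimal sketch in Python, not an official converter. The function name to_tao_d is hypothetical, and the rules are assumptions inferred only from the example: every value is wrapped in square brackets, array items become unnamed bracketed entries, and null, empty arrays, and empty objects all become []. Escaping the characters [, ], and ` with a leading ` is also an assumption, based on the operator rule of the grammar.

```python
import json

META = "[`]"  # assumed meta characters, taken from the grammar rules


def _escape(text: str) -> str:
    """Prefix each assumed meta character with a backtick."""
    return "".join("`" + ch if ch in META else ch for ch in text)


def _scalar(value) -> str:
    """Text placed between brackets for anything that is not a non-empty container."""
    if value is None or value == [] or value == {}:
        return ""  # null and empty containers all collapse to []
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, str):
        return _escape(value)
    return json.dumps(value)  # numbers


def _block(prefix: str, value, depth: int) -> list:
    """Return the lines for `prefix[...]` at the given indentation depth."""
    pad = "  " * depth
    if isinstance(value, dict) and value:
        lines = [pad + prefix + "["]
        for key, item in value.items():
            lines += _block(_escape(key) + " ", item, depth + 1)
        return lines + [pad + "]"]
    if isinstance(value, list) and value:
        lines = [pad + prefix + "["]
        for item in value:
            lines += _block("", item, depth + 1)
        return lines + [pad + "]"]
    return [pad + prefix + "[" + _scalar(value) + "]"]


def to_tao_d(value) -> str:
    """Encode parsed JSON data as TAO-D text; a top-level object is not bracketed."""
    if isinstance(value, dict) and value:
        lines = []
        for key, item in value.items():
            lines += _block(_escape(key) + " ", item, 0)
        return "\n".join(lines)
    return "\n".join(_block("", value, 0))


if __name__ == "__main__":
    print(to_tao_d(json.loads('{"first name": "John", "children": [], "spouse": null}')))
```

Under these assumptions the conversion is lossy in one direction: children [] and spouse [] look identical, so converting back to JSON would need extra information about which entries are arrays, objects, or null.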

TAO-M – a markup notation

To be announced.

TAO-C – a code notation

To be announced.

The grammar of TAO

The basic advantage of TAO lies in the simplicity of its grammar.

It is defined below in two formal notations, as a diagram, and descriptively.

All definitions are equivalent.

For brevity the following rules are defined only descriptively:

Augmented Backus-Naur Form (ABNF)

operator   = "`" any
annotation = 1*any-except-meta
tree       = "[" tao "]"
tao        = *(tree / operator / annotation)

Backus-Naur Form (BNF)

<operator>   ::= "`" <any>
<annotation> ::= <any-except-meta> | <annotation> <annotation>
<tree>       ::= "[" <tao> "]"
<tao>        ::= "" | <tree> | <operator> | <annotation> | <tao> <tao>

Syntax Diagram

Descriptive

The above formal definitions can be translated as:

The three parts of a tao are captured mnemonically in the TAO acronym which expands to Tree Annotation Operator. These basic constructs can also be described as:

Look at the official website of JSON for comparison of complexity.

A reference implementation of the grammar is available as an interactive parser in JavaScript.
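
Independently of that reference implementation, the four grammar rules are small enough to sketch as a recursive descent parser. The Python code below is an illustration only, not the official parser. The names parse_tao, Tree, Annotation, Operator, and TaoSyntaxError are hypothetical, and it assumes that "any" means any single character and that the meta characters excluded by "any-except-meta" are [, ], and `.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

META = "[`]"  # assumed meta characters: tree delimiters and the operator mark


class TaoSyntaxError(ValueError):
    """Raised when the input does not match the grammar."""


@dataclass
class Annotation:
    text: str  # one or more non-meta characters


@dataclass
class Operator:
    char: str  # the single character following a backtick


@dataclass
class Tree:
    parts: List["Node"] = field(default_factory=list)  # the tao inside [ ]


Node = Union[Tree, Operator, Annotation]


def _parse_parts(src: str, pos: int, nested: bool) -> Tuple[List[Node], int]:
    """Parse a tao starting at pos; stop at the closing ']' when nested."""
    parts: List[Node] = []
    while pos < len(src):
        ch = src[pos]
        if ch == "[":  # tree = "[" tao "]"
            inner, pos = _parse_parts(src, pos + 1, True)
            parts.append(Tree(inner))
            pos += 1  # skip the closing ']'
        elif ch == "]":
            if not nested:
                raise TaoSyntaxError(f"unmatched ']' at position {pos}")
            return parts, pos
        elif ch == "`":  # operator = "`" any
            if pos + 1 >= len(src):
                raise TaoSyntaxError("operator mark at end of input")
            parts.append(Operator(src[pos + 1]))
            pos += 2
        else:  # annotation = 1*any-except-meta
            start = pos
            while pos < len(src) and src[pos] not in META:
                pos += 1
            parts.append(Annotation(src[start:pos]))
    if nested:
        raise TaoSyntaxError("unclosed '['")
    return parts, pos


def parse_tao(src: str) -> List[Node]:
    """Parse a whole input as a tao and return its top-level parts."""
    parts, _ = _parse_parts(src, 0, False)
    return parts


if __name__ == "__main__":
    print(parse_tao("first name [John]"))
```

With this sketch, the input first name [John] parses to an Annotation holding "first name " followed by a Tree that contains one Annotation holding "John". Whitespace is kept as part of annotations, since the grammar does not treat it specially.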

Software

Official

The following interactive tools are available:

They include several predefined examples.

Support TAO

TAO is an independent open and free public service project. Its mission is to simplify and interconnect the software world. If you find it valuable, you can help achieve that in the following ways:

To suggest a TAO-related project to link here, or for other matters related to this website, please create an issue on GitHub.

For information about recent changes, visit this GitHub page.

For other matters, please contact the maintainers.


  1. Based on a JSON example from Wikipedia.