Draft v0.7 · Full JSON data model · CC BY 4.0

TERSE

A token-efficient alternative to JSON, built for LLMs — covers the full JSON data model with 30–55% fewer tokens.

30–55% fewer tokens vs JSON
100% JSON data model coverage
5 design principles
4 reference implementations

JSON vs TERSE

Same data, ~44% fewer tokens — no information lost.

config.json ~85 tokens
{
  "name": "my-app",
  "version": "2.1.0",
  "private": true,
  "port": 3000,
  "debug": false,
  "tags": ["web", "typescript", "spa"],
  "author": {
    "name": "Alice",
    "email": "alice@co.com"
  }
}

config.terse ~48 tokens
name: my-app
version: 2.1.0
private: T
port: 3000
debug: F
tags: [web typescript spa]
author: {name:Alice email:alice@co.com}
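
The ~44% figure follows directly from the two approximate token counts in the captions. A minimal sketch of that arithmetic, where `savings_percent` is an illustrative helper name and not part of any TERSE library:

```python
def savings_percent(json_tokens: int, terse_tokens: int) -> float:
    """Return the percentage of tokens saved relative to the JSON count."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    return (json_tokens - terse_tokens) / json_tokens * 100


# Approximate counts taken from the captions of the config example.
config_savings = savings_percent(85, 48)
print(f"config: {config_savings:.1f}% fewer tokens")  # config: 43.5% fewer tokens
```

The counts themselves depend on the tokenizer, so the result is only as precise as the "~85" and "~48" estimates it starts from.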

Schema arrays — the biggest win for tabular data

When all array elements share the same keys, TERSE declares fields once and lists values positionally. Every repeated key name disappears.

users.json ~145 tokens
{ "users": [
  {"id":1,"name":"Ana Lima",
   "role":"admin","active":true},
  {"id":2,"name":"Bruno Melo",
   "role":"editor","active":true},
  {"id":3,"name":"Carla Neves",
   "role":"viewer","active":false},
  {"id":4,"name":"Diego Alves",
   "role":"editor","active":true},
  {"id":5,"name":"Elena Costa",
   "role":"viewer","active":false}
]}

users.terse ~67 tokens
users:
  #[id name role active]
  1 "Ana Lima"    admin  T
  2 "Bruno Melo"  editor T
  3 "Carla Neves" viewer F
  4 "Diego Alves" editor T
  5 "Elena Costa" viewer F
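
The transformation above can be sketched in a few lines of Python. This is an illustration only: `to_schema_array` and `format_cell` are hypothetical names, the quoting rule is an assumption rather than the spec's rule, and it emits single-space separators instead of the aligned columns shown in the example.

```python
import json


def format_cell(value):
    """Render one scalar cell. The quoting rule here is an assumption."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    ambiguous = text in {"", "~", "T", "F"} or any(ch.isspace() for ch in text)
    try:
        float(text)  # a numeric-looking string would read back as a number
        ambiguous = True
    except ValueError:
        pass
    return json.dumps(text, ensure_ascii=False) if ambiguous else text


def to_schema_array(records):
    """Emit a '#[fields]' header followed by one positional row per record."""
    if not records:
        raise ValueError("need at least one record")
    fields = list(records[0])
    if any(set(r) != set(fields) for r in records):
        raise ValueError("records do not share the same keys")
    lines = ["#[" + " ".join(fields) + "]"]
    for r in records:
        lines.append(" ".join(format_cell(r[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Ana Lima", "role": "admin", "active": True},
    {"id": 3, "name": "Carla Neves", "role": "viewer", "active": False},
]
print(to_schema_array(users))
```

Each key name appears once in the header no matter how many rows follow, which is where the savings on tabular data come from.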

Token savings by document type

Measured with cl100k_base (the GPT-3.5 / GPT-4 tokenizer). CSV is N/A for nested document types because it cannot represent those structures.

Feature comparison: JSON, YAML, TOON, CSV, and TERSE are compared on nested objects, heterogeneous arrays, typed nulls, schema arrays, and LLM-optimized tokens (per-format marks not preserved).

Document type                      TERSE vs JSON   YAML vs JSON   TOON vs JSON   CSV vs JSON
App config (package.json style)    35–45%          20–24%         15–20%         N/A
User list (flat, 5 records)        40–55%          18–22%         40–55%         40–55%
Structured logs (5 entries)        35–45%          18–22%         35–45%         35–45%
Product catalog (nested + array)   30–40%          20–25%         10–15%         N/A
Complex order (deep nesting)       25–35%          20–25%         5–10%          N/A
AI tool call payload (deep)        20–30%          18–22%         5–10%          N/A

How to read this table:

App config & structured configs: TERSE saves 35–45% because configs are typically shallow objects with nested sub-objects. TOON only manages 15–20% here: it handles flat rows well but falls apart as soon as nesting appears.

User lists & logs: TERSE, TOON, and CSV perform similarly here (40–55% on the user list, 35–45% on the logs), while YAML stays at 18–22%. This is the easy case: flat, uniform, tabular data. If your data always looks like this, any of those three works.

Product catalogs, orders, AI payloads: this is where TERSE separates itself. TOON drops to 5–15% because its savings depend on flat rows and fall away once nesting appears. CSV cannot represent these structures at all (N/A). TERSE holds 20–40% savings even in deeply nested structures.

YAML vs TERSE: YAML supports full nesting but pays a verbosity tax. Block-style arrays use one "- item" line per element instead of [a b c], and booleans cost true/false instead of T/F. YAML also has no schema-array optimization for tabular data. The result is 18–25% fewer tokens than JSON across the table, against 20–55% for TERSE. YAML is a genuine improvement, but TERSE was designed specifically for token efficiency rather than human authoring.

The core message: TOON and CSV only win on flat, uniform data — the easy case. YAML wins on human readability but not on token count. TERSE delivers consistent savings across every structure, including the complex cases that dominate real LLM pipelines.

Why TERSE

Five principles that explain every design decision.

01

Bare strings

Identifiers and common values require no quotation marks. production stays production, not "production". Quotes are reserved for strings that actually need them.
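
One way to decide whether a string "actually needs" quotes is to check whether it would be misread without them. The sketch below is an assumption about what such a rule could look like, not the rule from the TERSE spec: the safe character set, the reserved literals, and the name `encode_string` are all illustrative.

```python
import json
import re

# Assumed safe alphabet for bare strings; the spec may allow more or less.
BARE_PATTERN = re.compile(r"[A-Za-z0-9_.@/+-]+")
# Bare T and F would read back as booleans.
RESERVED = {"T", "F"}


def looks_numeric(text: str) -> bool:
    """True when the text would parse as a number if left unquoted."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def encode_string(text: str) -> str:
    """Return the text bare when unambiguous, otherwise as a quoted string."""
    if BARE_PATTERN.fullmatch(text) and text not in RESERVED and not looks_numeric(text):
        return text
    return json.dumps(text, ensure_ascii=False)


for sample in ["production", "2.1.0", "alice@co.com", "Ana Lima", "T", "3000"]:
    print(encode_string(sample))
```

Under these assumed rules, `production`, `2.1.0`, and `alice@co.com` stay bare, while `Ana Lima`, `T`, and `3000` are quoted so they cannot be confused with two tokens, a boolean, or a number.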

02

Compact primitives

null, true, and false become single characters: ~, T, F. Three of the most common values in any payload, each reduced to a single character.
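
A minimal sketch of the mapping in both directions, using hypothetical helper names (`encode_primitive`, `decode_primitive`) that are not taken from any reference implementation:

```python
def encode_primitive(value):
    """Map None/True/False to the one-character literals ~, T, F."""
    # Identity checks matter in Python: True == 1, so an equality test or a
    # dict lookup would wrongly turn the integer 1 into "T".
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    raise TypeError(f"not a null or boolean: {value!r}")


LITERALS = {"~": None, "T": True, "F": False}


def decode_primitive(token):
    """Inverse of encode_primitive for the three literals."""
    return LITERALS[token]


for original in (None, True, False):
    assert decode_primitive(encode_primitive(original)) is original
```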

03

Implicit delimiters

Spaces separate values inside objects and arrays. No commas between elements; a colon appears only between a key and its value. Structure is implied by position, not punctuation.
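
Splitting on whitespace is nearly enough to read a flat inline array; the one complication is a quoted string that contains spaces. The sketch below handles flat arrays only, assumes JSON-style double-quoted strings, and uses an illustrative function name.

```python
import re

# A token is either a double-quoted string (which may contain spaces)
# or a run of characters that are not whitespace or brackets.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[^\s\[\]{}]+')


def split_inline_array(text: str) -> list[str]:
    """Split a flat '[a b c]' array into raw tokens. Nesting is not handled."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected an inline array like [a b c]")
    return TOKEN.findall(text[1:-1])


print(split_inline_array("[web typescript spa]"))
# ['web', 'typescript', 'spa']
```

The tokens are returned as raw text; turning `T`, `~`, or `3000` into typed values is a separate step.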

04

Schema arrays

Uniform arrays declare fields once with #[f1 f2 f3], then list values positionally. For tabular data with many rows, this eliminates every repeated key name.
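
Reading a schema array back means pairing each row's values with the header fields by position. A rough sketch, assuming one row per line, JSON-style quoted strings, and JSON-style numbers; `parse_schema_array` and `parse_scalar` are hypothetical names, and nested values inside rows are not handled.

```python
import json
import re

TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\S+')
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def parse_scalar(token):
    """Convert one raw token into a Python value."""
    if token.startswith('"'):
        return json.loads(token)  # assumes JSON-style escapes
    if token == "~":
        return None
    if token == "T":
        return True
    if token == "F":
        return False
    if NUMBER.fullmatch(token):
        return float(token) if any(c in token for c in ".eE") else int(token)
    return token  # bare string


def parse_schema_array(block: str):
    """Turn a '#[f1 f2]' header plus positional rows into a list of dicts."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    if not lines or not (lines[0].startswith("#[") and lines[0].endswith("]")):
        raise ValueError("missing #[...] header")
    fields = lines[0][2:-1].split()
    rows = []
    for ln in lines[1:]:
        cells = TOKEN.findall(ln)
        if len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(cells)}")
        rows.append({f: parse_scalar(c) for f, c in zip(fields, cells)})
    return rows


sample = '#[id name role active]\n1 "Ana Lima"    admin  T\n3 "Carla Neves" viewer F'
print(parse_schema_array(sample))
```

Extra spaces used to align columns are harmless here, because any run of whitespace counts as a single separator.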

05

Recursive structure

All constructs nest arbitrarily. Objects inside arrays inside schema arrays — all valid, all compact. No flat-only limitations like TOON or CSV.
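
Because every construct nests, an inline serializer is naturally a single recursive function. The sketch below is an illustration and not the reference implementation: `to_terse` is a hypothetical name, the quoting rule is an assumption, keys are assumed to be simple identifiers, and schema arrays are left out.

```python
import json


def _needs_quotes(text: str) -> bool:
    """Assumed rule: quote anything that would be misread when left bare."""
    if text in {"", "~", "T", "F"}:
        return True
    if any(ch.isspace() or ch in '[]{}:"#' for ch in text):
        return True
    try:
        float(text)  # would read back as a number
        return True
    except ValueError:
        return False


def to_terse(value) -> str:
    """Serialize a JSON-compatible value to inline TERSE-style text."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return json.dumps(value)
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False) if _needs_quotes(value) else value
    if isinstance(value, dict):
        # Keys are assumed to be simple identifiers that never need quoting.
        return "{" + " ".join(f"{k}:{to_terse(v)}" for k, v in value.items()) + "}"
    if isinstance(value, (list, tuple)):
        return "[" + " ".join(to_terse(v) for v in value) + "]"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_terse({"items": [{"sku": "A1", "tags": ["x", "y"], "note": None}]}))
# {items:[{sku:A1 tags:[x y] note:~}]}
```

The dict and list branches call back into `to_terse`, so an object inside an array inside an object costs no extra code.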

Why not compress further?

TERSE is optimized for the intersection of two constraints: token efficiency and human auditability. Further compression techniques — key abbreviation, binary type encoding, dictionary compression — would yield additional token savings but would break the ability to inspect, debug, and audit payloads without tooling. For LLM pipelines in production, auditability is a safety property, not just a convenience.

Implementations

Reference implementations with full spec coverage and a 100% round-trip guarantee.

TypeScript / JavaScript

terse-js

npm install terse-js
import { serialize, parse } from "terse-js";

serialize({ a: 1, b: true });
// "{a:1 b:T}"

parse("{name:Alice age:30}");
// { name: "Alice", age: 30 }

Python

terse-py

pip install terse-py
from terse import serialize, parse

serialize({"a": 1, "b": True})
# "{a:1 b:T}"

parse("{name:Alice age:30}")
# {"name": "Alice", "age": 30}

Go

terse-go

// go get github.com/RudsonCarvalho/terse-go
// Zero external dependencies — Go 1.21+
terse.Serialize(map[string]any{"a": 1, "b": true})
// "{a:1 b:T}"

terse.Parse("{name:Alice age:30}")
// map["name":"Alice" "age":30]

Java

terse-java

// Maven (pom.xml) or Gradle (build.gradle)
// Zero runtime dependencies — Java 11+
Terse.serialize(Map.of("a", 1L, "b", true));
// "{a:1 b:T}"

Terse.parse("{name:Alice age:30}");
// Map {"name":"Alice", "age":30L}

Specification

TERSE-Spec-v0.7 · Draft · CC BY 4.0