TERSE

JSON vs TERSE

Same data, ~44% fewer tokens — no information lost.

config.json ~85 tokens

{
  "name": "my-app",
  "version": "2.1.0",
  "private": true,
  "port": 3000,
  "debug": false,
  "tags": ["web", "typescript", "spa"],
  "author": {
    "name": "Alice",
    "email": "alice@co.com"
  }
}

config.terse ~48 tokens

name: my-app
version: 2.1.0
private: T
port: 3000
debug: F
tags: [web typescript spa]
author: {name:Alice email:alice@co.com}
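
The headline percentage follows directly from the two approximate token counts quoted for this example. A minimal sketch of the arithmetic, where `token_savings` is a hypothetical helper and not part of any TERSE library:

```python
def token_savings(baseline_tokens: int, compact_tokens: int) -> float:
    """Return the fraction of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - compact_tokens) / baseline_tokens


# Approximate counts quoted above for config.json and config.terse.
config_savings = token_savings(85, 48)
print(f"{config_savings:.1%}")  # 43.5%, which rounds to the ~44% headline
```

Real counts depend on the tokenizer, so treat the inputs as estimates rather than exact values.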

Schema arrays — the biggest win for tabular data

When all array elements share the same keys, TERSE declares fields once and lists values positionally. Every repeated key name disappears.

users.json ~145 tokens

{ "users": [
  {"id":1,"name":"Ana Lima",
   "role":"admin","active":true},
  {"id":2,"name":"Bruno Melo",
   "role":"editor","active":true},
  {"id":3,"name":"Carla Neves",
   "role":"viewer","active":false},
  {"id":4,"name":"Diego Alves",
   "role":"editor","active":true},
  {"id":5,"name":"Elena Costa",
   "role":"viewer","active":false}
]}

users.terse ~67 tokens

users:
  #[id name role active]
  1 "Ana Lima"    admin  T
  2 "Bruno Melo"  editor T
  3 "Carla Neves" viewer F
  4 "Diego Alves" editor T
  5 "Elena Costa" viewer F
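
To illustrate the idea, here is a rough sketch of how a list of uniform records could be turned into the schema-array layout shown above. The function names and the quoting rule (quote only empty strings and strings containing whitespace) are assumptions made for this example; the reference implementations may behave differently.

```python
import json


def format_cell(value) -> str:
    """Encode one cell value. The quoting rule is an assumption."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Assumed rule: quote strings that are empty or contain whitespace.
    if text == "" or any(ch.isspace() for ch in text):
        return json.dumps(text, ensure_ascii=False)
    return text


def to_schema_array(key: str, rows: list) -> str:
    """Declare the field names once, then emit one positional row per record."""
    if not rows:
        raise ValueError("need at least one row to infer the fields")
    fields = list(rows[0])
    for row in rows:
        if list(row) != fields:
            raise ValueError("all rows must share the same keys in the same order")
    lines = [f"{key}:", "  #[" + " ".join(fields) + "]"]
    for row in rows:
        lines.append("  " + " ".join(format_cell(row[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Ana Lima", "role": "admin", "active": True},
    {"id": 3, "name": "Carla Neves", "role": "viewer", "active": False},
]
print(to_schema_array("users", users))
```

The sketch emits single spaces between cells; the column alignment in the example above is padding for readability.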

Token savings by document type

Measured with cl100k_base (GPT-3.5 / GPT-4). Percentages are token savings relative to JSON. N/A means the format cannot represent that document type; in the savings table this applies to CSV for the nested types.

Format	Nested objects	Heterogeneous arrays	Typed nulls	Schema arrays	LLM-optimized tokens
JSON	✓	✓	✓	✗	✗
YAML	✓	✓	✓	✗	✗
TOON	✗	✗	✗	✗	✗
CSV	✗	✗	✗	✗	✗
TERSE	✓	✓	✓	✓	✓

Document type	TERSE vs JSON	YAML vs JSON	TOON vs JSON	CSV vs JSON
App config (package.json style)	35–45%	20–24%	15–20%	N/A
User list (flat, 5 records)	40–55%	18–22%	40–55%	40–55%
Structured logs (5 entries)	35–45%	18–22%	35–45%	35–45%
Product catalog (nested + array)	30–40%	20–25%	10–15%	N/A
Complex order (deep nesting)	25–35%	20–25%	5–10%	N/A
AI tool call payload (deep)	20–30%	18–22%	5–10%	N/A

How to read this table:

App config & structured configs: TERSE saves 35–45% because configs are typically shallow objects with nested sub-objects. TOON only manages 15–20% here. It handles flat rows well but falls apart as soon as nesting appears.

User lists & logs: TERSE, TOON, and CSV perform similarly (35–55%). This is the easy case: flat, uniform, tabular data where every row-oriented format shines. If your data always looks like this, any of the three works.

Product catalogs, orders, AI payloads: This is where TERSE separates itself. TOON drops to 5–15% once nested objects appear. CSV disappears entirely (N/A). TERSE holds 20–40% savings even in deeply nested structures.

YAML vs TERSE: YAML supports full nesting but pays a verbosity tax. Arrays use one - item per line instead of [a b c], and booleans are spelled true/false instead of T/F. YAML also has no schema-array optimization for tabular data. The result on the app config is 20–24% fewer tokens than JSON, versus 35–45% for TERSE. YAML is a genuine improvement, but TERSE was designed specifically for token efficiency rather than human authoring.

The core message: TOON and CSV only win on flat, uniform data, which is the easy case. YAML wins on human readability but not on token count. TERSE delivers savings of 20–55% across every document type in the table, including the complex cases that dominate real LLM pipelines.

Why TERSE

Five principles that explain every design decision.

Bare strings

Identifiers and common values require no quotation marks. production stays production, not "production". Quotes are reserved for strings that actually need them.
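
As a rough illustration, the decision of when a string "actually needs" quotes might look like the sketch below. The exact rule is an assumption inferred from the examples on this page (for instance, `2.1.0` and `alice@co.com` appear bare, while `"Ana Lima"` is quoted); the specification is the authority, and `needs_quotes` is a hypothetical helper.

```python
import re

# Characters assumed to be structural, based on the examples on this page.
_STRUCTURAL = set('{}[]:"#')
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def needs_quotes(text: str) -> bool:
    """Return True if a string would be ambiguous when written bare."""
    if text == "":
        return True  # nothing to write without quotes
    if text in {"T", "F", "~"}:
        return True  # would be read back as a boolean or null
    if _NUMBER.fullmatch(text):
        return True  # would be read back as a number
    return any(ch.isspace() or ch in _STRUCTURAL for ch in text)


for sample in ["production", "2.1.0", "Ana Lima", "3000"]:
    print(sample, needs_quotes(sample))
```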

Compact primitives

null, true, and false become single characters: ~, T, F. Three of the most common values in any payload, each reduced to one token.
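
The mapping is small enough to sketch in a few lines. The helper names here are hypothetical and only show the substitution in both directions:

```python
# One-character spellings for the three primitives described above.
_ENCODE = {None: "~", True: "T", False: "F"}
_DECODE = {"~": None, "T": True, "F": False}


def encode_primitive(value) -> str:
    """Map None/True/False to their one-character form."""
    # Identity checks avoid treating 1 and 0 as True and False.
    for primitive, symbol in _ENCODE.items():
        if value is primitive:
            return symbol
    raise TypeError(f"not a compact primitive: {value!r}")


def decode_primitive(token: str):
    """Map a one-character token back to None/True/False."""
    if token not in _DECODE:
        raise ValueError(f"not a compact primitive token: {token!r}")
    return _DECODE[token]


print([encode_primitive(v) for v in (None, True, False)])  # ['~', 'T', 'F']
```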

Implicit delimiters

Spaces separate values inside objects and arrays. No commas between elements, no trailing commas. Structure is implied by position, not punctuation.
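
To show why whitespace is enough, here is a minimal sketch of a parser for the inline object and array forms. It is not the reference parser: `parse_inline` is a hypothetical name, it ignores schema arrays and the top-level `key: value` layout, silently skips characters it cannot tokenize, and assumes JSON-style escapes inside quoted strings.

```python
import json
import re

# A token is a structural character, a quoted string, or a bare run.
_TOKEN = re.compile(r'[{}\[\]:]|"(?:[^"\\]|\\.)*"|[^\s{}\[\]:"]+')
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
_PRIMITIVES = {"~": None, "T": True, "F": False}
_STRUCTURAL = set("{}[]:")


def _scalar(token):
    if token in _PRIMITIVES:
        return _PRIMITIVES[token]
    if token.startswith('"'):
        return json.loads(token)  # assumes JSON-style escapes
    if _NUMBER.fullmatch(token):
        return float(token) if any(c in token for c in ".eE") else int(token)
    return token  # anything else is a bare string


def _parse_value(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == "{":
        obj, pos = {}, pos + 1
        while pos < len(tokens) and tokens[pos] != "}":
            key = tokens[pos]
            if key in _STRUCTURAL or tokens[pos + 1:pos + 2] != [":"]:
                raise ValueError(f"expected key:value at token {pos}")
            if key.startswith('"'):
                key = json.loads(key)
            obj[key], pos = _parse_value(tokens, pos + 2)
        if pos >= len(tokens):
            raise ValueError("unclosed object")
        return obj, pos + 1
    if token == "[":
        items, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != "]":
            item, pos = _parse_value(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("unclosed array")
        return items, pos + 1
    if token in _STRUCTURAL:
        raise ValueError(f"unexpected {token!r} at token {pos}")
    return _scalar(token), pos + 1


def parse_inline(text: str):
    """Parse one inline TERSE value (object, array, or scalar)."""
    tokens = _TOKEN.findall(text)
    value, pos = _parse_value(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens after value")
    return value


print(parse_inline("{name:Alice age:30}"))
print(parse_inline("[web typescript spa]"))
```

Because element boundaries come from whitespace and brackets alone, the tokenizer never has to look for commas.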

Schema arrays

Uniform arrays declare fields once with #[f1 f2 f3], then list values positionally. For tabular data with many rows, this eliminates every repeated key name.

Recursive structure

All constructs nest arbitrarily. Objects inside arrays inside schema arrays — all valid, all compact. No flat-only limitations like TOON or CSV.
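
Recursion is what makes this cheap to implement: one function handles every nesting depth. The sketch below is a simplified inline serializer written for illustration, not the reference implementation. The name `serialize_inline` and the quoting rule are assumptions, and it does not emit schema arrays.

```python
import json
import re

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def _string(text: str) -> str:
    """Write a string bare when unambiguous (assumed rule), else quoted."""
    bare = (
        text != ""
        and text not in {"T", "F", "~"}
        and not _NUMBER.fullmatch(text)
        and not any(ch.isspace() or ch in '{}[]:"#' for ch in text)
    )
    return text if bare else json.dumps(text, ensure_ascii=False)


def serialize_inline(value) -> str:
    """Serialize nested dicts, lists, and scalars to the inline form."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return _string(value)
    if isinstance(value, dict):
        pairs = (f"{_string(str(k))}:{serialize_inline(v)}" for k, v in value.items())
        return "{" + " ".join(pairs) + "}"
    if isinstance(value, (list, tuple)):
        return "[" + " ".join(serialize_inline(v) for v in value) + "]"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(serialize_inline({"a": 1, "b": True}))  # {a:1 b:T}
print(serialize_inline({"tags": ["web", "spa"], "author": {"name": "Alice"}}))
```

The same function is reused for values inside objects and arrays, so objects inside arrays inside objects need no special cases.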

Why not compress further?

TERSE is optimized for the intersection of two constraints: token efficiency and human auditability. Further compression techniques — key abbreviation, binary type encoding, dictionary compression — would yield additional token savings but would break the ability to inspect, debug, and audit payloads without tooling. For LLM pipelines in production, auditability is a safety property, not just a convenience.

Implementations

Reference implementations with full spec coverage and a 100% round-trip guarantee.

TypeScript / JavaScript

terse-js

npm install terse-js

import { serialize, parse } from "terse-js";

serialize({ a: 1, b: true });
// "{a:1 b:T}"

parse("{name:Alice age:30}");
// { name: "Alice", age: 30 }

Python

terse-py

pip install terse-py

from terse import serialize, parse

serialize({"a": 1, "b": True})
# "{a:1 b:T}"

parse("{name:Alice age:30}")
# {"name": "Alice", "age": 30}

Go

terse-go

// go get github.com/RudsonCarvalho/terse-go
// Zero external dependencies — Go 1.21+

terse.Serialize(map[string]any{"a": 1, "b": true})
// "{a:1 b:T}"

terse.Parse("{name:Alice age:30}")
// map["name":"Alice" "age":30]

Java

terse-java

// Maven (pom.xml) or Gradle (build.gradle)
// Zero runtime dependencies — Java 11+

Terse.serialize(Map.of("a", 1L, "b", true));
// "{a:1 b:T}"

Terse.parse("{name:Alice age:30}");
// Map {"name":"Alice", "age":30L}
