A token-efficient alternative to JSON, built for LLMs — covers the full JSON data model with 30–55% fewer tokens.
Same data, ~44% fewer tokens — no information lost.
```json
{
  "name": "my-app",
  "version": "2.1.0",
  "private": true,
  "port": 3000,
  "debug": false,
  "tags": ["web", "typescript", "spa"],
  "author": { "name": "Alice", "email": "alice@co.com" }
}
```
```
name: my-app
version: 2.1.0
private: T
port: 3000
debug: F
tags: [web typescript spa]
author: {name:Alice email:alice@co.com}
```
Schema arrays — the biggest win for tabular data
When all array elements share the same keys, TERSE declares fields once and lists values positionally. Every repeated key name disappears.
```json
{
  "users": [
    {"id": 1, "name": "Ana Lima", "role": "admin", "active": true},
    {"id": 2, "name": "Bruno Melo", "role": "editor", "active": true},
    {"id": 3, "name": "Carla Neves", "role": "viewer", "active": false},
    {"id": 4, "name": "Diego Alves", "role": "editor", "active": true},
    {"id": 5, "name": "Elena Costa", "role": "viewer", "active": false}
  ]
}
```
```
users: #[id name role active]
  1 "Ana Lima" admin T
  2 "Bruno Melo" editor T
  3 "Carla Neves" viewer F
  4 "Diego Alves" editor T
  5 "Elena Costa" viewer F
```
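To make the mechanism concrete, here is a minimal Python sketch of schema-array encoding. It assumes every element is a dict with the same keys in the same order, writes one row per line with a two-space indent, and uses a deliberately simplified scalar rule (quote only strings that are empty or contain a space). `encode_schema_array` and `encode_scalar` are hypothetical names for illustration, not the terse-py API.

```python
def encode_scalar(value):
    """Simplified scalar encoding, assumed for this sketch only."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: quote only when they are empty or contain a space.
    if value == "" or " " in value:
        return '"' + value + '"'
    return value


def encode_schema_array(key, rows):
    """Declare the field names once, then emit one positional row per element."""
    if not rows:
        raise ValueError("schema arrays need at least one element")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("schema arrays require the same keys in the same order")
    lines = [key + ": #[" + " ".join(fields) + "]"]
    for row in rows:
        lines.append("  " + " ".join(encode_scalar(row[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Ana Lima", "role": "admin", "active": True},
    {"id": 2, "name": "Bruno Melo", "role": "editor", "active": True},
    {"id": 3, "name": "Carla Neves", "role": "viewer", "active": False},
]
print(encode_schema_array("users", users))
```

This prints a `users: #[id name role active]` header followed by rows such as `1 "Ana Lima" admin T`. The header is paid for once, so each additional record costs only its values.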
Token savings relative to JSON, measured with the cl100k_base tokenizer (GPT-3.5 / GPT-4). In the savings table, N/A marks documents a format cannot represent at all; CSV has no way to encode nested types.
| Format | Nested objects | Heterogeneous arrays | Typed nulls | Schema arrays | LLM-optimized tokens |
|---|---|---|---|---|---|
| JSON | ✓ | ✓ | ✓ | ✗ | ✗ |
| YAML | ✓ | ✓ | ✓ | ✗ | ✗ |
| TOON | ✗ | ✗ | ✗ | ✗ | ✗ |
| CSV | ✗ | ✗ | ✗ | ✗ | ✗ |
| TERSE | ✓ | ✓ | ✓ | ✓ | ✓ |
| Document type | TERSE vs JSON | YAML vs JSON | TOON vs JSON | CSV vs JSON |
|---|---|---|---|---|
| App config (package.json style) | 35–45% | 20–24% | 15–20% | N/A |
| User list (flat, 5 records) | 40–55% | 18–22% | 40–55% | 40–55% |
| Structured logs (5 entries) | 35–45% | 18–22% | 35–45% | 35–45% |
| Product catalog (nested + array) | 30–40% | 20–25% | 10–15% | N/A |
| Complex order (deep nesting) | 25–35% | 20–25% | 5–10% | N/A |
| AI tool call payload (deep) | 20–30% | 18–22% | 5–10% | N/A |
How to read this table:
App config & structured configs: TERSE saves 35–45% because configs are mostly shallow objects with a few nested sub-objects, so nearly every key, string, and boolean benefits from dropped quotes and compact literals. TOON manages only 15–20% here: it handles flat rows well but falls apart as soon as nesting appears.
User lists & logs: TERSE, TOON, and CSV perform similarly (35–55%). This is the easy case: flat, uniform, tabular data where all three shine. If your data always looks like this, any of the three works.
Product catalogs, orders, AI payloads: this is where TERSE separates itself. TOON drops to 5–15% once nesting dominates, and CSV cannot represent these documents at all (N/A). TERSE holds 20–40% savings even in deeply nested structures.
YAML vs TERSE: YAML supports full nesting but pays a verbosity tax. Block-style arrays put each `- item` on its own line instead of `[a b c]`, booleans cost `true`/`false` instead of `T`/`F`, and there is no schema-array optimization for tabular data. On the app config benchmark, YAML saves 20–24% versus JSON while TERSE saves 35–45%. YAML is a genuine improvement, but TERSE was designed specifically for token efficiency, not human authoring.
The core message: TOON and CSV only win on flat, uniform data — the easy case. YAML wins on human readability but not on token count. TERSE delivers consistent savings across every structure, including the complex cases that dominate real LLM pipelines.
Five principles that explain every design decision.
Identifiers and common values require no quotation marks: `production` stays `production`, not `"production"`. Quotes are reserved for strings that actually need them.
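One way to decide when a string may stay bare is sketched below. The rule is an assumption for illustration, since the exact TERSE grammar is not given here: a string stays unquoted if it uses only letters, digits, and `_ . @ -`, and could not be misread as a number or as one of the literals `T`, `F`, `~`. The JSON-style backslash escaping inside quotes is also assumed.

```python
import re

# Assumed rule for this sketch: bare strings use only these characters...
_BARE = re.compile(r"[A-Za-z0-9_.@-]+")
# ...and must not look like a number, or they would be read back as one.
_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")
# Bare T, F and ~ are the literals true, false and null.
_LITERALS = {"T", "F", "~"}


def needs_quotes(s: str) -> bool:
    """Return True if the string must be quoted to round-trip as a string."""
    if s in _LITERALS or _NUMBER.fullmatch(s):
        return True
    return _BARE.fullmatch(s) is None


def encode_string(s: str) -> str:
    """Emit the string bare when safe, otherwise quoted with backslash escapes."""
    if not needs_quotes(s):
        return s
    escaped = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


for sample in ["production", "my-app", "2.1.0", "Ana Lima", "3000", "T"]:
    print(sample, "->", encode_string(sample))
```

Under this rule `2.1.0` and `alice@co.com` stay bare, as in the config example, while the string `"3000"` must be quoted so it is not read back as the number 3000.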
`null`, `true`, and `false` become single characters: `~`, `T`, `F`. Three of the most common values in any payload, each reduced to one token.
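A small sketch of the literal mapping in both directions. Identity checks (`is`) matter in Python because `True == 1` and `False == 0`, so a dictionary keyed on `True`/`False` would also match the integers 1 and 0. The function names are hypothetical, and `decode_token` assumes it receives bare (unquoted) tokens only, so a quoted `"T"` would remain a string.

```python
from typing import Optional

# One-character literals: null -> ~, true -> T, false -> F.
_DECODE = {"~": None, "T": True, "F": False}


def encode_literal(value) -> Optional[str]:
    """Return ~, T or F for None/True/False, or None if value is not one of them.

    Identity checks avoid treating 1 and 0 as True and False.
    """
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    return None


def decode_token(token: str):
    """Map a bare (unquoted) token back to None/True/False; other tokens pass through."""
    return _DECODE.get(token, token)


for v in (None, True, False, 1):
    print(repr(v), "->", encode_literal(v))
```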
Spaces separate values inside objects and arrays, so there are no commas between elements or between key-value pairs. Structure is implied by position, not punctuation.
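Because whitespace is the only separator, a parser has to split on it while keeping quoted strings intact. The sketch below does that for the body of a flat array. The escape handling (backslash inside quotes) is an assumption, and nested brackets are deliberately out of scope. `split_values` is a hypothetical helper, not part of a published TERSE library.

```python
from typing import List


def split_values(body: str) -> List[str]:
    """Split the inside of a flat [ ... ] on whitespace, keeping quoted strings whole.

    Quoted tokens keep their quotes so a later step can tell the string "T"
    apart from the literal T. Nested [ ] and { } are not handled.
    """
    tokens: List[str] = []
    current: List[str] = []
    in_quotes = False
    escaped = False
    for ch in body:
        if in_quotes:
            current.append(ch)
            if escaped:
                escaped = False  # this character was escaped; keep it literally
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False  # closing quote
        elif ch == '"':
            in_quotes = True
            current.append(ch)
        elif ch.isspace():
            if current:  # runs of whitespace collapse into one separator
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated quoted string")
    if current:
        tokens.append("".join(current))
    return tokens


print(split_values('1 "Ana Lima" admin T'))
```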
Uniform arrays declare fields once with `#[f1 f2 f3]`, then list values positionally. For tabular data with many rows, this eliminates every repeated key name.
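On the decoding side, the positional layout means a reader needs only the field list and the values in order. This hypothetical `decode_schema_rows` regroups already-decoded values into one dict per row by counting fields. Whether the reference parsers also rely on line breaks between rows is not stated here, so treat the count-based approach as an assumption.

```python
from typing import Any, Dict, List


def decode_schema_rows(fields: List[str], values: List[Any]) -> List[Dict[str, Any]]:
    """Regroup positional values into one dict per row, using the #[...] field list."""
    if not fields:
        raise ValueError("schema header has no fields")
    width = len(fields)
    if len(values) % width != 0:
        raise ValueError("value count must be a multiple of the field count")
    # Slice the flat value list into chunks of `width` and pair each with the fields.
    return [dict(zip(fields, values[i:i + width])) for i in range(0, len(values), width)]


fields = ["id", "name", "role", "active"]
values = [1, "Ana Lima", "admin", True, 2, "Bruno Melo", "editor", True]
print(decode_schema_rows(fields, values))
```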
All constructs nest arbitrarily. Objects inside arrays inside schema arrays — all valid, all compact. No flat-only limitations like TOON or CSV.
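Arbitrary nesting falls out of a recursive encoder: every value is encoded the same way whether it sits at the top level, inside an object, or inside an array. The sketch below reproduces the app-config example from the top of this page under its own assumptions: top-level pairs go one per line as `key: value`, nested objects use `key:value` with no space, and strings stay bare only if they use letters, digits, and `_ . @ -` and cannot be mistaken for a number or a literal. Schema arrays are omitted to keep it short. `encode` and `encode_document` are hypothetical names, not the terse-py API.

```python
import re

# Assumed string rule: bare only if made of these characters...
_BARE = re.compile(r"[A-Za-z0-9_.@-]+")
# ...and not readable as a number or as a literal.
_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")
_LITERALS = {"T", "F", "~"}


def _encode_str(s):
    if s not in _LITERALS and not _NUMBER.fullmatch(s) and _BARE.fullmatch(s):
        return s
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def encode(value):
    """Encode a JSON-compatible value in inline form, recursing into containers."""
    if value is None:
        return "~"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return _encode_str(value)
    if isinstance(value, list):
        # Arrays: elements separated by single spaces, no commas.
        return "[" + " ".join(encode(v) for v in value) + "]"
    if isinstance(value, dict):
        # Objects: key:value pairs separated by single spaces.
        return "{" + " ".join(_encode_str(k) + ":" + encode(v) for k, v in value.items()) + "}"
    raise TypeError("unsupported type: " + type(value).__name__)


def encode_document(doc):
    """Top-level object: one `key: value` pair per line (layout assumed from the example)."""
    return "\n".join(_encode_str(k) + ": " + encode(v) for k, v in doc.items())


config = {
    "name": "my-app", "version": "2.1.0", "private": True, "port": 3000, "debug": False,
    "tags": ["web", "typescript", "spa"],
    "author": {"name": "Alice", "email": "alice@co.com"},
}
print(encode_document(config))
```

On the `{a: 1, b: true}` input used in the library examples, `encode` returns `{a:1 b:T}`, the same string shown for `serialize`.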
TERSE is optimized for the intersection of two constraints: token efficiency and human auditability. Further compression techniques — key abbreviation, binary type encoding, dictionary compression — would yield additional token savings but would break the ability to inspect, debug, and audit payloads without tooling. For LLM pipelines in production, auditability is a safety property, not just a convenience.
Reference implementations with full spec coverage and 100% round-trip guarantee.
```bash
npm install terse-js
```

```js
import { serialize, parse } from "terse-js";

serialize({ a: 1, b: true });    // "{a:1 b:T}"
parse("{name:Alice age:30}");    // { name: "Alice", age: 30 }
```
```bash
pip install terse-py
```

```python
from terse import serialize, parse

serialize({"a": 1, "b": True})   # "{a:1 b:T}"
parse("{name:Alice age:30}")     # {"name": "Alice", "age": 30}
```
```go
// go get github.com/RudsonCarvalho/terse-go
// Zero external dependencies; Go 1.21+

terse.Serialize(map[string]any{"a": 1, "b": true}) // "{a:1 b:T}"
terse.Parse("{name:Alice age:30}")                 // map["name":"Alice" "age":30]
```
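```java
// Maven (pom.xml) or Gradle (build.gradle)
// Zero runtime dependencies; Java 11+
// Map.of() has no defined iteration order, so a LinkedHashMap (java.util) keeps "a" before "b".
Map<String, Object> input = new LinkedHashMap<>();
input.put("a", 1L);
input.put("b", true);
Terse.serialize(input);             // "{a:1 b:T}"
Terse.parse("{name:Alice age:30}"); // Map {"name":"Alice", "age":30L}
```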