Chapter III: tokenminning – Tokenminning

Tokenminning means hitting the required quality bar with as little inference spend as possible — on the way in, during reasoning, and on the way out.

The floor is not zero usage, and it is not whatever produces the shortest reply. It is the smallest spend that still yields correct results for the task. The goal is surgical prompts, not starved ones.

One useful comparison is tuned data access: two queries can return the same rows while one scans the entire table and the other uses an index. The cheaper path is not inferior engineering. Likewise, two prompts that produce equivalent outcomes differ in quality as engineering artifacts when one needs an order of magnitude more tokens.

Habits the Manifesto encourages

Practitioners are asked to treat each token as carrying monetary, latency, energy, and environmental weight.

Intentional use beats accidental growth. Many expensive systems never decided to be expensive; defaults and inertia did the work.
Shortening prompts is a skill. Knowing what the model actually needs means understanding the task, the model family, and typical failure modes.
Judge systems by outcomes per dollar and per second. Saving tokens that also save correctness is not a win. The question is whether marginal tokens change results.

Common misunderstandings

The Manifesto explicitly rejects a few shortcuts:

Stripping required instructions to shave counts degrades output; noise removal is not the same as signal removal.
Choosing the biggest available model for every job ignores routing; sometimes the frontier tier is correct, often it is not.
Fragile micro-edits that save a handful of tokens but destabilize edge cases are false economies.

The target is slack — tokens that change the bill without changing the answer in a meaningful way.

Why giving it a name matters

Large deployments have trimmed prompts for years under billing pressure. Shared language helps those practices spread: priorities enter roadmaps, interviews, dashboards, and design reviews once everyone can point at the same idea.

Previous: Chapter II: tokenmaxxing · Next: Chapter IV: scale and unit economics