The Manifesto
The Tokenminning Manifesto is the founding document of the tokenminning movement, community-maintained and versioned on GitHub. It makes the case for delivering equivalent model outcomes while consuming fewer tokens across system prompts, retrieved context, intermediate reasoning, and multi-step agent runs.
The term entered mainstream usage in 2026.
The Constitution turns those ideas into binding production rules: deployment gates, spending caps, and permanent usage records. The Manifesto answers a different question — why the movement exists, what habits it opposes, and how teams should think before they instrument anything.
The gap between prototype and production
Inference spending tends to creep upward long before anyone notices. API line items swell. Shared GPU pools queue longer. Power contracts strain. Finance discovers that the AI budget assumed at kickoff no longer matches reality.
A recurring belief drives much of that drift: teams assume that feeding models more text reliably improves answers. The Manifesto treats that belief as a costly error rather than a safe default. Its response is deliberate, measured use of inference capacity instead of unchecked growth.
Teams positioned well for the next wave of AI adoption are unlikely to be the ones that maximized context by reflex. They are more likely to be the ones that learned when extra tokens help, when they hurt, and how to route work to the right model tier without padding every request.
Tokenmaxxing: the unnamed default
Tokenmaxxing is the Manifesto's label for the opposite habit: expanding token use at every layer without a quality justification.
Typical symptoms include bloated system instructions kept long for safety, whole repositories dropped into context, step-by-step reasoning demanded for simple lookups, and agent pipelines that retain full tool payloads instead of keeping a lean working state.
None of this required bad intent. Early wins with longer prompts, bigger windows, and richer examples encouraged a blanket strategy: if some extra tokens helped once, always add more. The Manifesto argues that generalizing from narrow successes to universal rules fails in production, because habits formed in subsidized dev environments ship unchanged.
Costs that rarely get reviewed together
| Area | How waste shows up |
|---|---|
| Speed | Input length and output volume both stretch wall-clock time; attention cost over long contexts makes the penalty steeper than linear |
| Spend | Under per-token billing, a prompt that grows tenfold costs roughly ten times as much in input tokens on every call, which adds up quickly at daily volume |
| Power | Each forward pass burns energy that never appears on the vendor invoice |
| Capacity | Oversized requests eat shared accelerator time and slow unrelated workloads |
Free tiers, research credits, and flat dev pricing hide the bill while teams cement patterns. Production is where those patterns become expensive.
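The Spend row can be made concrete with a little arithmetic. The sketch below assumes simple linear per-token billing on input only; the price, call volume, and prompt sizes are hypothetical numbers chosen for illustration, not figures from the Manifesto.

```python
def daily_input_cost(prompt_tokens: int, calls_per_day: int,
                     usd_per_million_tokens: float) -> float:
    """Input-side cost per day under simple linear per-token billing."""
    return prompt_tokens * calls_per_day * usd_per_million_tokens / 1_000_000


# Hypothetical values, assumed only for this example.
PRICE_USD_PER_MILLION = 3.00   # assumed input price
CALLS_PER_DAY = 200_000        # assumed traffic

lean = daily_input_cost(800, CALLS_PER_DAY, PRICE_USD_PER_MILLION)
bloated = daily_input_cost(8_000, CALLS_PER_DAY, PRICE_USD_PER_MILLION)

print(f"lean prompt:    ${lean:,.2f} per day")
print(f"bloated prompt: ${bloated:,.2f} per day")
print(f"ratio: {bloated / lean:.1f}x")
```

With these assumed inputs the tenfold prompt costs ten times as much per day. Real invoices also include output tokens and any caching discounts, which this sketch ignores.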
Why more text is not always better quality
Advocates of maximal context often equate length with reliability. The Manifesto complicates that story.
Models do not weight every position in a long input equally — middle sections can be under-attended compared with the start and end, so a massive window does not ensure the model will use what you inserted. System instructions that accrete over weeks from many authors can conflict silently, which surfaces as flaky behavior rather than obvious prompt bugs.
Deep multi-step reasoning pays off on hard logic, math, and intricate code. For labeling, extraction, and schema-shaped transforms, verbose reasoning frequently buys little accuracy while increasing the chance of drift across steps. In agent setups, full raw tool dumps usually add clutter; the actionable slice is often small.
Compounding organizational drag
Inefficient prompts rarely get revisited after the demo ships. Requirements change by appending new lines instead of rewriting. Orchestration libraries often retain full histories by default, and overriding that behavior looks optional until traffic scales.
Without usage broken down by feature, user, and template, engineers lack price signals. Product and finance cannot tie spend to value. Waste stacks quietly until reconciliation forces a crisis.
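As a minimal sketch of what such a breakdown could look like, the example below sums tokens per feature, user, or template from a list of usage records. The `UsageRecord` shape, the field names, and the sample data are all hypothetical; a real system would read these from its own logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One model call, labeled with the dimensions spend is attributed to."""
    feature: str
    user: str
    template: str
    tokens_in: int
    tokens_out: int


def total_tokens_by(records, dimension):
    """Sum input plus output tokens for each value of one dimension."""
    if dimension not in ("feature", "user", "template"):
        raise ValueError(f"unknown dimension: {dimension}")
    totals = defaultdict(int)
    for record in records:
        totals[getattr(record, dimension)] += record.tokens_in + record.tokens_out
    return dict(totals)


# Hypothetical sample data for illustration.
records = [
    UsageRecord("search", "u1", "answer_v2", 1200, 150),
    UsageRecord("search", "u2", "answer_v2", 900, 100),
    UsageRecord("summarize", "u1", "digest_v1", 4000, 300),
]

print(total_tokens_by(records, "feature"))
print(total_tokens_by(records, "user"))
```

Multiplying these totals by a price per token gives the per-feature cost signal that engineers, product, and finance can share.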
Tokenminning: what the movement advocates
Tokenminning means hitting the required quality bar with as little inference spend as possible — on the way in, during reasoning, and on the way out.
The floor is not zero usage, and it is not whatever produces the shortest reply. It is the smallest spend that still yields correct results for the task. The goal is surgical prompts, not starved ones.
One useful comparison is tuned data access: two queries can return the same rows while one scans the entire table and the other uses an index. The cheaper path is not inferior engineering. Likewise, two prompts that produce equivalent outcomes differ in quality as engineering artifacts when one needs an order of magnitude more tokens.
Habits the Manifesto encourages
Practitioners are asked to treat each token as carrying monetary, latency, energy, and environmental weight.
- Intentional use beats accidental growth. Many expensive systems never decided to be expensive; defaults and inertia did the work.
- Shortening prompts is a skill. Knowing what the model actually needs means understanding the task, the model family, and typical failure modes.
- Judge systems by outcomes per dollar and per second. Saving tokens at the cost of correctness is not a win. The question is whether marginal tokens change results.
Common misunderstandings
The Manifesto explicitly rejects a few shortcuts:
- Stripping required instructions to shave counts degrades output; noise removal is not the same as signal removal.
- Choosing the biggest available model for every job ignores routing; sometimes the frontier tier is correct, often it is not.
- Fragile micro-edits that save a handful of tokens but destabilize edge cases are false economies.
The target is slack — tokens that change the bill without changing the answer in a meaningful way.
Why giving it a name matters
Large deployments have trimmed prompts for years under billing pressure. Shared language helps those practices spread: priorities enter roadmaps, interviews, dashboards, and design reviews once everyone can point at the same idea.
Scale, energy, and the finance picture
Individual choices add up against physical and economic limits.
Facility expansion moves slowly; demand spikes quickly. Supply-side investment remains necessary, but consuming less per task frees capacity for other work. Inside one company that is margin; across the industry it eases pressure on grids and queues.
Lower token volume maps to lower compute and lower associated emissions in ways finance and ESG teams can audit — real reduction rather than purchased offsets alone.
For many production LLM programs, recurring inference, not one-off training, now dominates AI opex and grows with every feature and seat. Leaders often cannot state per-feature token use per session, which makes optimization a guess. Measurement is a prerequisite, not garnish.
When two rivals ship similar user-visible quality but one spends materially less per interaction, unit economics diverge. As model capability converges on routine tasks, efficiency can matter as much as headline benchmark scores.
Practical themes (in brief)
The Manifesto walks through tactics that mirror themes in the Constitution. Below is a high-level map, not a reproduction of its technique sections.
Instructions and templates
Handle prompts like performance-sensitive code: measure baseline cost and quality before editing, strip notes meant for humans before launch, prefer machine-readable output shapes over long prose formatting rules, drop ceremonial phrasing, review system instructions on a schedule for duplication and conflict, and keep few-shot samples only where they fix a known miss.
Models, routing, and retrieval
Default to the smallest model that passes quality checks, and escalate only when data proves you must. A classifier that sends easy work to cheap tiers and hard work to capable tiers often cuts average cost sharply. Stable prompt prefixes improve cache reuse. In retrieval pipelines, returning excess chunks is another form of waste. Cap output length when the answer shape is bounded.
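The routing idea can be sketched as a classifier that maps each task to a tier and an output cap. Everything here is an assumption made for illustration: the tier names, the token caps, and especially the keyword heuristic, which stands in for whatever trained or evaluated classifier a real deployment would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: str               # hypothetical tier label, not a real model name
    max_output_tokens: int  # cap applied because the answer shape is bounded


# Hypothetical routing table.
ROUTES = {
    "easy": Route(tier="small", max_output_tokens=64),
    "hard": Route(tier="large", max_output_tokens=1024),
}

# Toy heuristic: words that suggest multi-step reasoning or intricate code.
HARD_HINTS = ("prove", "refactor", "debug", "derive")


def classify(task: str) -> str:
    """Label a task 'hard' if it contains any hint word, else 'easy'."""
    text = task.lower()
    return "hard" if any(hint in text for hint in HARD_HINTS) else "easy"


def pick_route(task: str) -> Route:
    """Choose the tier and output cap for a task."""
    return ROUTES[classify(task)]


print(pick_route("Extract the invoice date from this email"))
print(pick_route("Debug this intermittent deadlock in the worker pool"))
```

The routing table defaults to the small tier; only tasks the classifier flags are escalated. In practice the classifier's misroutes should be measured against quality checks before the cheap tier is trusted.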
Agents and loops
Unchecked history growth can make multi-step runs far more expensive than the sum of their steps. Summarize or externalize state between phases, keep only what the current step needs in the active window, shape tool responses for density, enforce run-level spending limits in orchestration code rather than by pleading with the model in the prompt, and parallelize independent tool work where safe.
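One way to picture a run-level limit enforced in orchestration code, together with a bounded working state, is the sketch below. The class and function names are hypothetical, and step costs are passed in as plain integers rather than measured from real model calls.

```python
from collections import deque


class BudgetExceeded(RuntimeError):
    """Raised when a step would push the run past its token cap."""


class RunBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Refuse the step before it runs instead of discovering the overrun later.
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"{self.used} used + {tokens} requested > cap {self.max_tokens}"
            )
        self.used += tokens


def run_steps(step_costs, budget, window=2):
    """Charge each step to the budget and keep only the last `window` step notes."""
    working_state = deque(maxlen=window)  # older notes fall out automatically
    for index, cost in enumerate(step_costs):
        budget.charge(cost)
        working_state.append(f"step {index}: {cost} tokens")
    return list(working_state)


budget = RunBudget(max_tokens=1_000)
state = run_steps([200, 300, 400], budget)
print(budget.used, state)
```

Because the cap lives in code, a run that would overspend stops with an exception regardless of what the prompt says. A fuller version would summarize dropped steps instead of discarding them.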
Measurement and policy
Log tokens in and out with template and product labels, track latency and task success, watch tail spend on long agent runs, block deploys that regress efficiency without review, and combine warning thresholds with hard stops.
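A deploy check that combines a warning threshold with a hard stop might look like the following sketch. The function name, the ratio thresholds, and the idea of comparing mean tokens per request against a recorded baseline are assumptions for illustration, not rules quoted from the Manifesto or the Constitution.

```python
def efficiency_gate(baseline_tokens: float, candidate_tokens: float,
                    warn_ratio: float = 1.05, block_ratio: float = 1.20) -> str:
    """Compare a candidate's mean tokens per request against the baseline.

    Returns "pass", "warn" (needs review), or "block" (hard stop).
    The default thresholds are hypothetical.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    ratio = candidate_tokens / baseline_tokens
    if ratio >= block_ratio:
        return "block"
    if ratio >= warn_ratio:
        return "warn"
    return "pass"


for candidate in (1_000, 1_100, 1_300):
    print(candidate, efficiency_gate(1_000, candidate))
```

A gate like this only covers the cost side. It would normally run next to a quality check so that a cheaper but less accurate change does not pass on token count alone.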
Who the Manifesto is written for
The document addresses several audiences with different levers:
- Executives — ask for outcome-level AI unit cost by surface area and whether it is improving
- Finance — require attribution, forecasts, and emissions narratives grounded in usage cuts
- AI platform leads — elevate efficiency to the same tier as uptime and accuracy
- Product owners — treat inference as a launch constraint, not a hidden backend detail
- Builders — own compression, retrieval tuning, agent memory layout, and call-graph profiling
- Sustainability owners — partner on metrics that regulators and investors can scrutinize
Closing commitments (paraphrased)
The Manifesto ends with a short set of pledges for practitioners: establish baselines before tuning; account for the full cost of a token; right-size models by task; edit prompts without deleting what the model needs; keep agent state bounded; track spend beside quality and speed; and stop treating length as a proxy for rigor.
It argues that cheap-seeming inference was partly an accounting illusion — costs were real but deferred. The invitation is to engineer inference deliberately rather than treat verbosity as free quality.
How this page relates to the Constitution
| Document | Role on this wiki |
|---|---|
| Manifesto | Founding document — philosophy, economics, and cultural framing |
| Constitution | Enforceable rules for production systems |
Use this overview for orientation. Use the Constitution when you need policies that survive review, automation, and scale.