Chapter II: tokenmaxxing
Tokenmaxxing is the Manifesto’s label for the opposite habit: expanding token use at every layer without a quality justification.
Typical symptoms include bloated system instructions kept long for safety, whole repositories dropped into context, step-by-step reasoning demanded for simple lookups, and agent pipelines that retain full tool payloads instead of keeping a lean working state.
None of this required bad intent. Early wins with longer prompts, bigger windows, and richer examples encouraged a blanket strategy: if some extra tokens helped once, always add more. The Manifesto argues that inference from narrow successes to universal rules fails in production — habits formed in subsidized dev environments then ship unchanged.
Costs that rarely get reviewed together
| Area | How waste shows up |
|---|---|
| Speed | Input length and output volume both stretch wall-clock time; attention cost over long contexts makes the penalty steeper than linear |
| Spend | Per-token billing means a prompt that balloons tenfold multiplies cost on every call — painful at daily volume |
| Power | Each forward pass burns energy that never appears on the vendor invoice |
| Capacity | Oversized requests eat shared accelerator time and slow unrelated workloads |
Free tiers, research credits, and flat dev pricing hide the bill while teams cement patterns. Production is where those patterns become expensive.
Why more text is not always better quality
Advocates of maximal context often equate length with reliability. The Manifesto complicates that story.
Models do not weight every position in a long input equally — middle sections can be under-attended compared with the start and end, so a massive window does not ensure the model will use what you inserted. System instructions that accrete over weeks from many authors can conflict silently, which surfaces as flaky behavior rather than obvious prompt bugs.
Deep multi-step reasoning pays off on hard logic, math, and intricate code. For labeling, extraction, and schema-shaped transforms, verbose reasoning frequently buys little accuracy while increasing the chance of drift across steps. In agent setups, full raw tool dumps usually add clutter; the actionable slice is often small.
Compounding organizational drag
Inefficient prompts rarely get revisited after the demo ships. Requirements change by appending new lines instead of rewriting. Orchestration libraries often retain full histories by default, and overriding that behavior looks optional until traffic scales.
Without usage broken down by feature, user, and template, engineers lack price signals. Product and finance cannot tie spend to value. Waste stacks quietly until reconciliation forces a crisis.
Previous: Chapter I: the production gap · Next: Chapter III: tokenminning