Skip to Content
ManifestoChapter II: tokenmaxxing

Chapter II: tokenmaxxing

Tokenmaxxing is the Manifesto’s label for the opposite habit: expanding token use at every layer without a quality justification.

Typical symptoms include bloated system instructions kept long for safety, whole repositories dropped into context, step-by-step reasoning demanded for simple lookups, and agent pipelines that retain full tool payloads instead of keeping a lean working state.

None of this required bad intent. Early wins with longer prompts, bigger windows, and richer examples encouraged a blanket strategy: if some extra tokens helped once, always add more. The Manifesto argues that inference from narrow successes to universal rules fails in production — habits formed in subsidized dev environments then ship unchanged.

Costs that rarely get reviewed together

AreaHow waste shows up
SpeedInput length and output volume both stretch wall-clock time; attention cost over long contexts makes the penalty steeper than linear
SpendPer-token billing means a prompt that balloons tenfold multiplies cost on every call — painful at daily volume
PowerEach forward pass burns energy that never appears on the vendor invoice
CapacityOversized requests eat shared accelerator time and slow unrelated workloads

Free tiers, research credits, and flat dev pricing hide the bill while teams cement patterns. Production is where those patterns become expensive.

Why more text is not always better quality

Advocates of maximal context often equate length with reliability. The Manifesto complicates that story.

Models do not weight every position in a long input equally — middle sections can be under-attended compared with the start and end, so a massive window does not ensure the model will use what you inserted. System instructions that accrete over weeks from many authors can conflict silently, which surfaces as flaky behavior rather than obvious prompt bugs.

Deep multi-step reasoning pays off on hard logic, math, and intricate code. For labeling, extraction, and schema-shaped transforms, verbose reasoning frequently buys little accuracy while increasing the chance of drift across steps. In agent setups, full raw tool dumps usually add clutter; the actionable slice is often small.

Compounding organizational drag

Inefficient prompts rarely get revisited after the demo ships. Requirements change by appending new lines instead of rewriting. Orchestration libraries often retain full histories by default, and overriding that behavior looks optional until traffic scales.

Without usage broken down by feature, user, and template, engineers lack price signals. Product and finance cannot tie spend to value. Waste stacks quietly until reconciliation forces a crisis.


Previous: Chapter I: the production gap · Next: Chapter III: tokenminning

Last updated on