Question 1

What is tokenminning?

Accepted Answer

Tokenminning is the deliberate practice of reducing large language model (LLM) token consumption while preserving useful output quality. Teams treat tokens as capital: they match models to tasks, trim bloated context, set spending caps, and measure AI cost against shipped features rather than raw usage volume.

Question 2

What is the difference between tokenminning and tokenmaxxing?

Accepted Answer

Tokenmaxxing is the habit of maximizing AI usage — often measured by raw token volume on leaderboards or usage quotas — without tying spend to outcomes. Tokenminning is the deliberate counter-move: reducing token consumption while keeping quality, through model routing, context hygiene, budgets, and cost attribution. Tokenmaxxing optimizes for volume; tokenminning optimizes for value per token.

Question 3

Is tokenminning the same as "token minimizing"?

Accepted Answer

Yes. Tokenminning is the coined term for token minimizing — using fewer LLM tokens to achieve the same result. Press coverage often says "token minimizing"; engineering and FinOps teams use tokenminning when they formalize the practice with routing, metering, and enforcement.

Question 4

Who coined tokenminning?

Accepted Answer

The term emerged from the tokenminning movement, whose founding document is The Tokenminning Manifesto on this wiki. Rob May, CEO of Neurometric and author of the manifesto, is widely cited as a leading voice in popularizing both the vocabulary and the engineering practices. The New York Times covered the shift from tokenmaxxing to tokenminning in June 2026 as enterprise employers began imposing limits on coding assistants.

Question 5

Who uses tokenminning?

Accepted Answer

Enterprise engineering, platform, and FinOps teams at companies with large LLM bills — especially after agentic workflows and frontier-model defaults drove costs past budget. In 2026, employers including Meta, Uber, Walmart, and Amazon began limiting coding-assistant usage, removing token leaderboards, and asking teams to justify frontier-model spend. AI platform leads, product owners, and builders who own inference cost also adopt tokenminning practices.

Question 6

How do I start tokenminning?

Accepted Answer

Start by measuring input and output tokens separately per feature and session, then optimize in order: prompt hygiene, model routing, prompt caching, context hygiene, and output control. Set per-session and monthly caps before shipping agents. Use the Tokenminning Constitution as enforcement policy and the practice guides for technique. See Where to start for the full sequence.

Question 7

Is tokenminning about using less AI?

Accepted Answer

No. Tokenminning is about using AI strategically — routing simple work to cheaper models, compressing context that does not change outcomes, and enforcing budgets on agent loops. The goal is not fewer features; it is lower cost per correct result.

Question 8

How much can tokenminning save?

Accepted Answer

Savings depend on baseline waste, but production teams commonly report 60–90% reductions from model selection and routing alone, with additional gains from context hygiene, prompt caching, and output caps. Measurement is prerequisite: without per-feature attribution, optimization is guesswork.

Company	What changed
Meta	Limited employee AI use after an exponential cost increase; removed token leaderboards while still planning billions in annual AI spend
Uber	Exhausted its projected AI budget for the year in four months; imposed monthly limits on coding tools
Walmart	Set caps on different AI products
Amazon	Removed tokenmaxxing leaderboards alongside Meta

Practice	What it avoids
Task-based model routing	Paying frontier rates for simple work
Context summarization and truncation	Context inflation on every agent turn
Cached and reused system prompts	Re-billing identical input on each request
Per-feature cost attribution	Unexplained "AI costs went up" line items
Monthly and per-session caps	Agent loops that run unbounded

Term	Definition
Tokenminning	Deliberately reducing LLM token consumption while preserving useful output; the named discipline and movement.
Token minimizing	Descriptive synonym for tokenminning; common in press and enterprise policy language.
Tokenmaxxing	Maximizing raw token volume—via leaderboards, unlimited credits, or defaulting to the largest context and frontier models without quality justification.
Token	A billing unit for LLM usage, roughly a word fragment; providers charge per input and output token.
Input tokens	Everything sent to the model: prompts, history, retrieved documents, tool schemas. Often dominates agent workloads.
Output tokens	Everything the model generates: replies, JSON, tool calls, reasoning text. Usually priced higher per token than input.
Context inflation	Growth in tokens per request over time even when per-token prices stay flat—driven by longer histories, RAG over-fetch, and agent loops.
Frontier model	The most capable (and typically most expensive) tier from a provider; justified for hard reasoning, not routine tasks.
Model selection / routing	Choosing the right model per task; cascade escalation only when cheaper tiers fail quality checks.
Agentic workflow	Multi-step AI systems that iterate, call tools, and refine—multiplying token use per user action.
Attribution	Tagging inference spend by feature, user, session, or customer so cost maps to product surfaces.
FinOps (AI)	Financial operations for AI infrastructure: metering, forecasting, unit economics, and budget enforcement.

What is tokenminning?

Tokenminning in one sentence

Enterprise adoption

From tokenmaxxing to tokenminning

Why costs spiraled

Frontier models for everything

Agentic workflows

Context inflation

The measurement problem

Core principles

Examples of tokenminning in production

Tokenminning in practice

Glossary

Frequently asked questions

Further reading