Tokenminning in Claude Code
Claude Code is Anthropic’s terminal-native coding agent. Every turn resends your prompt, CLAUDE.md, MCP tool schemas, and the full session transcript—including large bash, grep, and test outputs. Most spend comes from long headless sessions, tool output bloat, and frontier defaults on routine work—not from one verbose reply.
Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.
Quick checklist
- Check Anthropic console usage or your Claude Pro/Max plan limits.
- Use Sonnet for most agent work. Reserve Opus for tasks that actually need it.
- Keep
CLAUDE.mdshort—build commands and non-obvious conventions only. - Disable MCP servers you are not using this week.
- Scope bash commands—avoid dumping full test logs into context.
- For CI/PR automation: one scoped task per invocation; set
--max-turnsor equivalent limits.
Typical impact when you follow the list: 50–80% savings using Sonnet over Opus for routine work; 20–50% on input by trimming CLAUDE.md and MCP; 30–60% less context growth from shorter sessions and truncated tool output. Benchmark on your own Anthropic dashboard.
How Claude Code bills a request
Claude Code bills through Anthropic API keys, Claude Pro/Max subscriptions, or team plans depending on setup. Each agent turn sends:
- Your prompt and
@file references CLAUDE.mdand nestedCLAUDE.mdfiles in ancestor directories- MCP tool schemas for enabled servers
- Full session history, including prior tool inputs and outputs
Anthropic prompt caching can reduce cost on repeated prefixes within a session—stable CLAUDE.md at the start helps. Long sessions still grow cache read volume as history expands.
Headless mode (claude -p) and CI integrations multiply spend when loops lack turn caps—each iteration is a full model call.
1. Measure first
Where to look:
- Anthropic Console → Usage
- Claude Code session summaries when available in terminal output
- CI logs — track token-heavy PR bots separately from interactive sessions
After a heavy week, check whether spend is input-heavy (CLAUDE.md, MCP, bash output) or output-heavy (Opus, long diffs, many turns). That tells you which section below to prioritize.
2. Match the model to the task
See Anthropic models for current rates. This is Claude Code’s version of Model routing: default Sonnet, escalate only on failure.
Start here:
- Haiku — quick file lookups, small edits, scripted checks
- Sonnet — most interactive agent work, multi-file refactors
- Opus — deep debugging, architecture, novel design only
Costs more than you expect:
- Opus as default for every
@-mention - Extended thinking modes on routine tasks
- Headless PR bots without
--max-turns - Parallel subagents or background tasks each carrying full context
Use /model to switch mid-session only when necessary—prefer a new session when changing tiers.
3. Trim what rides along every request
Input bloat in Claude Code usually comes from CLAUDE.md, MCP, and tool output—not your prompt text alone.
CLAUDE.md
CLAUDE.md loads into every session in that directory tree.
- Keep it short — commands, architecture one-liners, repo-specific gotchas
- Use nested
CLAUDE.mdin subpackages instead of one root megafile - Do not duplicate content in
CLAUDE.md,AGENTS.md, and global Claude settings - Reference linter configs instead of pasting style guides
MCP servers
Each enabled MCP server adds tool schemas to context.
- Run
claude mcp listand remove unused servers - One narrow server beats five overlapping ones
- Prefer built-in Read, Grep, and Bash over MCP search for local repos
Bash and test output
Large command output re-enters context on every subsequent turn.
- Pipe through
head,tail, or--max-lineswhen exploring logs - Run focused tests (
pnpm test path/to/file) not full suite unless needed - Ask Claude to summarize long output instead of re-pasting raw logs
Permissions and auto-accept
Auto-accepting edits and bash speeds loops—and spend.
- Use permission prompts for destructive commands in shared repos
- Scope CI invocations with explicit file lists and acceptance criteria
New session per task
Start a new session when you finish one task and begin another, when you switch models, or when the agent loops. For automation, one PR comment → one scoped claude -p run.
See Context hygiene for the general just-in-time retrieval pattern.
4. Write tighter prompts
Claude Code-specific versions of Prompt hygiene:
Too broad:
Fix this bug. Also review the whole auth system and suggest improvements.Scoped:
Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.For CI: include file paths, test command, and definition of done in the first message—vague delegation burns turns exploring.
5. Set spending guardrails
Claude Code does not enforce your inference budget. You set the limits.
- Set Anthropic workspace spend limits
- Cap headless turns in CI (
--max-turns, timeout wrappers) - Use Haiku or Sonnet in automation; Opus only on labeled jobs
- Separate API keys for CI vs interactive dev
For metering and caps in products you ship, see Article I and Article IV.
Troubleshooting
High input — CLAUDE.md, MCP, or huge bash output. Trim docs; truncate commands; disable MCP.
High output — Opus default or verbose diffs. Switch to Sonnet; tighter prompts.
CI bill spike — unbounded PR bot loops. --max-turns; scope files; fail closed.
Spike after enabling MCP — schemas attach every turn. Remove unused servers.
Cache not helping — unstable prefix from editing CLAUDE.md mid-session. Keep rules stable at session start.
When Claude Code optimization is not enough
Trimming Claude Code configuration does not fix production agent loops in your own API routes. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.