Tokenminning in Claude Code – Tokenminning

Claude Code is Anthropic’s terminal-native coding agent. Every turn resends your prompt, CLAUDE.md, MCP tool schemas, and the full session transcript—including large bash, grep, and test outputs. Most spend comes from long headless sessions, tool output bloat, and frontier defaults on routine work—not from one verbose reply.

Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.

Quick checklist

Check Anthropic console usage or your Claude Pro/Max plan limits.
Use Sonnet for most agent work. Reserve Opus for tasks that actually need it.
Keep CLAUDE.md short—build commands and non-obvious conventions only.
Disable MCP servers you are not using this week.
Scope bash commands—avoid dumping full test logs into context.
For CI/PR automation: one scoped task per invocation; set --max-turns or equivalent limits.

Typical impact when you follow the list: 50–80% savings using Sonnet over Opus for routine work; 20–50% on input by trimming CLAUDE.md and MCP; 30–60% less context growth from shorter sessions and truncated tool output. Benchmark on your own Anthropic dashboard.

How Claude Code bills a request

Claude Code bills through Anthropic API keys, Claude Pro/Max subscriptions, or team plans depending on setup. Each agent turn sends:

Your prompt and @ file references
CLAUDE.md and nested CLAUDE.md files in ancestor directories
MCP tool schemas for enabled servers
Full session history, including prior tool inputs and outputs

Anthropic prompt caching can reduce cost on repeated prefixes within a session—stable CLAUDE.md at the start helps. Long sessions still grow cache read volume as history expands.

Headless mode (claude -p) and CI integrations multiply spend when loops lack turn caps—each iteration is a full model call.

1. Measure first

Where to look:

Anthropic Console → Usage
Claude Code session summaries when available in terminal output
CI logs — track token-heavy PR bots separately from interactive sessions

After a heavy week, check whether spend is input-heavy (CLAUDE.md, MCP, bash output) or output-heavy (Opus, long diffs, many turns). That tells you which section below to prioritize.

2. Match the model to the task

See Anthropic models for current rates. This is Claude Code’s version of Model routing: default Sonnet, escalate only on failure.

Start here:

Haiku — quick file lookups, small edits, scripted checks
Sonnet — most interactive agent work, multi-file refactors
Opus — deep debugging, architecture, novel design only

Costs more than you expect:

Opus as default for every @-mention
Extended thinking modes on routine tasks
Headless PR bots without --max-turns
Parallel subagents or background tasks each carrying full context

Use /model to switch mid-session only when necessary—prefer a new session when changing tiers.

3. Trim what rides along every request

Input bloat in Claude Code usually comes from CLAUDE.md, MCP, and tool output—not your prompt text alone.

CLAUDE.md

CLAUDE.md loads into every session in that directory tree.

Keep it short — commands, architecture one-liners, repo-specific gotchas
Use nested CLAUDE.md in subpackages instead of one root megafile
Do not duplicate content in CLAUDE.md, AGENTS.md, and global Claude settings
Reference linter configs instead of pasting style guides

MCP servers

Each enabled MCP server adds tool schemas to context.

Run claude mcp list and remove unused servers
One narrow server beats five overlapping ones
Prefer built-in Read, Grep, and Bash over MCP search for local repos

Bash and test output

Large command output re-enters context on every subsequent turn.

Pipe through head, tail, or --max-lines when exploring logs
Run focused tests (pnpm test path/to/file) not full suite unless needed
Ask Claude to summarize long output instead of re-pasting raw logs

Permissions and auto-accept

Auto-accepting edits and bash speeds loops—and spend.

Use permission prompts for destructive commands in shared repos
Scope CI invocations with explicit file lists and acceptance criteria

New session per task

Start a new session when you finish one task and begin another, when you switch models, or when the agent loops. For automation, one PR comment → one scoped claude -p run.

See Context hygiene for the general just-in-time retrieval pattern.

4. Write tighter prompts

Claude Code-specific versions of Prompt hygiene:

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

For CI: include file paths, test command, and definition of done in the first message—vague delegation burns turns exploring.

5. Set spending guardrails

Claude Code does not enforce your inference budget. You set the limits.

Set Anthropic workspace spend limits
Cap headless turns in CI (--max-turns, timeout wrappers)
Use Haiku or Sonnet in automation; Opus only on labeled jobs
Separate API keys for CI vs interactive dev

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

High input — CLAUDE.md, MCP, or huge bash output. Trim docs; truncate commands; disable MCP.

High output — Opus default or verbose diffs. Switch to Sonnet; tighter prompts.

CI bill spike — unbounded PR bot loops. --max-turns; scope files; fail closed.

Spike after enabling MCP — schemas attach every turn. Remove unused servers.

Cache not helping — unstable prefix from editing CLAUDE.md mid-session. Keep rules stable at session start.

When Claude Code optimization is not enough

Trimming Claude Code configuration does not fix production agent loops in your own API routes. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.