Tokenminning in OpenCode – Tokenminning

OpenCode is a terminal-first AI coding agent. Every turn resends your prompt, AGENTS.md, instruction files, MCP tool schemas, and the full session history—including large tool outputs from read, grep, and bash. Most spend comes from long sessions, heavy MCP, and frontier models on routine work—not from one verbose reply.

Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.

Quick checklist

Run opencode stats (add --models for a breakdown) and note which models dominate.
Open the session sidebar in the TUI and watch context fill—compact or start a new session before you hit the limit.
Use Plan for analysis; switch to Build only when you are ready to edit.
Keep AGENTS.md and instructions short. Disable MCP servers you are not using this week.
Set agent.*.steps on agents that tend to loop. Enable compaction defaults (auto + prune).

Typical impact when you follow the list: 40–70% savings on routine work by routing Plan and Explore to cheaper models; 20–50% on input by trimming AGENTS.md and MCP; 30–60% less context growth from compaction and new sessions per task. Benchmark on your own opencode stats output—your provider mix and session length will differ from anyone else’s.

How OpenCode bills a request

OpenCode talks to LLM providers through your API keys (or OpenCode Zen ). Many teams use OpenRouter for a single key across tools. There is no bundled IDE subscription pool like Cursor’s Auto/Composer allowance—you pay provider rates directly.

Each agent turn sends:

Your prompt and any @ file references
Project rules from AGENTS.md, CLAUDE.md, and instructions in opencode.json
Skill metadata (full skill content loads only when the agent calls the skill tool)
MCP tool schemas for every enabled server
The full session transcript, including prior tool outputs

Unlike Cursor’s cache-read pricing tiers, OpenCode does not discount re-sent context at the provider level (unless your provider offers prefix caching—you configure that on the provider side). Long sessions mean more input tokens on every subsequent turn.

Hidden system agents (compaction, summary, title) also call the model when triggered. Compaction saves tokens on future turns but costs tokens once when it runs.

1. Measure first

Where to look:

opencode stats — token usage and cost statistics across sessions (CLI reference )
opencode stats --models — which models consumed the most
opencode stats --days 7 — last week’s burn
Session sidebar in the TUI — context usage percentage for the active session
Your provider dashboard (Anthropic, OpenAI, OpenCode Zen, etc.) — ground truth for billing

After a heavy week, check whether spend is input-heavy (rules, MCP, long sessions) or output-heavy (frontier models, thinking variants, agent loops). That tells you which section below to prioritize.

2. Match the model to the task

See Models for configuration and Providers for auth. This is OpenCode’s version of Model routing: default cheap, escalate only on failure.

Start here:

Plan agent — analysis, architecture questions, review before editing (Agents )
Explore subagent — read-only codebase search (@explore or let Build delegate)
Mid-tier model on Plan — multi-file understanding without Build’s full tool loop
Build agent + capable model — implementation, refactors, multi-step edits
Thinking / reasoning variants — deep debugging or novel design only

Costs more than you expect:

Same frontier model on Plan and Build for every question
Reasoning models with high reasoningEffort or large budgetTokens (model options )
Subagents (@general, @scout) spawning parallel sessions with their own context
Compaction running repeatedly because the session never ends

Assign cheaper models per agent in opencode.json:


{
  "$schema": "https://opencode.ai/config.json",
  "model": "anthropic/claude-sonnet-4-5",
  "agent": {
    "plan": {
      "model": "anthropic/claude-haiku-4-20250514"
    },
    "build": {
      "model": "anthropic/claude-sonnet-4-5"
    }
  }
}

Use Tab to cycle between Plan and Build. Stay in Plan until you approve the approach—Build’s tool loop multiplies token use fast.

3. Trim what rides along every request

Input bloat in OpenCode usually comes from configuration and MCP—not your prompt text alone.

AGENTS.md and instructions

Project rules live in AGENTS.md (created by /init) and load into every session. OpenCode’s rules docs mirror Cursor’s pattern: they compound.

Keep AGENTS.md short—build commands, architecture one-liners, non-obvious conventions
Do not paste entire style guides; reference linter configs instead
Use instructions in opencode.json for shared team docs, not duplicate content in AGENTS.md
Prefer lazy loading: teach the agent to read referenced files only when needed (rules docs )
Audit global rules at ~/.config/opencode/AGENTS.md the same way

If you migrated from Claude Code, CLAUDE.md is a fallback when AGENTS.md is missing—do not maintain both with the same content.

Skills

Skills load on demand via the skill tool—only metadata rides along until invoked. Good pattern for occasional workflows.

Keep description fields precise so the agent does not load the wrong skill
Use permission.skill to hide experimental skills from agents that should not see them
Disable the skill tool on Plan if you want zero skill metadata in planning sessions

MCP servers

Each enabled MCP server adds tool schemas to agent context—even when no tool is called. OpenCode’s MCP docs warn explicitly: GitHub MCP and similar servers can blow past context limits quickly.

Set "enabled": false on servers you are not using this week
One narrow, task-specific server beats five overlapping ones
Disable MCP globally with "tools": { "my-mcp*": false } and enable per-agent when needed (MCP docs )
Prefer built-in grep / glob over MCP search when the codebase is local

Custom agents

Custom agents in .opencode/agents/ or opencode.json can carry long prompt files and broad tool permissions. Audit them like rules.

Short prompts; reference files by path
Set steps to cap agentic iterations (Agents: max steps )
Use permission.task to prevent orchestrator agents from spawning expensive subagents

`@` mentions

Use @ to fuzzy-search files instead of pasting full contents into chat (TUI docs ). Modern agents also read and grep on demand—you rarely need to attach a large folder for a one-line fix.

Prefer: “fix the null check in auth/login.ts line 42”
Use @filename for a single file, not the whole package tree

See Context hygiene for the general just-in-time retrieval pattern.

Compaction and pruning

OpenCode ships with context management in Config: compaction :


{
  "$schema": "https://opencode.ai/config.json",
  "compaction": {
    "auto": true,
    "prune": true,
    "reserved": 10000
  }
}

auto — summarizes the session when context nears the model limit (default: true). Triggers a hidden compaction agent call—costs tokens once, saves on later turns.
prune — strips old tool outputs (file reads, search results) while keeping recent ones (default: true).
reserved — token buffer left for the compaction operation itself.

You can also run /compact (alias /summarize) manually in the TUI before context fills (TUI commands ).

Compaction is not free—it calls the model. Prefer shorter sessions and new tasks over endless compact-and-continue on the same thread.

For custom or mis-reported context windows, set explicit limits on your provider model:


{
  "provider": {
    "my-provider": {
      "models": {
        "my-model": {
          "limit": {
            "context": 128000,
            "output": 8192
          }
        }
      }
    }
  }
}

Accurate limit.context helps OpenCode trigger compaction before hard provider errors (Providers: custom models ).

Agent step limits

Unbounded Build iterations are OpenCode’s equivalent of marathon agent threads. Set a step limit:


{
  "agent": {
    "build": {
      "steps": 25
    }
  }
}

When the limit is reached, the agent summarizes and stops iterating (Agents ). Combine with permission.bash: "ask" if runaway shell commands are part of the problem (Tools & permissions ).

Use Plan mode first so Build does not explore the codebase with full edit permissions before you agree on the approach.

New session per task

Start a new session when you finish one task and begin another, when context in the sidebar is above ~70%, when compaction has already run once, or when the agent loops on a stuck problem. Use /sessions or the command palette (ctrl+p) to switch (TUI ).

4. Write tighter prompts

OpenCode-specific versions of Prompt hygiene:

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

Batch related fixes in one message instead of five separate turns. Use /undo to revert bad edits instead of paying for another full revision cycle (usage docs ).

5. Set spending guardrails

OpenCode does not enforce your inference budget. You set the limits.

Run opencode stats after heavy sessions
Set provider-side spending caps in Anthropic/OpenAI/OpenCode Zen dashboards
Use steps per agent to stop runaway loops
Route Plan and Explore to cheaper models; reserve frontier models for Build
Disable unused MCP servers and reasoning variants you do not need

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

High input — AGENTS.md, instructions, or MCP bloat. Shorten rules; set "enabled": false on unused MCP servers.

High output — frontier model on routine tasks, thinking variants, or many revision cycles. Cheaper model on Plan; tighter prompts; /undo instead of re-prompting.

Context overflow errors — session too long before compaction triggered. Run /compact; start a new session; verify limit.context for custom providers.

Compaction runs often — session never ends. New session per task; prune is working but you are adding tokens faster than compaction removes them.

Spike after enabling MCP — tool schemas attach every turn. Disable globally; enable per-agent only when needed.

Subagent surprise spend — @general or Task-delegated subagents open child sessions. Restrict with permission.task; invoke read-only @explore instead of @general for search.

Stats do not match provider bill — opencode stats is local telemetry; provider dashboards are billing truth. Reconcile both after a heavy week.

When OpenCode optimization is not enough

Trimming OpenCode configuration does not fix production agent loops in your own API routes. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.