Tokenminning with OpenRouter
OpenRouter is not an IDE—it is a unified LLM gateway. Cline, Aider, OpenCode, and custom agents send every request through your OpenRouter API key. Waste shows up as frontier defaults, missing fallback chains, duplicate keys across tools, and routing that ignores cached prefixes.
Use this guide alongside your editor guide (Cline, Aider, OpenCode, etc.). The optimization order matches Where to start: measure → route models → trim what each request carries → tighten prompts → set caps.
Quick checklist
- Open OpenRouter Activity and export or note spend by model and API key.
- Put cheap models first in
modelsfallback arrays; reserve frontier IDs for escalation only. - Use separate API keys per tool or environment with individual spending limits.
- Prefer explicit model IDs over
openrouter/autountil you understand your traffic mix. - Log
usagefrom responses—OpenRouter returns token and cost breakdown on every call. - Enable provider
sort: "price"(default) unless latency truly matters.
Typical impact when you follow the list: 50–90% savings routing routine agent steps to haiku-tier models; 10–30% from cache-aware stable prefixes; runaway prevention from per-key caps. Benchmark on Activity exports—your tool mix differs from anyone else’s.
How OpenRouter bills a request
OpenRouter charges provider token cost plus a small platform fee. You prepay credits or pay on invoice depending on account type.
Each request specifies:
model— primary model slug (e.g.anthropic/claude-sonnet-4)- Optional
modelsarray — ordered fallbacks if the primary errors (rate limit, downtime, context length) - Optional
providerobject — sort by price, throughput, or latency; allow or deny specific providers
OpenRouter returns usage on every response (including streaming):
- Prompt and completion token counts
cached_tokensandcache_write_tokenswhen the upstream provider supports cachingcostin credits for the model that actually served the request
Failed fallback attempts are not billed. You pay for the model that succeeds.
Organization accounts share a credit pool—Activity shows all members’ metadata (model, cost, timing) but not prompts.
1. Measure first
Where to look:
- OpenRouter Activity — filter by API key, model, time range
- Activity CSV/PDF export — group by Model or API Key for weekly review
- Each tool’s local stats (
opencode stats, Cline task hints,/tokensin Aider) — reconcile against OpenRouter as billing truth GET /generation?id=...— fetch usage for a specific completion ID when debugging one expensive call
After a heavy week, check which API key and model slug dominated. Split keys if one agent tool consumed most credits.
2. Match the model to the task
This is OpenRouter’s version of Model routing: explicit cheap defaults, frontier only on failure or escalation.
Start here:
- Haiku / flash / mini tier — classification, log grep, single-file edits
- Sonnet / GPT-4.1 class — most multi-step agent work
- Opus / o3 / thinking — deep debugging, architecture—only when cheaper models fail
Configure fallbacks — cheaper model first, capable model second:
{
"model": "anthropic/claude-haiku-4-20250514",
"models": ["anthropic/claude-haiku-4-20250514", "anthropic/claude-sonnet-4-5"]
}In OpenAI-compatible clients, pass models via the vendor’s extension field (e.g. extra_body in Python).
Costs more than you expect:
openrouter/autoor:nitrovariants on every requestsort: "throughput"when price sorting would suffice- Single frontier slug with no fallback—errors retry manually and burn wall-clock
- Same key for CI bots and interactive dev without caps
Use OpenRouter model search for current pricing—rates change; link docs instead of hardcoding in your repo.
3. Trim what rides along every request
OpenRouter forwards whatever your client sends—it does not trim agent bloat. Reduce tokens upstream in each tool.
Per-key isolation
- Dev key — interactive Cline/Aider with moderate cap
- CI key — low cap, haiku-only model list
- Prod key — application traffic separate from coding agents
Rotate keys when a tool leaks into the wrong environment.
Stable prefixes for caching
Providers discount cached input when the prompt prefix is stable.
- Put static system prompts and tool definitions at the start of the message list
- Keep
CLAUDE.md,AGENTS.md, and MCP schemas stable within a session - Avoid editing the system prompt every turn—see Prompt caching
OpenRouter surfaces cached_tokens in usage—if always zero, your client may be reshuffling prefixes.
Model list hygiene
- Remove deprecated slugs from config—fallback to wrong tiers wastes credits or fails
- Pin slugs in committed config; audit monthly against OpenRouter deprecations
- Use Narev or OpenRouter’s model page when comparing cross-provider $/M tokens
Avoid double routing
Do not stack OpenRouter behind another gateway unless you need both—each layer adds latency and sometimes fees.
4. Write tighter prompts
OpenRouter does not see your intent—only tokens. Apply Prompt hygiene in the calling tool:
Too broad:
Fix this bug. Also review the whole auth system and suggest improvements.Scoped:
Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.Pass user string per developer in API requests when supported—improves sticky routing and Activity attribution without storing prompts.
5. Set spending guardrails
OpenRouter provides credit pools and key-level limits—you must configure them.
- Set spending limit on each API key at creation
- Enable org-level alerts before credits deplete
- Block CI keys from frontier model slugs via client-side allowlists
- Review Activity weekly; export CSV for cost attribution by key
For metering and caps in products you ship, see Article I and Article IV.
Troubleshooting
Bill spike overnight — runaway agent loop on one key. Lower cap; haiku-only; fix tool step limits in client.
Always billed for expensive model — fallback never triggers because primary succeeds on frontier default. Change primary to haiku; escalate in models array only.
High cost, low cached_tokens — unstable system prompt or MCP schemas reshuffling. Stabilize prefix; disable unused MCP in client.
Context length errors — primary too small; fallback list empty. Add larger model as second entry in models.
Org Activity shows teammate spike — shared pool. Per-developer keys with caps.
Usage mismatch vs client — OpenRouter is billing truth. Reconcile local telemetry against Activity export.
When OpenRouter optimization is not enough
Gateway tuning does not fix unbounded production agents. If customer-facing features dominate spend, instrument with per-feature tags in your application middleware and apply Context hygiene and Article IV runtime caps. Narev normalizes USD across providers when you bill customers for inference.