Tokenminning with OpenRouter – Tokenminning

OpenRouter is not an IDE—it is a unified LLM gateway. Cline, Aider, OpenCode, and custom agents send every request through your OpenRouter API key. Waste shows up as frontier defaults, missing fallback chains, duplicate keys across tools, and routing that ignores cached prefixes.

Use this guide alongside your editor guide (Cline, Aider, OpenCode, etc.). The optimization order matches Where to start: measure → route models → trim what each request carries → tighten prompts → set caps.

Quick checklist

Open OpenRouter Activity and export or note spend by model and API key.
Put cheap models first in models fallback arrays; reserve frontier IDs for escalation only.
Use separate API keys per tool or environment with individual spending limits.
Prefer explicit model IDs over openrouter/auto until you understand your traffic mix.
Log usage from responses—OpenRouter returns token and cost breakdown on every call.
Enable provider sort: "price" (default) unless latency truly matters.

Typical impact when you follow the list: 50–90% savings routing routine agent steps to haiku-tier models; 10–30% from cache-aware stable prefixes; runaway prevention from per-key caps. Benchmark on Activity exports—your tool mix differs from anyone else’s.

How OpenRouter bills a request

OpenRouter charges provider token cost plus a small platform fee. You prepay credits or pay on invoice depending on account type.

Each request specifies:

model — primary model slug (e.g. anthropic/claude-sonnet-4)
Optional models array — ordered fallbacks if the primary errors (rate limit, downtime, context length)
Optional provider object — sort by price, throughput, or latency; allow or deny specific providers

OpenRouter returns usage on every response (including streaming):

Prompt and completion token counts
cached_tokens and cache_write_tokens when the upstream provider supports caching
cost in credits for the model that actually served the request

Failed fallback attempts are not billed. You pay for the model that succeeds.

Organization accounts share a credit pool—Activity shows all members’ metadata (model, cost, timing) but not prompts.

1. Measure first

Where to look:

OpenRouter Activity — filter by API key, model, time range
Activity CSV/PDF export — group by Model or API Key for weekly review
Each tool’s local stats (opencode stats, Cline task hints, /tokens in Aider) — reconcile against OpenRouter as billing truth
GET /generation?id=... — fetch usage for a specific completion ID when debugging one expensive call

After a heavy week, check which API key and model slug dominated. Split keys if one agent tool consumed most credits.

2. Match the model to the task

This is OpenRouter’s version of Model routing: explicit cheap defaults, frontier only on failure or escalation.

Start here:

Haiku / flash / mini tier — classification, log grep, single-file edits
Sonnet / GPT-4.1 class — most multi-step agent work
Opus / o3 / thinking — deep debugging, architecture—only when cheaper models fail

Configure fallbacks — cheaper model first, capable model second:


{
  "model": "anthropic/claude-haiku-4-20250514",
  "models": ["anthropic/claude-haiku-4-20250514", "anthropic/claude-sonnet-4-5"]
}

In OpenAI-compatible clients, pass models via the vendor’s extension field (e.g. extra_body in Python).

Costs more than you expect:

openrouter/auto or :nitro variants on every request
sort: "throughput" when price sorting would suffice
Single frontier slug with no fallback—errors retry manually and burn wall-clock
Same key for CI bots and interactive dev without caps

Use OpenRouter model search for current pricing—rates change; link docs instead of hardcoding in your repo.

3. Trim what rides along every request

OpenRouter forwards whatever your client sends—it does not trim agent bloat. Reduce tokens upstream in each tool.

Per-key isolation

Dev key — interactive Cline/Aider with moderate cap
CI key — low cap, haiku-only model list
Prod key — application traffic separate from coding agents

Rotate keys when a tool leaks into the wrong environment.

Stable prefixes for caching

Providers discount cached input when the prompt prefix is stable.

Put static system prompts and tool definitions at the start of the message list
Keep CLAUDE.md, AGENTS.md, and MCP schemas stable within a session
Avoid editing the system prompt every turn—see Prompt caching

OpenRouter surfaces cached_tokens in usage—if always zero, your client may be reshuffling prefixes.

Model list hygiene

Remove deprecated slugs from config—fallback to wrong tiers wastes credits or fails
Pin slugs in committed config; audit monthly against OpenRouter deprecations
Use Narev or OpenRouter’s model page when comparing cross-provider $/M tokens

Avoid double routing

Do not stack OpenRouter behind another gateway unless you need both—each layer adds latency and sometimes fees.

4. Write tighter prompts

OpenRouter does not see your intent—only tokens. Apply Prompt hygiene in the calling tool:

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

Pass user string per developer in API requests when supported—improves sticky routing and Activity attribution without storing prompts.

5. Set spending guardrails

OpenRouter provides credit pools and key-level limits—you must configure them.

Set spending limit on each API key at creation
Enable org-level alerts before credits deplete
Block CI keys from frontier model slugs via client-side allowlists
Review Activity weekly; export CSV for cost attribution by key

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

Bill spike overnight — runaway agent loop on one key. Lower cap; haiku-only; fix tool step limits in client.

Always billed for expensive model — fallback never triggers because primary succeeds on frontier default. Change primary to haiku; escalate in models array only.

High cost, low cached_tokens — unstable system prompt or MCP schemas reshuffling. Stabilize prefix; disable unused MCP in client.

Context length errors — primary too small; fallback list empty. Add larger model as second entry in models.

Org Activity shows teammate spike — shared pool. Per-developer keys with caps.

Usage mismatch vs client — OpenRouter is billing truth. Reconcile local telemetry against Activity export.

When OpenRouter optimization is not enough

Gateway tuning does not fix unbounded production agents. If customer-facing features dominate spend, instrument with per-feature tags in your application middleware and apply Context hygiene and Article IV runtime caps. Narev normalizes USD across providers when you bill customers for inference.