Skip to Content
IDEsOpenRouter

Tokenminning with OpenRouter

OpenRouter is not an IDE—it is a unified LLM gateway. Cline, Aider, OpenCode, and custom agents send every request through your OpenRouter API key. Waste shows up as frontier defaults, missing fallback chains, duplicate keys across tools, and routing that ignores cached prefixes.

Use this guide alongside your editor guide (Cline, Aider, OpenCode, etc.). The optimization order matches Where to start: measure → route models → trim what each request carries → tighten prompts → set caps.

Quick checklist

  1. Open OpenRouter Activity  and export or note spend by model and API key.
  2. Put cheap models first in models fallback arrays; reserve frontier IDs for escalation only.
  3. Use separate API keys per tool or environment with individual spending limits.
  4. Prefer explicit model IDs over openrouter/auto until you understand your traffic mix.
  5. Log usage from responses—OpenRouter returns token and cost breakdown on every call.
  6. Enable provider sort: "price" (default) unless latency truly matters.

Typical impact when you follow the list: 50–90% savings routing routine agent steps to haiku-tier models; 10–30% from cache-aware stable prefixes; runaway prevention from per-key caps. Benchmark on Activity exports—your tool mix differs from anyone else’s.

How OpenRouter bills a request

OpenRouter charges provider token cost plus a small platform fee. You prepay credits or pay on invoice depending on account type.

Each request specifies:

  • model — primary model slug (e.g. anthropic/claude-sonnet-4)
  • Optional models array — ordered fallbacks if the primary errors (rate limit, downtime, context length)
  • Optional provider object — sort by price, throughput, or latency; allow or deny specific providers

OpenRouter returns usage on every response (including streaming):

  • Prompt and completion token counts
  • cached_tokens and cache_write_tokens when the upstream provider supports caching
  • cost in credits for the model that actually served the request

Failed fallback attempts are not billed. You pay for the model that succeeds.

Organization accounts share a credit pool—Activity shows all members’ metadata (model, cost, timing) but not prompts.

1. Measure first

Where to look:

  • OpenRouter Activity  — filter by API key, model, time range
  • Activity CSV/PDF export — group by Model or API Key for weekly review
  • Each tool’s local stats (opencode stats, Cline task hints, /tokens in Aider) — reconcile against OpenRouter as billing truth
  • GET /generation?id=... — fetch usage for a specific completion ID when debugging one expensive call

After a heavy week, check which API key and model slug dominated. Split keys if one agent tool consumed most credits.

2. Match the model to the task

This is OpenRouter’s version of Model routing: explicit cheap defaults, frontier only on failure or escalation.

Start here:

  • Haiku / flash / mini tier — classification, log grep, single-file edits
  • Sonnet / GPT-4.1 class — most multi-step agent work
  • Opus / o3 / thinking — deep debugging, architecture—only when cheaper models fail

Configure fallbacks — cheaper model first, capable model second:

{ "model": "anthropic/claude-haiku-4-20250514", "models": ["anthropic/claude-haiku-4-20250514", "anthropic/claude-sonnet-4-5"] }

In OpenAI-compatible clients, pass models via the vendor’s extension field (e.g. extra_body in Python).

Costs more than you expect:

  • openrouter/auto or :nitro variants on every request
  • sort: "throughput" when price sorting would suffice
  • Single frontier slug with no fallback—errors retry manually and burn wall-clock
  • Same key for CI bots and interactive dev without caps

Use OpenRouter model search  for current pricing—rates change; link docs instead of hardcoding in your repo.

3. Trim what rides along every request

OpenRouter forwards whatever your client sends—it does not trim agent bloat. Reduce tokens upstream in each tool.

Per-key isolation

  • Dev key — interactive Cline/Aider with moderate cap
  • CI key — low cap, haiku-only model list
  • Prod key — application traffic separate from coding agents

Rotate keys when a tool leaks into the wrong environment.

Stable prefixes for caching

Providers discount cached input when the prompt prefix is stable.

  • Put static system prompts and tool definitions at the start of the message list
  • Keep CLAUDE.md, AGENTS.md, and MCP schemas stable within a session
  • Avoid editing the system prompt every turn—see Prompt caching

OpenRouter surfaces cached_tokens in usage—if always zero, your client may be reshuffling prefixes.

Model list hygiene

  • Remove deprecated slugs from config—fallback to wrong tiers wastes credits or fails
  • Pin slugs in committed config; audit monthly against OpenRouter deprecations
  • Use Narev  or OpenRouter’s model page when comparing cross-provider $/M tokens

Avoid double routing

Do not stack OpenRouter behind another gateway unless you need both—each layer adds latency and sometimes fees.

4. Write tighter prompts

OpenRouter does not see your intent—only tokens. Apply Prompt hygiene in the calling tool:

Too broad:

Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:

Fix ONLY the null check in auth/login.ts line 42. No explanations. Max 1 file changed.

Pass user string per developer in API requests when supported—improves sticky routing and Activity attribution without storing prompts.

5. Set spending guardrails

OpenRouter provides credit pools and key-level limits—you must configure them.

  • Set spending limit on each API key at creation
  • Enable org-level alerts before credits deplete
  • Block CI keys from frontier model slugs via client-side allowlists
  • Review Activity weekly; export CSV for cost attribution by key

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

Bill spike overnight — runaway agent loop on one key. Lower cap; haiku-only; fix tool step limits in client.

Always billed for expensive model — fallback never triggers because primary succeeds on frontier default. Change primary to haiku; escalate in models array only.

High cost, low cached_tokens — unstable system prompt or MCP schemas reshuffling. Stabilize prefix; disable unused MCP in client.

Context length errors — primary too small; fallback list empty. Add larger model as second entry in models.

Org Activity shows teammate spike — shared pool. Per-developer keys with caps.

Usage mismatch vs client — OpenRouter is billing truth. Reconcile local telemetry against Activity export.

When OpenRouter optimization is not enough

Gateway tuning does not fix unbounded production agents. If customer-facing features dominate spend, instrument with per-feature tags in your application middleware and apply Context hygiene and Article IV runtime caps. Narev  normalizes USD across providers when you bill customers for inference.

Last updated on