Tokenminning in Cursor

Cursor re-sends context on every agent step: rules, attached files, MCP tool schemas, and growing chat history. Most IDE spend comes from long threads and heavy configuration—not from one verbose reply.

Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.

Quick checklist

Open cursor.com/dashboard → Usage and note which token groups are highest.
Use Auto or Composer for routine agent work. Reserve frontier models for tasks that actually need them.
Shorten project rules and disable MCP servers you are not using.
Start a new chat for each task—not one marathon thread.
Audit Memories and global user rules if input tokens stay high.

Typical impact when you follow the list: 60–90% savings on routine requests by switching models; 20–50% on input by trimming rules and MCP; 30–60% less cache growth from shorter chats. Benchmark on your own dashboard—your mix of Agent vs Tab and default models will differ from anyone else’s.

How Cursor bills a request

Each agent turn sends your prompt plus everything Cursor attaches. Follow-up messages reuse cached context as cheaper cache read tokens, but the cache keeps growing until you start a new chat.
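A rough model makes the cache-growth effect concrete. The sketch below assumes each turn adds a fixed number of tokens to the thread and that every later turn re-reads everything cached so far. The function name and the token figures are illustrative, not Cursor's actual accounting.

```python
def cumulative_cache_reads(turns: int, tokens_per_turn: int) -> int:
    """Total cache-read tokens across one thread under a simple growth model.

    Assumption: turn k re-reads the (k - 1) * tokens_per_turn tokens cached
    by the earlier turns. Real threads grow unevenly.
    """
    return sum((k - 1) * tokens_per_turn for k in range(1, turns + 1))


# One 40-turn marathon thread vs. four 10-turn chats (hypothetical sizes).
marathon = cumulative_cache_reads(turns=40, tokens_per_turn=2_000)
split = 4 * cumulative_cache_reads(turns=10, tokens_per_turn=2_000)

print(marathon)  # 1560000
print(split)     # 360000
```

Under these assumptions the same 40 turns cost roughly four times as many cache-read tokens in a single thread, because cumulative reads grow with the square of the thread length.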

Cursor splits usage into four groups that matter for debugging:

Input — prompt, rules, @ files, MCP schemas. High input means configuration bloat.
Cache write — context stored for reuse in later steps. High cache write means large first messages or heavy tool results.
Cache read — previously cached context reused at reduced cost. High cache read means the thread is too long.
Output — model responses, diffs, tool-call arguments. High output means verbose replies, thinking models, or too many revision cycles.

Individual plans bill usage at provider API cost × 1.2. Cache reads cost roughly 10–25% of the fresh input rate. See Usage and limits for plan allowances; Auto and Composer draw from a separate pool with more generous included usage than frontier API models.
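To make the arithmetic concrete, here is a minimal sketch of a per-request cost estimate across the four token groups. The per-million-token rates are placeholders, not current prices, and `TokenUsage` and `estimate_cost_usd` are hypothetical names. The ×1.2 markup and the cache-read ratio come from the figures above.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    fresh_input: int
    cache_write: int
    cache_read: int
    output: int


def estimate_cost_usd(
    usage: TokenUsage,
    input_rate: float,         # USD per million fresh input tokens (placeholder)
    output_rate: float,        # USD per million output tokens (placeholder)
    cache_write_rate: float,   # USD per million cache-write tokens (placeholder)
    cache_read_ratio: float = 0.10,  # cache reads as a fraction of the input rate
    markup: float = 1.2,             # pass-through multiplier on individual plans
) -> float:
    """Estimate the billed cost of one request in USD."""
    provider_cost = (
        usage.fresh_input * input_rate
        + usage.cache_write * cache_write_rate
        + usage.cache_read * input_rate * cache_read_ratio
        + usage.output * output_rate
    ) / 1_000_000
    return provider_cost * markup


# Hypothetical late-thread request: small prompt, large cached history.
usage = TokenUsage(fresh_input=20_000, cache_write=30_000, cache_read=400_000, output=5_000)
cost = estimate_cost_usd(usage, input_rate=3.0, output_rate=15.0, cache_write_rate=3.75)
print(f"{cost:.4f}")  # 0.4410
```

In this made-up example, cache reads are the largest single line item even at a tenth of the input rate, which is the pattern a long thread produces.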

1. Measure first

Where to look:

cursor.com/dashboard → Usage — real-time spend, remaining allowance, on-demand charges
Billing — subscription, usage-based pricing, spending limits
In-editor usage indicator — quick check while coding

Teams: per-user model usage is available via the Analytics API or Admin API.

After a heavy Agent week, check which token group moved most. That tells you which section below to prioritize.
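The weekly check can be sketched as a small comparison. This assumes you copy the four token-group totals from the dashboard by hand; the function name, the dictionary keys, and the numbers are illustrative.

```python
# Advice per token group, paraphrasing the sections and troubleshooting notes
# of this guide.
ADVICE = {
    "input": "Trim rules, MCP servers, and @ attachments (section 3).",
    "cache_write": "Narrow scope; truncate large attachments and tool results.",
    "cache_read": "Start a new chat per task (section 3).",
    "output": "Tighten prompts or pick a cheaper model (sections 2 and 4).",
}


def biggest_mover(last_week: dict[str, int], this_week: dict[str, int]) -> str:
    """Return the token group with the largest absolute increase."""
    return max(
        ADVICE,
        key=lambda group: this_week.get(group, 0) - last_week.get(group, 0),
    )


# Hypothetical totals copied from the Usage page.
last_week = {"input": 1_200_000, "cache_write": 800_000, "cache_read": 9_000_000, "output": 300_000}
this_week = {"input": 1_500_000, "cache_write": 900_000, "cache_read": 21_000_000, "output": 350_000}

group = biggest_mover(last_week, this_week)
print(group, "->", ADVICE[group])
```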

2. Match the model to the task

See Models & Pricing for current rates. This is Cursor’s version of Model routing: default cheap, escalate only on failure.

Start here:

Tab — completions and small edits
Auto or Composer — log checks, grep-style questions, renames, most agent work
Mid-tier — multi-file refactors
Frontier (Sonnet, GPT-5.x, etc.) — deep debugging or novel design only
Thinking variants / Opus — last resort
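The "default cheap, escalate only on failure" rule can be sketched as a ladder. The tier labels below mirror the list above and are not API model IDs. `attempt` is a hypothetical callback standing in for your own judgment of whether the result passed tests or review.

```python
from typing import Callable, Optional

# Cheapest first. Labels follow the list above, not real model identifiers.
LADDER = ["auto", "mid-tier", "frontier", "thinking"]


def run_with_escalation(
    task: str, attempt: Callable[[str, str], bool]
) -> Optional[str]:
    """Try each tier in order and return the first tier that succeeds.

    Returns None if every tier fails.
    """
    for tier in LADDER:
        if attempt(tier, task):
            return tier
    return None


# Illustrative stand-in: pretend this refactor needs at least a mid-tier model.
def fake_attempt(tier: str, task: str) -> bool:
    return tier != "auto"


print(run_with_escalation("multi-file refactor", fake_attempt))  # mid-tier
```

The point of the ordering is that the expensive tiers are only ever reached after a cheaper one has visibly failed on the same task.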

Costs more than you expect:

Thinking / extended-reasoning models — extra reasoning tokens bill as output
Max Mode — full context window at API rates; use only when you need it
Frontier defaults on tasks Auto handles fine

When you switch models mid-chat, the new provider does not inherit the previous cache. Start a new chat when changing models.

3. Trim what rides along every request

Input bloat in Cursor usually comes from configuration—not your prompt text alone.

Rules

Project rules in .cursor/rules/ are injected into every Agent conversation. Cursor’s guidance is explicit: they compound.

One concern per rule file; keep them short
Use file globs instead of alwaysApply: true on everything
Reference files by path—do not paste entire style guides (use linters instead)
Do not duplicate the same instructions in rules, AGENTS.md, and CLAUDE.md
Move occasional instructions to Skills or @-mention rules manually

Audit user rules (Settings) and team rules (dashboard) the same way.
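A quick audit script can show which project rule files are largest and which are always applied. This is a minimal sketch under two assumptions: roughly 4 characters per token (a heuristic, not Cursor's tokenizer), and a plain-text search for `alwaysApply: true` rather than real frontmatter parsing. `audit_rules` is a hypothetical helper name.

```python
from pathlib import Path


def audit_rules(rules_dir: str = ".cursor/rules") -> list[tuple[str, int, bool]]:
    """Return (path, approx_tokens, always_applied) per rule file, largest first."""
    report = []
    for path in Path(rules_dir).rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        approx_tokens = len(text) // 4  # rough heuristic, see assumptions above
        always_applied = "alwaysApply: true" in text
        report.append((str(path), approx_tokens, always_applied))
    return sorted(report, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for path, tokens, always_applied in audit_rules():
        flag = "always" if always_applied else "scoped"
        print(f"{tokens:>6}  {flag:<6}  {path}")
```

Run it from the repository root. Large files flagged `always` are the first candidates for trimming or glob scoping.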

MCP servers

Each enabled server adds tool schemas to agent context, even when no tool is called. See the Cursor MCP docs.

Disable servers you are not using this week
One narrow, task-specific server beats five overlapping ones
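To see what is configured before deciding what to disable, a small script can list the servers in a project's MCP config. This sketch assumes the config lives at `.cursor/mcp.json` with a top-level `mcpServers` object; check the Cursor MCP docs for your version. It lists configured servers only and cannot tell which ones are currently toggled on in Settings.

```python
import json
from pathlib import Path


def list_mcp_servers(config_path: str = ".cursor/mcp.json") -> list[str]:
    """Return the names of configured MCP servers, sorted alphabetically.

    Assumes a JSON file with a top-level "mcpServers" object. Returns an
    empty list if the file does not exist.
    """
    path = Path(config_path)
    if not path.exists():
        return []
    config = json.loads(path.read_text(encoding="utf-8"))
    return sorted(config.get("mcpServers", {}))


if __name__ == "__main__":
    servers = list_mcp_servers()
    print(f"{len(servers)} configured MCP server(s)")
    for name in servers:
        print(" -", name)
```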

Memories

Memories persist facts across sessions. Useful for project decisions; costly when large fragments attach to every request. Review and prune in Settings → Rules. Disable if you do not need cross-session recall.

`@` mentions

Modern agents search the codebase on demand. You rarely need @codebase plus a large folder for a one-line fix.

Prefer a focused prompt: “fix spacing in Navbar.tsx only”
Use @filename instead of pasting full file contents into chat

See Context hygiene for the general just-in-time retrieval pattern.

New chat per task

Start a new chat when you finish one task and begin another, when you switch models, when cache read dominates your dashboard, or when the agent loops on a stuck problem.
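The four triggers above reduce to a simple predicate. In this sketch the 0.5 threshold for "cache read dominates" is an assumption, not a Cursor-defined value, and the function and parameter names are illustrative.

```python
def should_start_new_chat(
    task_changed: bool,
    model_changed: bool,
    agent_is_looping: bool,
    cache_read_share: float,
    cache_read_threshold: float = 0.5,  # assumed cutoff for "dominates"
) -> bool:
    """Return True if any of the four new-chat triggers applies.

    cache_read_share is cache-read tokens divided by total tokens for the
    period you are looking at, taken from the dashboard.
    """
    return (
        task_changed
        or model_changed
        or agent_is_looping
        or cache_read_share > cache_read_threshold
    )


# Same task and model, no loop, but cache reads are 70% of tokens.
print(should_start_new_chat(False, False, False, cache_read_share=0.7))  # True
```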

4. Write tighter prompts

Cursor-specific versions of Prompt hygiene:

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

Batch related fixes in one message instead of five separate agent turns. Review diffs before accepting—each rejected revision is another output bill.

5. Set spending guardrails

Cursor does not enforce your inference budget. You set the limits.

Glance at Dashboard → Usage after heavy sessions
Know your plan’s included API allowance
Enable a usage-based spending limit in Dashboard → Billing
Team admins: usage analytics and per-seat types in the Teams dashboard
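When you glance at the dashboard mid-month, a linear projection is a quick way to see whether you are on pace to hit your spending limit. This is a minimal sketch that assumes spend is spread evenly across days; the function name and the dollar figures are hypothetical.

```python
def projected_month_spend(
    spent_so_far: float, day_of_month: int, days_in_month: int = 30
) -> float:
    """Linearly project month-end spend from spend to date."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return spent_so_far / day_of_month * days_in_month


# Hypothetical: $18 of on-demand usage by day 12, with a $40 spending limit.
projection = projected_month_spend(spent_so_far=18.0, day_of_month=12)
limit = 40.0
print(projection)          # 45.0
print(projection > limit)  # True
```

A projection above the limit is a cue to revisit the earlier sections before the cap cuts off usage mid-task.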

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

High input — rules, MCP, or heavy @ attachments. Shorten rules; disable unused MCP.

High output — verbose agent, thinking model, or many revision cycles. Tighter prompts; cheaper model; review before accepting.

High cache write — large tool results or big first messages. Narrow scope; truncate attachments.

High cache read — thread too long. New chat per task.

Spike after enabling Memories — memory fragments attach per request. Prune or disable.

Spike after switching models mid-chat — cache miss on new provider. New chat when switching.

When Cursor optimization is not enough

Trimming Cursor configuration does not fix production agent loops. If customer-facing features dominate spend, instrument requests with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. If you need cross-provider cost math, Narev reports costs normalized to USD across providers.