Tokenminning in Devin
Devin spans local agents (Cascade, Devin Local) and autonomous cloud sessions. Each surface meters differently, but the waste patterns are familiar: oversized rules, idle MCP schemas, long conversations, and frontier models on routine work.
Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.
Quick checklist
- Open Devin Settings → Plan Info (or manage your plan ) and note remaining quota, credits, or ACUs.
- Set Adaptive as your default model in Devin Desktop. Reserve frontier models for tasks that actually need them.
- Audit rules: switch
always_onrules toglobormodel_decisionwhere possible. - Disable MCP servers and tools you are not using this week. Stay under the 100-tool cap.
- Start a new conversation per task — not one marathon thread across unrelated work.
- For cloud Devin: scope one task per session and let idle sessions sleep.
Typical impact when you follow the list: 30–60% savings switching from Cascade to Devin Local on the same tasks; 20–50% on input by trimming always_on rules and MCP; meaningful ACU reduction on cloud sessions by scoping tasks and avoiding back-and-forth. Benchmark on your own dashboard — your mix of local vs cloud and billing model will differ from anyone else’s.
How Devin bills a request
Devin is not one meter. Local agents and cloud sessions use different units depending on your plan.
Devin Desktop (local agents)
Local agents — Cascade, Devin Local, Devin CLI — bill from inference:
- Self-serve — monthly quota, then on-demand usage at per-token rates. See Plans and Usage .
- Enterprise (ACUs) — tokens convert to Agent Compute Units at rates on the models page . See Enterprise billing .
- Legacy enterprise (credits) — one prompt credit per message to Cascade with a premium model, regardless of how many tool calls follow. See the credits tab in Plans and Usage .
Token groups that matter for local debugging (ACU and quota plans):
- Input — prompt, rules, attached context, MCP schemas. High input means configuration bloat.
- Cache write — context stored for reuse. High cache write means large first messages or heavy tool results.
- Cache read — previously cached context at reduced cost. High cache read means the thread is too long.
- Output — model responses, diffs, tool-call arguments. High output means verbose replies, thinking models, or too many revision cycles.
Adaptive draws down quota at a fixed per-token rate on self-serve plans (2.00/M output, $0.10/M cache read through July 7, 2026) while routing simpler tasks to lighter models underneath.
Devin Cloud (autonomous sessions)
Cloud Devin bills in ACUs (or self-serve quota/credits) based on work performed — not just tokens:
- Planning, context gathering, code execution, browser actions, and tool use
- VM time and networking (typically a small fraction)
- Windows sessions cost ~9% more than equivalent Linux sessions
Cloud Devin sleeps when idle and does not consume usage while sleeping. It also does not bill while waiting for your reply or while a test suite runs. See Usage .
1. Measure first
Devin Desktop (local):
- Devin Settings → Plan Info — quota, credits, or ACU balance
- windsurf.com/subscription/manage-plan — billing cycle and invoices
- Teams/Enterprise: Get Consumption API and Cascade Analytics API
Devin Cloud:
- Session Insights — per-session ACU or credit consumption (any user)
- Settings → Plans (self-serve) or Settings → Consumption (enterprise)
- Enterprise API: daily consumption and session metrics by category
After a heavy week, check which surface and token group moved most. That tells you which section below to prioritize.
2. Match the model to the task
See AI Models for current rates and credit multipliers. This is Devin’s version of Model routing: default cheap, escalate only on failure.
Start here:
- Adaptive — default for most Devin Desktop work; routes simple tasks to efficient models
- Devin Local — primary local agent; up to 30% fewer tokens than Cascade on the same tasks, with better prompt caching
- SWE-1.6 / SWE-1.5 — Cognition models tuned for coding; strong cost/performance for agent work
- Mid-tier (Sonnet, GPT-4.1) — multi-file refactors
- Frontier (Opus, o3, thinking variants) — deep debugging or novel design only
Costs more than you expect:
- Thinking / extended-reasoning models — extra reasoning tokens bill as output; Opus carries high credit multipliers (e.g. 6–20× on legacy credit plans)
- Arena mode — multiple Cascade instances in parallel; each instance bills independently
- Subagents (Devin Local) — independent conversation chains; useful but multiply inference when spawned in parallel
- Devin Cloud for local-sized tasks — VM overhead and action-based metering on work you could finish in Devin Local
Enterprise teams: Adaptive is disabled by default. Admins must enable it in team settings before members can select it. See Adaptive enterprise availability .
When you switch models mid-conversation, caching behavior resets. Start a new conversation when changing models.
3. Trim what rides along every request
Input bloat in Devin Desktop usually comes from configuration — not your prompt text alone.
Rules and AGENTS.md
Project rules in .devin/rules/ (or legacy .windsurf/rules/) control what Cascade and Devin Local see. Memories & Rules documents activation modes — each mode has a different context cost:
| Mode | trigger: | Context cost |
|---|---|---|
| Always On | always_on | Full rule on every message |
| Model Decision | model_decision | Description always; full content on demand |
| Glob | glob | Only when matching files are touched |
| Manual | manual | Only when @rule-name is mentioned |
- Prefer
globormodel_decisionoveralways_on - One concern per rule file; workspace rules cap at 12,000 characters, global at 6,000
- Use AGENTS.md for directory-scoped instructions instead of pasting style guides into chat
- Do not duplicate the same instructions in rules,
AGENTS.md, and auto-generated memories - For durable team knowledge, write Rules — not memories. Memories are local to your machine and not version-controlled
Audit global rules (~/.codeium/windsurf/memories/global_rules.md) and enterprise system rules the same way.
MCP servers
Each enabled server adds tool schemas to agent context. Cascade has a 100-tool limit across all servers. See MCP docs .
- Disable servers you are not using this week
- Toggle off individual tools you do not need on each server
- One narrow, task-specific server beats five overlapping ones
- Devin Local prompts before MCP tool calls by default — fewer accidental invocations, but approval fatigue can slow you down; configure permissions once and reuse
Memories
Memories persist facts across sessions. Creating and retrieving auto-generated memories does not consume credits, but relevant memories still attach to context when retrieved. For anything you want reliably reused, prefer Rules or AGENTS.md.
Indexing and ignore files
Devin Desktop indexes your codebase for context retrieval. Bloated indexes slow retrieval and inflate irrelevant context. See Devin Desktop Ignore .
- Add
.codeiumignorefor generated artifacts, vendored deps, and large binary trees not already in.gitignore - Tune Max Workspace Size if indexing is pulling in too many files
- Enterprise: use global
~/.codeium/.codeiumignorefor org-wide exclusions
Skills, workflows, and web search
- Skills — loaded when the model invokes them or you
@mentionthem; better than pasting reference files into every prompt. See Skills . - Workflows — manual slash commands only; no standing context cost until invoked. See Workflows .
- @web / @docs — useful but adds retrieval tokens; scope searches instead of broad “search the internet for best practices” prompts
New conversation per task
Start a new conversation when you finish one task and begin another, when you switch models, when cache read dominates your usage, or when the agent loops on a stuck problem.
For cloud Devin, Devin’s own guidance applies: delegate clearly scoped tasks, keep sessions short, and split big projects across sessions. See Managing usage effectively .
4. Write tighter prompts
Devin-specific versions of Prompt hygiene. See also Prompt Engineering .
Too broad:
Fix this bug. Also review the whole auth system and suggest improvements.Scoped:
Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.Use @filename, @func:, and @class: mentions instead of pasting full file contents. Batch related fixes in one message instead of five separate agent turns.
For cloud Devin, include a well-defined end goal and acceptance criteria in the first message — vague delegation burns ACUs on exploration.
5. Set spending guardrails
Devin does not enforce your inference budget. You set the limits.
- Glance at Plan Info after heavy sessions
- Know your plan’s included quota or ACU volume
- Self-serve: monitor on-demand credit balance at Settings → Plans
- Legacy credit plans: set automatic refill caps in plan settings to avoid runaway top-ups
- Enterprise admins: per-org ACU limits, usage configuration API , and consumption analytics
- Cloud: let idle sessions sleep; do not keep VMs awake with open-ended “stand by” threads
For metering and caps in products you ship, see Article I and Article IV.
Troubleshooting
High input (local) — always_on rules, MCP schemas, or bloated index. Switch rules to glob/model_decision; disable unused MCP; tighten .codeiumignore.
High output (local) — verbose agent, thinking model, or many revision cycles. Tighter prompts; Adaptive or Devin Local; review diffs before accepting.
High cache write — large tool results or big first messages. Narrow scope; truncate attachments.
High cache read — thread too long. New conversation per task.
High ACUs (cloud) — task too broad, long session, or frequent back-and-forth. One scoped task per session; split work across sessions.
Spike after enabling Arena mode — parallel agents bill independently. Use for exploration, not routine fixes.
Spike after switching models mid-conversation — cache miss on new provider. New conversation when switching.
Windows cloud sessions — ~9% ACU premium over Linux. Use Linux VMs when possible.
When Devin optimization is not enough
Trimming Devin configuration does not fix production agent loops. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.
Related
- IDEs
- Where to start
- Model routing
- Context hygiene
- Devin: Usage and metering
- Devin Desktop: Plans and Usage
- Devin Desktop: Adaptive
- Devin Desktop: AI Models
- Devin Desktop: Memories & Rules
- Devin Desktop: MCP
- Devin Desktop: Devin Local
- Devin Desktop: Devin Cloud in Desktop
- Devin Desktop: Prompt Engineering