Tokenminning in Devin – Tokenminning

Devin spans local agents (Cascade, Devin Local) and autonomous cloud sessions. Each surface meters differently, but the waste patterns are familiar: oversized rules, idle MCP schemas, long conversations, and frontier models on routine work.

Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.

Quick checklist

Open Devin Settings → Plan Info (or manage your plan ) and note remaining quota, credits, or ACUs.
Set Adaptive as your default model in Devin Desktop. Reserve frontier models for tasks that actually need them.
Audit rules: switch always_on rules to glob or model_decision where possible.
Disable MCP servers and tools you are not using this week. Stay under the 100-tool cap.
Start a new conversation per task — not one marathon thread across unrelated work.
For cloud Devin: scope one task per session and let idle sessions sleep.

Typical impact when you follow the list: 30–60% savings switching from Cascade to Devin Local on the same tasks; 20–50% on input by trimming always_on rules and MCP; meaningful ACU reduction on cloud sessions by scoping tasks and avoiding back-and-forth. Benchmark on your own dashboard — your mix of local vs cloud and billing model will differ from anyone else’s.

How Devin bills a request

Devin is not one meter. Local agents and cloud sessions use different units depending on your plan.

Devin Desktop (local agents)

Local agents — Cascade, Devin Local, Devin CLI — bill from inference:

Self-serve — monthly quota, then on-demand usage at per-token rates. See Plans and Usage .
Enterprise (ACUs) — tokens convert to Agent Compute Units at rates on the models page . See Enterprise billing .
Legacy enterprise (credits) — one prompt credit per message to Cascade with a premium model, regardless of how many tool calls follow. See the credits tab in Plans and Usage .

Token groups that matter for local debugging (ACU and quota plans):

Input — prompt, rules, attached context, MCP schemas. High input means configuration bloat.
Cache write — context stored for reuse. High cache write means large first messages or heavy tool results.
Cache read — previously cached context at reduced cost. High cache read means the thread is too long.
Output — model responses, diffs, tool-call arguments. High output means verbose replies, thinking models, or too many revision cycles.

Adaptive draws down quota at a fixed per-token rate on self-serve plans ( $0.50/M input,$ 2.00/M output, $0.10/M cache read through July 7, 2026) while routing simpler tasks to lighter models underneath.

Devin Cloud (autonomous sessions)

Cloud Devin bills in ACUs (or self-serve quota/credits) based on work performed — not just tokens:

Planning, context gathering, code execution, browser actions, and tool use
VM time and networking (typically a small fraction)
Windows sessions cost ~9% more than equivalent Linux sessions

Cloud Devin sleeps when idle and does not consume usage while sleeping. It also does not bill while waiting for your reply or while a test suite runs. See Usage .

1. Measure first

Devin Desktop (local):

Devin Settings → Plan Info — quota, credits, or ACU balance
windsurf.com/subscription/manage-plan — billing cycle and invoices
Teams/Enterprise: Get Consumption API and Cascade Analytics API

Devin Cloud:

Session Insights — per-session ACU or credit consumption (any user)
Settings → Plans (self-serve) or Settings → Consumption (enterprise)
Enterprise API: daily consumption and session metrics by category

After a heavy week, check which surface and token group moved most. That tells you which section below to prioritize.

2. Match the model to the task

See AI Models for current rates and credit multipliers. This is Devin’s version of Model routing: default cheap, escalate only on failure.

Start here:

Adaptive — default for most Devin Desktop work; routes simple tasks to efficient models
Devin Local — primary local agent; up to 30% fewer tokens than Cascade on the same tasks, with better prompt caching
SWE-1.6 / SWE-1.5 — Cognition models tuned for coding; strong cost/performance for agent work
Mid-tier (Sonnet, GPT-4.1) — multi-file refactors
Frontier (Opus, o3, thinking variants) — deep debugging or novel design only

Costs more than you expect:

Thinking / extended-reasoning models — extra reasoning tokens bill as output; Opus carries high credit multipliers (e.g. 6–20× on legacy credit plans)
Arena mode — multiple Cascade instances in parallel; each instance bills independently
Subagents (Devin Local) — independent conversation chains; useful but multiply inference when spawned in parallel
Devin Cloud for local-sized tasks — VM overhead and action-based metering on work you could finish in Devin Local

Enterprise teams: Adaptive is disabled by default. Admins must enable it in team settings before members can select it. See Adaptive enterprise availability .

When you switch models mid-conversation, caching behavior resets. Start a new conversation when changing models.

3. Trim what rides along every request

Input bloat in Devin Desktop usually comes from configuration — not your prompt text alone.

Rules and AGENTS.md

Project rules in .devin/rules/ (or legacy .windsurf/rules/) control what Cascade and Devin Local see. Memories & Rules documents activation modes — each mode has a different context cost:

Mode	`trigger:`	Context cost
Always On	`always_on`	Full rule on every message
Model Decision	`model_decision`	Description always; full content on demand
Glob	`glob`	Only when matching files are touched
Manual	`manual`	Only when `@rule-name` is mentioned

Prefer glob or model_decision over always_on
One concern per rule file; workspace rules cap at 12,000 characters, global at 6,000
Use AGENTS.md for directory-scoped instructions instead of pasting style guides into chat
Do not duplicate the same instructions in rules, AGENTS.md, and auto-generated memories
For durable team knowledge, write Rules — not memories. Memories are local to your machine and not version-controlled

Audit global rules (~/.codeium/windsurf/memories/global_rules.md) and enterprise system rules the same way.

MCP servers

Each enabled server adds tool schemas to agent context. Cascade has a 100-tool limit across all servers. See MCP docs .

Disable servers you are not using this week
Toggle off individual tools you do not need on each server
One narrow, task-specific server beats five overlapping ones
Devin Local prompts before MCP tool calls by default — fewer accidental invocations, but approval fatigue can slow you down; configure permissions once and reuse

Memories

Memories persist facts across sessions. Creating and retrieving auto-generated memories does not consume credits, but relevant memories still attach to context when retrieved. For anything you want reliably reused, prefer Rules or AGENTS.md.

Indexing and ignore files

Devin Desktop indexes your codebase for context retrieval. Bloated indexes slow retrieval and inflate irrelevant context. See Devin Desktop Ignore .

Add .codeiumignore for generated artifacts, vendored deps, and large binary trees not already in .gitignore
Tune Max Workspace Size if indexing is pulling in too many files
Enterprise: use global ~/.codeium/.codeiumignore for org-wide exclusions

Skills, workflows, and web search

Skills — loaded when the model invokes them or you @mention them; better than pasting reference files into every prompt. See Skills .
Workflows — manual slash commands only; no standing context cost until invoked. See Workflows .
@web / @docs — useful but adds retrieval tokens; scope searches instead of broad “search the internet for best practices” prompts

New conversation per task

Start a new conversation when you finish one task and begin another, when you switch models, when cache read dominates your usage, or when the agent loops on a stuck problem.

For cloud Devin, Devin’s own guidance applies: delegate clearly scoped tasks, keep sessions short, and split big projects across sessions. See Managing usage effectively .

4. Write tighter prompts

Devin-specific versions of Prompt hygiene. See also Prompt Engineering .

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

Use @filename, @func:, and @class: mentions instead of pasting full file contents. Batch related fixes in one message instead of five separate agent turns.

For cloud Devin, include a well-defined end goal and acceptance criteria in the first message — vague delegation burns ACUs on exploration.

5. Set spending guardrails

Devin does not enforce your inference budget. You set the limits.

Glance at Plan Info after heavy sessions
Know your plan’s included quota or ACU volume
Self-serve: monitor on-demand credit balance at Settings → Plans
Legacy credit plans: set automatic refill caps in plan settings to avoid runaway top-ups
Enterprise admins: per-org ACU limits, usage configuration API , and consumption analytics
Cloud: let idle sessions sleep; do not keep VMs awake with open-ended “stand by” threads

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

High input (local) — always_on rules, MCP schemas, or bloated index. Switch rules to glob/model_decision; disable unused MCP; tighten .codeiumignore.

High output (local) — verbose agent, thinking model, or many revision cycles. Tighter prompts; Adaptive or Devin Local; review diffs before accepting.

High cache write — large tool results or big first messages. Narrow scope; truncate attachments.

High cache read — thread too long. New conversation per task.

High ACUs (cloud) — task too broad, long session, or frequent back-and-forth. One scoped task per session; split work across sessions.

Spike after enabling Arena mode — parallel agents bill independently. Use for exploration, not routine fixes.

Spike after switching models mid-conversation — cache miss on new provider. New conversation when switching.

Windows cloud sessions — ~9% ACU premium over Linux. Use Linux VMs when possible.

When Devin optimization is not enough

Trimming Devin configuration does not fix production agent loops. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.