Tokenminning in Zed

Zed is a GPU-rendered IDE with a built-in AI assistant. Each assistant turn resends your prompt, project rules, attached context, and conversation history. Most spend comes from long threads, frontier defaults on routine edits, and broad context attachment—not from one verbose reply.

Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.

Quick checklist

Open Assistant settings and note your default model and provider billing destination.
Use faster or smaller models for completions and quick questions. Reserve frontier models for tasks that actually need them.
Keep rules (.rules, project instructions) short and scoped.
Attach only the files or symbols the task needs—avoid whole-workspace dumps.
Start a new assistant thread per task—not one marathon conversation.

Typical impact when you follow the list: 40–65% savings by routing routine work to mid-tier models; 20–40% on input by trimming rules and attachments; 30–50% less context growth from shorter threads. Treat these as rough ranges and benchmark against your own provider dashboard. Zed’s UI is fast, but inference still bills per token.

How Zed bills a request

Zed’s assistant features use either your configured LLM provider (Anthropic, OpenAI, Ollama locally, etc.) or Zed’s hosted inference, depending on your plan. Unlike Cursor, there is no separate “Tab pool” vs “Agent pool”. Check your plan and provider to see what is included and what is billed to your own API key (BYOK).

Each assistant turn sends:

Your prompt and explicit @ context (files, symbols, diagnostics)
Project and user rules when configured
Prior messages in the active thread
Tool results when agent-style workflows are enabled
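To see why thread length matters, the list above can be turned into a rough per-turn estimate. This is a minimal sketch, not Zed’s actual accounting: the function name `input_tokens_per_turn` and all token counts are hypothetical, tool results are ignored, and rules and attached context are assumed to be the same size on every turn.

```python
from __future__ import annotations


def input_tokens_per_turn(
    rules: int, context: int, turns: list[tuple[int, int]]
) -> list[int]:
    """Estimate the input tokens sent on each turn of a single thread.

    rules   -- tokens of project/user rules, assumed resent every turn
    context -- tokens of attached @ context, assumed resent every turn
    turns   -- (prompt_tokens, reply_tokens) for each turn, in order
    """
    history = 0  # tokens of prior messages carried in the thread
    per_turn = []
    for prompt, reply in turns:
        per_turn.append(rules + context + history + prompt)
        # Both the prompt and the reply become history for the next turn.
        history += prompt + reply
    return per_turn


# Hypothetical thread: 500 tokens of rules, 2,000 of context,
# four turns of a 100-token prompt and a 400-token reply each.
estimate = input_tokens_per_turn(500, 2000, [(100, 400)] * 4)
print(estimate)       # [2600, 3100, 3600, 4100]
print(sum(estimate))  # 13400
```

Under these assumptions the fixed overhead (rules plus context) is paid on every turn, and the history term grows with each exchange, which is why the same question costs more late in a long thread than at the start of a new one.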

Local models (Ollama) trade cloud cost for hardware cost and often lower quality. For a fair comparison, still measure wall-clock and GPU time.

1. Measure first

Where to look:

Your LLM provider dashboard — Anthropic, OpenAI, etc.
Zed account or subscription page if using hosted inference
Assistant thread length — long threads inflate input on every follow-up

After a heavy week, check whether spend is input-heavy (rules, attachments) or output-heavy (frontier model, long replies). That tells you which section below to prioritize.
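The input-versus-output check can be done by hand from dashboard totals, or with a small helper like the one below. It is a sketch under stated assumptions: the function name, the per-million-token prices, and the 60/40 thresholds are all illustrative choices, not values from Zed or any provider.

```python
def dominant_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> str:
    """Classify a period's spend as input-heavy, output-heavy, or balanced."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    total = input_cost + output_cost
    if total == 0:
        return "no-spend"
    input_share = input_cost / total
    # Thresholds are arbitrary cut-offs chosen for illustration.
    if input_share >= 0.6:
        return "input-heavy"
    if input_share <= 0.4:
        return "output-heavy"
    return "balanced"


# Hypothetical week: 10M input tokens, 200k output tokens,
# at assumed prices of $3 and $15 per million tokens.
print(dominant_cost(10_000_000, 200_000, 3.0, 15.0))  # input-heavy
```

An input-heavy result points at section 3 (rules and attachments); an output-heavy result points at sections 2 and 4 (model choice and prompt scope).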

2. Match the model to the task

This is Zed’s version of Model routing: default cheap, escalate only on failure.

Start here:

Inline completions / fast model — single-line edits, small refactors
Mid-tier assistant — multi-file questions, most routine agent work
Frontier — deep debugging, architecture decisions, novel design only

Costs more than you expect:

Frontier model as default for every @-mention
Agent loops that re-read large files each step
Cloud models when a local Ollama model would suffice for exploration

Switch models or start a new thread when you change task complexity—do not escalate mid-thread without reason.
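The "default cheap, escalate only on failure" rule is easy to state as code. The sketch below is illustrative only: Zed does not expose this function, the tier names are placeholders, and `attempt` is a hypothetical callback standing in for "run the task on this model and return the result, or None if it failed".

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def route_with_escalation(
    task: str,
    tiers: Sequence[str],
    attempt: Callable[[str, str], Optional[str]],
) -> tuple[str, str]:
    """Try each tier from cheapest to most expensive; stop at the first success."""
    for model in tiers:
        result = attempt(model, task)
        if result is not None:
            return model, result
    raise RuntimeError("all tiers failed for task: " + task)


# Hypothetical stand-in for a real model call: only the mid tier succeeds.
def fake_attempt(model: str, task: str) -> Optional[str]:
    return "patched" if model == "mid" else None


print(route_with_escalation("fix null check", ["fast", "mid", "frontier"], fake_attempt))
# ('mid', 'patched')
```

In day-to-day use the loop is manual: you try the cheaper model, judge the result, and open a new thread on the next tier only when it falls short.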

3. Trim what rides along every request

Input bloat in Zed usually comes from rules and over-broad @ context—not your prompt text alone.

Rules and project instructions

When configured, project rules are injected into the assistant context, so every token in them is resent on each turn.

One concern per rules file; keep them short
Reference configs by path—do not paste entire style guides
Do not duplicate instructions in rules, README, and chat Custom Instructions
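A quick way to keep rules files honest is to estimate their size before they ride along on every request. The following is a rough sketch: the four-characters-per-token ratio is a common heuristic rather than a real tokenizer, and the 500-token budget and the file paths in the usage line are assumptions you should replace with your own.

```python
from __future__ import annotations

from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model


def estimate_tokens(text: str) -> int:
    """Approximate token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def audit_rules(paths: list[str], budget_tokens: int = 500) -> list[tuple[str, int, bool]]:
    """Return (path, estimated_tokens, over_budget) for each rules file that exists."""
    report = []
    for path in paths:
        p = Path(path)
        if not p.is_file():
            continue  # skip missing files instead of failing
        tokens = estimate_tokens(p.read_text(encoding="utf-8"))
        report.append((str(p), tokens, tokens > budget_tokens))
    return report


# Example paths are hypothetical; point this at your own rules files.
for path, tokens, over in audit_rules([".rules", "docs/style-guide.md"]):
    print(f"{path}: ~{tokens} tokens{' (over budget)' if over else ''}")
```

Because the estimate is multiplied by every turn in every thread, even a few hundred tokens trimmed here compounds quickly.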

Context attachment

Zed’s assistant supports precise @ references—use them instead of pasting file contents.

@file or symbol-level context beats attaching whole directories
Close unrelated buffers before agent work so “active file” context stays lean
Prefer “fix Navbar.tsx line 42” over “review the frontend”

See Context hygiene for the general just-in-time retrieval pattern.

New thread per task

Start a new assistant thread when you finish one task and begin another, when you switch models, or when follow-up turns feel slow or expensive.

4. Write tighter prompts

Zed-specific versions of Prompt hygiene:

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

Batch related fixes in one message. Accept or reject edits before asking for another pass—each cycle bills again.

5. Set spending guardrails

Zed does not enforce your inference budget. You set the limits.

Set provider-side spending caps on API keys
Use local models for exploratory work when quality is acceptable
Review provider usage after heavy assistant weeks
Team setups: separate keys per developer or project
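Provider-side caps are the real guardrail; a small script over exported usage can complement them by showing which key is approaching its limit. This sketch assumes you can export rows of (key label, cost in USD) from your provider dashboard. The row format, the labels, and the cap value are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def keys_over_cap(rows: list[tuple[str, float]], cap_usd: float) -> list[str]:
    """Sum cost per key label and return the labels whose total exceeds cap_usd."""
    totals: dict[str, float] = defaultdict(float)
    for key_label, cost_usd in rows:
        totals[key_label] += cost_usd
    return sorted(label for label, total in totals.items() if total > cap_usd)


# Hypothetical export: one key per developer, a $50 cap per key.
usage = [("alice", 30.0), ("bob", 10.0), ("alice", 25.0)]
print(keys_over_cap(usage, cap_usd=50.0))  # ['alice']
```

Separate keys per developer or project are what make this breakdown possible; with one shared key, the dashboard can only show a single total.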

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

High input — rules bloat or broad @ attachments. Trim rules; narrow file context.

High output — frontier default or verbose assistant. Cheaper model; tighter prompts.

Slow assistant, high tokens — thread too long. New thread per task.

Local model quality too low — route only exploration locally; use a cloud mid-tier model for changes you intend to commit.

When Zed optimization is not enough

Trimming Zed configuration does not fix production agent loops. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.