Tokenminning in GitHub Copilot

GitHub Copilot integrates completions, chat, and agent mode into VS Code and other editors. Each chat or agent turn resends workspace context, custom instructions, and conversation history. Most spend comes from agent mode on routine tasks, premium model defaults, and chat where inline would suffice—not from one verbose reply.

Work through the sections below in order. For the general technique stack, see Where to start. For underlying patterns, see Context hygiene, Model routing, and Prompt hygiene.

Quick checklist

Open GitHub Copilot usage in your account or org settings and note premium request burn.
Use inline completions and Copilot Edits for single-file work. Reserve agent mode for multi-step tasks.
Trim .github/copilot-instructions.md and repository custom instructions.
Avoid @workspace when @file or a focused prompt is enough.
Start a new chat per task—not one thread across unrelated work.

Typical impact when you follow the list: 30–50% savings using inline/edits over agent for simple fixes; 20–40% on premium requests by avoiding frontier chat defaults; meaningful quota relief by scoping @workspace. Benchmark on your own Copilot usage dashboard—org plans differ from individual Pro.

How Copilot bills a request

Individual Copilot Pro (~$10/mo) includes a monthly allowance of premium requests for chat and agent features; inline completions use a separate pool. Copilot Business/Enterprise meters per seat with org-level policies.

Each chat or agent turn may send:

Your prompt and @ references (@workspace, @file, symbols)
Repository and org custom instructions
Open files and workspace index snippets
Prior chat messages in the thread

Inline completions and next-edit suggestions typically cost less than full chat/agent turns—use them when the task is local to one file.

Agent mode runs multi-step tool loops; each step can consume premium requests and attach growing context. Treat agent mode as expensive relative to inline.
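The effect of resending history on every turn can be sketched with a toy model. This is a rough illustration under stated assumptions, not a description of how Copilot actually assembles or bills requests: the function name, the fixed `overhead` (instructions plus attached files), and the idea that each turn adds a constant `turn_tokens` to the history are all hypothetical simplifications.

```python
def cumulative_input_tokens(overhead: int, turn_tokens: int, turns: int) -> int:
    """Total input tokens across a thread when every turn resends a fixed
    overhead plus all earlier turns (toy model, assumed numbers)."""
    total = 0
    history = 0
    for _ in range(turns):
        # Each request carries: fixed overhead + prior history + the new turn.
        total += overhead + history + turn_tokens
        # The new turn becomes part of the history for later requests.
        history += turn_tokens
    return total


# Hypothetical sizes: 2,000 tokens of instructions/files, 500 tokens per turn.
one_long_thread = cumulative_input_tokens(overhead=2000, turn_tokens=500, turns=10)
two_short_threads = 2 * cumulative_input_tokens(overhead=2000, turn_tokens=500, turns=5)

print(one_long_thread)    # 47500
print(two_short_threads)  # 35000
```

Under these assumed numbers, the same ten turns cost less input when split into two threads, because the history term grows with every turn while the overhead stays flat.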

1. Measure first

Where to look:

GitHub → Settings → Copilot → Usage (individual)
Org admin: Copilot metrics and premium request analytics
IDE Copilot panel — premium request indicators when shown

After a heavy week, check whether premium requests went to agent mode, @workspace chat, or frontier model defaults. That tells you which section below to prioritize.
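If you can get usage data into CSV form, a few lines of Python are enough to see where requests went. The sketch below is illustrative only: the column names (`surface`, `model`, `requests`) and the sample rows are invented for the example, so adapt them to whatever your usage report actually contains.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the real report's columns will differ.
SAMPLE = """surface,model,requests
agent,premium-model,42
chat,default-model,18
chat,premium-model,25
agent,premium-model,30
"""


def requests_by(field: str, report_csv: str) -> Counter:
    """Sum the 'requests' column grouped by the given column name."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(report_csv)):
        totals[row[field]] += int(row["requests"])
    return totals


by_surface = requests_by("surface", SAMPLE)
by_model = requests_by("model", SAMPLE)

# Largest consumers first.
print(by_surface.most_common())
print(by_model.most_common())
```

Grouping by surface and then by model gives a first guess at whether to start with section 2 or section 3.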

2. Match the model to the task

Which models you can choose in Copilot depends on your plan and IDE version. This is Copilot's version of Model routing: use the cheapest surface that works.

Start here:

Inline completions — line-level and small block edits
Copilot Edits — single-file or small multi-file guided changes
Chat (default model) — explanations, scoped questions
Agent mode — multi-step refactors, cross-repo tasks that truly need autonomy

Costs more than you expect:

Agent mode for one-line fixes
@workspace on every question
Chat when inline would answer
Premium / advanced models for routine refactors

Enterprise admins can restrict agent mode or model access. Use these org policies for teams that overuse agent mode.

3. Trim what rides along every request

Input bloat in Copilot usually comes from custom instructions and @workspace—not your prompt text alone.

Custom instructions

Repository instructions live in .github/copilot-instructions.md; org-level instructions are set by org admins.

Keep instructions short — build commands, test entry points, architecture summary
Do not paste entire CONTRIBUTING guides into instructions
Align with existing linters instead of duplicating style rules in prose
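One way to keep instructions short is to measure them. The sketch below estimates the size of an instructions file with a crude four-characters-per-token heuristic. That ratio and the 500-token budget are assumptions chosen for illustration, not Copilot's tokenizer or any documented limit.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic, NOT Copilot's actual tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def within_budget(text: str, budget_tokens: int = 500) -> bool:
    """True if the estimated size fits the (hypothetical) budget."""
    return estimate_tokens(text) <= budget_tokens


if __name__ == "__main__":
    path = Path(".github/copilot-instructions.md")
    if path.exists():
        text = path.read_text(encoding="utf-8")
        status = "ok" if within_budget(text) else "over budget"
        print(f"{path}: ~{estimate_tokens(text)} tokens ({status})")
```

Run from the repository root, this could sit in a pre-commit hook or CI step so the file does not quietly grow over time.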

@workspace vs @file

@workspace pulls broad codebase context—expensive for narrow fixes.

Prefer @filename or symbol references (in VS Code chat, files are attached with #file)
Prompt: “in auth/login.ts only” beats “anywhere in the repo”
Let Copilot search when needed rather than front-loading the whole tree

See Context hygiene for the general just-in-time retrieval pattern.

Chat vs inline vs agent

Match the surface to the task:

Task	Prefer
Complete this line	Inline
Refactor this function	Edits
Explain this error	Chat
Migrate module across 10 files	Agent

Starting agent mode for every task is the Copilot equivalent of unbounded agent loops.
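The table above can be written down as a small decision helper, for example in a team script or onboarding doc. This is a sketch of the article's rule of thumb only: the `Scope` enum and `pick_surface` function are hypothetical names, and nothing here calls Copilot.

```python
from enum import Enum


class Scope(Enum):
    LINE = "line"              # complete or fix a single line
    FUNCTION = "function"      # refactor one function or file
    MULTI_FILE = "multi_file"  # change spans many files


def pick_surface(scope: Scope, wants_explanation: bool = False) -> str:
    """Return the cheapest surface that fits the task, per the table."""
    if wants_explanation:
        return "chat"    # explaining an error needs a conversational answer
    if scope is Scope.LINE:
        return "inline"
    if scope is Scope.FUNCTION:
        return "edits"
    return "agent"       # only multi-file, multi-step work reaches agent mode


print(pick_surface(Scope.LINE))                            # inline
print(pick_surface(Scope.FUNCTION))                        # edits
print(pick_surface(Scope.LINE, wants_explanation=True))    # chat
print(pick_surface(Scope.MULTI_FILE))                      # agent
```

The ordering of the checks encodes the point of this section: agent mode is the fallback, not the default.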

New chat per task

Start a new chat when you finish one task and begin another, or when the thread grows long. Every turn resends the prior messages, so a long thread inflates each later request.

4. Write tighter prompts

Copilot-specific versions of Prompt hygiene:

Too broad:


Fix this bug. Also review the whole auth system and suggest improvements.

Scoped:


Fix ONLY the null check in auth/login.ts line 42.
No explanations. Max 1 file changed.

Accept or reject edits before asking for another pass—each agent cycle may consume premium requests.
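If you write scoped prompts often, a tiny template keeps them consistent. The helper below is a hypothetical convenience, not a Copilot feature; its name and parameters are invented for illustration. It simply assembles text in the shape of the scoped example above.

```python
from typing import Optional


def scoped_prompt(
    task: str,
    path: str,
    line: Optional[int] = None,
    max_files: int = 1,
    explain: bool = False,
) -> str:
    """Build a narrowly scoped prompt: one task, one location, explicit limits."""
    location = path if line is None else f"{path} line {line}"
    first = f"{task} in {location}."
    rules = []
    if not explain:
        rules.append("No explanations.")
    plural = "" if max_files == 1 else "s"
    rules.append(f"Max {max_files} file{plural} changed.")
    return first + "\n" + " ".join(rules)


print(scoped_prompt("Fix ONLY the null check", "auth/login.ts", line=42))
# Fix ONLY the null check in auth/login.ts line 42.
# No explanations. Max 1 file changed.
```

The limits are stated in the prompt itself, so the model is told what not to do as well as what to do.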

5. Set spending guardrails

Copilot knows nothing about the inference budget of the products you ship, but org admins can set policies on Copilot usage itself.

Monitor premium request usage before month-end
Enterprise: disable agent mode for roles that only need inline
Enterprise: model allowlists — block Opus-class models for routine tiers
Individual: switch to inline when chat quota runs low
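Monitoring before month-end can be as simple as a linear projection from the current burn rate. The sketch below assumes the allowance resets at the start of each calendar month and that usage continues at the same daily pace. The function names and the example figures (180 used, 300 allowed) are hypothetical.

```python
import calendar
from datetime import date


def projected_month_end(used: int, today: date) -> float:
    """Linear projection of month-end usage from usage so far this month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return used / today.day * days_in_month


def will_exceed(used: int, allowance: int, today: date) -> bool:
    """True if the current pace would overshoot the allowance."""
    return projected_month_end(used, today) > allowance


# Hypothetical: 180 premium requests used by 15 June, allowance of 300.
today = date(2025, 6, 15)
print(projected_month_end(180, today))  # 360.0
print(will_exceed(180, 300, today))     # True
```

A projection above the allowance mid-month is the cue to move routine work to inline completions before the quota runs out rather than after.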

For metering and caps in products you ship, see Article I and Article IV.

Troubleshooting

Premium requests exhausted mid-month. Likely cause: agent mode and advanced chat model defaults. Fix: use inline completions or Edits, and start a new chat per task.

High context, slow replies. Likely cause: @workspace or a long chat thread. Fix: narrow the @ scope and start a new chat.

Agent loops. Likely cause: the scope is too broad. Fix: stop the run, restate the acceptance criteria, and use Edits instead.

Org policy blocks a model. Likely cause: the admin allowlist. Fix: request a mid-tier default for daily work.

When Copilot optimization is not enough

Trimming Copilot usage does not fix production agent loops in your own API routes. If customer-facing features dominate spend, instrument with per-feature tags and apply Context hygiene, Prompt caching, and Output and RAG. Narev provides normalized USD across providers if you need cross-provider cost math.