Input and output tokens
Understanding how input tokens and output tokens are metered and priced is the starting point for token economics.
- Input tokens are everything you send to the model — prompts, history, retrieved documents, tool schemas, and images.
- Output tokens are everything the model generates — chat replies, JSON, tool calls, code, and reasoning text.
Providers bill input and output separately, usually at different per-token rates. Output tokens are typically more expensive, but input tokens often dominate total spend in agent and RAG workflows.
Input tokens
Input tokens include all content sent to the model before it generates a response:
- System prompts and instructions
- User messages and conversation history
- Retrieved documents (RAG context)
- Tool definitions and function schemas
- Images and other multimodal content
Input tokens are typically priced lower per token than output tokens. They still add up quickly — especially with long context windows and agentic workflows that re-send large histories on every turn.
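To see how re-sent history adds up, here is a minimal sketch. It assumes a simplified model in which every turn re-sends a fixed system prompt plus the whole accumulated history, and each turn adds the same number of new tokens. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def cumulative_input_tokens(system_tokens: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens billed across all turns when the full history is re-sent each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        # The history grows by the new user message, prior reply, and tool results.
        history += tokens_added_per_turn
        # Each request sends the system prompt plus everything accumulated so far.
        total += system_tokens + history
    return total


# Hypothetical workload: 1,000-token system prompt, 500 new tokens per turn, 10 turns.
print(cumulative_input_tokens(1_000, 500, 10))  # 37500
```

Under these assumptions only 5,000 new tokens enter the conversation, yet 37,500 input tokens are billed, because earlier content is counted again on every turn.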
Output tokens
Output tokens are everything the model produces in a response:
- Chat responses and completions
- Structured JSON or tool calls
- Chain-of-thought or reasoning tokens (some providers bill these as output even when the text is not shown)
- Code completions
Output tokens usually carry the highest per-token rate, and their volume is harder to predict because generation length varies by request.
Why both matter for cost
A common mistake is optimizing only output length while ignoring input bloat. In agent workflows, input tokens often dominate total cost because each step re-sends the full conversation and accumulated tool results.
Rule of thumb: Track input and output separately. If your input is 10× larger than your output, a 30% reduction in output tokens can save less than a 30% reduction in input tokens, even when output costs several times more per token.
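The rule of thumb can be checked with a short calculation. This is a sketch under stated assumptions: the request sizes are hypothetical (input 10× larger than output), the per-million rates are the GPT-4o figures quoted on this page, and `savings_from_cut` is an illustrative helper, not part of any provider SDK.

```python
def savings_from_cut(tokens: int, price_per_million: float, cut: float) -> float:
    """Dollars saved by reducing `tokens` by the fraction `cut` at the given rate."""
    return tokens * cut * price_per_million / 1_000_000


# Hypothetical request: 10,000 input tokens and 1,000 output tokens.
# Assumed rates (USD per 1M tokens): $2.50 input, $10.00 output.
input_saving = savings_from_cut(10_000, 2.50, 0.30)
output_saving = savings_from_cut(1_000, 10.00, 0.30)

print(f"30% less input saves  ${input_saving:.4f} per request")
print(f"30% less output saves ${output_saving:.4f} per request")
```

With these numbers the input cut saves more than the output cut, even though output is priced 4× higher per token. The result flips once the output-to-input price ratio exceeds the input-to-output volume ratio.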
How providers price them
Pricing varies by model and provider, and rates change over time. Here is a representative snapshot in USD:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude Sonnet | $3.00 | $15.00 |
A request with 10,000 input tokens and 500 output tokens on GPT-4o costs roughly $0.03 ($0.025 for input plus $0.005 for output). The same request with 100,000 input tokens costs $0.25 in input alone, before any output is generated.
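The arithmetic above can be reproduced with a small helper. This is a minimal sketch: the rates are copied from the table on this page and may be out of date, and the dictionary keys and `request_cost` function are illustrative names, not official model identifiers or a provider API.

```python
# USD per 1M tokens as (input, output), taken from the table above.
# The keys are hypothetical labels for illustration only.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single request."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    return input_cost, output_cost


input_cost, output_cost = request_cost("gpt-4o", 10_000, 500)
print(f"input ${input_cost:.3f} + output ${output_cost:.3f} = ${input_cost + output_cost:.3f}")
```

Returning the two components separately, rather than a single total, keeps input and output spend visible so each can be tracked on its own.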
Use Narev's pricing API to look up current rates across providers.
See also: Context inflation · Model selection