Input vs. Output Tokens
Understanding the difference between input and output tokens is the foundation of LLM cost management.
Input tokens
Input tokens are everything sent to the model:
- System prompts and instructions
- User messages and conversation history
- Retrieved documents (RAG context)
- Tool definitions and function schemas
- Images and multimodal content
Input tokens are typically priced lower than output tokens, but they add up fast — especially with long context windows and agentic workflows that re-send large histories on every turn.
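To show how re-sending history adds up, here is a minimal sketch that counts the input tokens billed on each turn of an agent run. It assumes every call re-sends the system prompt plus the entire accumulated history, with no prompt caching or truncation. The function name and the token counts are hypothetical.

```python
def billed_input_per_turn(system_tokens: int, new_tokens_per_turn: list[int]) -> list[int]:
    """Input tokens billed on each turn when the full history is re-sent.

    Assumes no prompt caching and no history truncation: every call sends the
    system prompt plus everything accumulated so far (user messages, model
    replies, tool results).
    """
    billed = []
    history = 0
    for new_tokens in new_tokens_per_turn:
        history += new_tokens                   # this turn's messages join the history
        billed.append(system_tokens + history)  # the whole context is sent again
    return billed


# Hypothetical agent run: 2,000-token system prompt, 1,000 new tokens per step.
per_turn = billed_input_per_turn(system_tokens=2_000, new_tokens_per_turn=[1_000] * 10)
print(per_turn[-1], sum(per_turn))  # → 12000 75000
```

Ten steps that each add only 1,000 new tokens end up billing 75,000 input tokens in total. Each turn's context grows linearly, so the cumulative total grows quadratically with the number of turns.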
Output tokens
Output tokens are everything the model generates:
- Chat responses
- Structured JSON or tool calls
- Reasoning (chain-of-thought) tokens, which are typically billed as output even when the model does not return them
- Code completions
Output tokens are usually the most expensive per-token unit (4 to 5 times the input price for the models in the table below), and their count is harder to predict because generation length varies from request to request.
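One way to handle that variability is to budget from observed lengths rather than a single guess. The sketch below compares the mean output cost per request with the 95th percentile (nearest-rank method). It assumes GPT-4o's $10.00 per 1M output price from the table below, and the output lengths are made up for illustration.

```python
import math
import statistics


def output_cost_stats(output_lengths: list[int], usd_per_million: float) -> dict[str, float]:
    """Mean and 95th-percentile output cost per request, in USD.

    Uses the nearest-rank percentile; output_lengths must be non-empty.
    """
    ordered = sorted(output_lengths)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank p95, 1-based
    to_usd = usd_per_million / 1_000_000
    return {
        "mean_usd": statistics.mean(ordered) * to_usd,
        "p95_usd": ordered[rank - 1] * to_usd,
    }


# Hypothetical output lengths (tokens) from ten past requests.
lengths = [120, 180, 200, 250, 300, 320, 400, 450, 900, 2_400]
stats = output_cost_stats(lengths, usd_per_million=10.00)
print(f"mean ${stats['mean_usd']:.5f}, p95 ${stats['p95_usd']:.5f}")
# → mean $0.00552, p95 $0.02400
```

In this sample, a single long response pushes the p95 cost to more than four times the mean.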
Why both matter
A common mistake is optimizing only output length while ignoring input bloat. In agent workflows, input tokens often dominate total cost because each step re-sends the full conversation and tool results.
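Rule of thumb: Track input and output separately. A 30% reduction in output tokens can save less than a 30% reduction in input tokens when your input is 10x larger, even though output costs more per token, because savings scale with each side's share of total spend (tokens × price).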
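The comparison can be checked with a few lines of arithmetic. This sketch assumes a request with 10,000 input tokens and 1,000 output tokens (input 10x larger) at GPT-4o's listed prices. The helper name is hypothetical.

```python
def savings_usd(tokens: int, cut: float, usd_per_million: float) -> float:
    """USD saved by removing `cut` (a fraction) of `tokens` at the given price."""
    return tokens * cut * usd_per_million / 1_000_000


# Assumed request shape: input is 10x larger than output; GPT-4o prices per 1M tokens.
input_tokens, output_tokens = 10_000, 1_000
input_saving = savings_usd(input_tokens, 0.30, 2.50)
output_saving = savings_usd(output_tokens, 0.30, 10.00)
print(f"30% input cut saves ${input_saving:.4f}, 30% output cut saves ${output_saving:.4f}")
# → 30% input cut saves $0.0075, 30% output cut saves $0.0030
```

The output cut saves less despite the 4x higher unit price, because input makes up 2.5x as much of the spend here ($0.025 versus $0.010 per request).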
Pricing example
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude Sonnet | $3.00 | $15.00 |
A request with 10,000 input tokens and 500 output tokens on GPT-4o costs roughly $0.03 ($0.025 input + $0.005 output). The same request with 100,000 input tokens costs $0.25 in input alone, before any output is generated.
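A small calculator makes this arithmetic repeatable and keeps input and output costs separate. This is a minimal sketch: the prices are copied from the table above and will go stale, and `PRICES` and `request_cost` are hypothetical names, not part of any provider SDK.

```python
# USD per 1M tokens as (input, output), copied from the table above.
# Rates change; treat these as illustrative, not current.
PRICES: dict[str, tuple[float, float]] = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_usd, output_usd) for one request, tracked separately."""
    input_price, output_price = PRICES[model]
    input_usd = input_tokens * input_price / 1_000_000
    output_usd = output_tokens * output_price / 1_000_000
    return input_usd, output_usd


inp, out = request_cost("GPT-4o", 10_000, 500)
print(f"input ${inp:.4f} + output ${out:.4f} = ${inp + out:.4f}")
# → input $0.0250 + output $0.0050 = $0.0300

large_input, _ = request_cost("GPT-4o", 100_000, 500)
print(f"input alone: ${large_input:.4f}")  # → input alone: $0.2500
```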
Use Narev's pricing API to look up current rates across providers.