Input vs. Output Tokens

Understanding the difference between input and output tokens is the foundation of LLM cost management.

Input tokens

Input tokens are everything sent to the model:

  • System prompts and instructions
  • User messages and conversation history
  • Retrieved documents (RAG context)
  • Tool definitions and function schemas
  • Images and multimodal content

Input tokens are typically priced lower than output tokens, but they add up fast — especially with long context windows and agentic workflows that re-send large histories on every turn.

Output tokens

Output tokens are everything the model generates:

  • Chat responses
  • Structured JSON or tool calls
  • Chain-of-thought reasoning (when exposed)
  • Code completions

Output tokens usually carry the highest per-token price, and their cost is harder to predict because generation length varies by request.

Why both matter

A common mistake is optimizing only output length while ignoring input bloat. In agent workflows, input tokens often dominate total cost because each step re-sends the full conversation and tool results.
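
To see why input can dominate in agent loops, the following minimal sketch totals the input tokens sent across a run. It assumes every step re-sends a fixed block (system prompt, tool definitions) plus the entire accumulated history, and that each step appends the same number of tokens to that history. The function name, its parameters, and the token counts are hypothetical and chosen only for illustration; real agents add varying amounts per step.

```python
def cumulative_input_tokens(fixed_tokens: int, added_per_step: int, steps: int) -> int:
    """Sum the input tokens sent over an agent run.

    Assumption (illustrative): each step sends `fixed_tokens` plus all history
    so far, then appends `added_per_step` tokens (model output and tool
    results) to the history for the next step.
    """
    total = 0
    history = 0
    for _ in range(steps):
        total += fixed_tokens + history  # everything sent on this step
        history += added_per_step        # history grows before the next step
    return total


if __name__ == "__main__":
    # Hypothetical run: 1,000 fixed tokens, 500 tokens added per step.
    for steps in (5, 10, 20):
        print(steps, cumulative_input_tokens(1_000, 500, steps))
```

Under these assumptions, going from 10 to 20 steps raises total input from 32,500 to 115,000 tokens, roughly 3.5x, because the re-sent history grows with every turn.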

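Rule of thumb: Track input and output separately. If your input volume is 10x your output volume, a 30% reduction in output tokens saves less than a 30% reduction in input tokens, because input still accounts for most of the bill even when output is priced 4x to 5x higher per token.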
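
The rule of thumb can be checked with a short calculation. This is a sketch only: the prices are the GPT-4o figures from the pricing example in this document, and the workload (10,000 input tokens, 1,000 output tokens) and the helper names are hypothetical.

```python
# USD per 1M tokens, taken from the GPT-4o row of this document's pricing
# example. Treat these as illustrative; real rates change over time.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def request_cost(input_tokens: float, output_tokens: float) -> float:
    """Cost in USD of one request at the illustrative prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def savings(input_tokens: float, output_tokens: float,
            input_cut: float = 0.0, output_cut: float = 0.0) -> float:
    """USD saved per request when input and/or output shrink by a fraction."""
    before = request_cost(input_tokens, output_tokens)
    after = request_cost(input_tokens * (1 - input_cut),
                         output_tokens * (1 - output_cut))
    return before - after


if __name__ == "__main__":
    # Hypothetical workload where input is 10x larger than output.
    print(f"30% less input:  ${savings(10_000, 1_000, input_cut=0.30):.4f}")
    print(f"30% less output: ${savings(10_000, 1_000, output_cut=0.30):.4f}")
```

At these prices the input cut saves about $0.0075 per request and the output cut about $0.0030, so the input reduction is worth 2.5x more despite the lower per-token rate.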

Pricing example

Model            Input (per 1M tokens)    Output (per 1M tokens)
GPT-4o           $2.50                    $10.00
GPT-4o mini      $0.15                    $0.60
Claude Sonnet    $3.00                    $15.00

A request with 10,000 input tokens and 500 output tokens on GPT-4o costs roughly $0.03 ($0.025 for input plus $0.005 for output). The same request with 100,000 input tokens costs $0.25 in input alone, before any output is generated.
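
The per-request arithmetic can be reproduced with a small lookup helper. This is a minimal sketch: the price dictionary copies the table above as a static snapshot, and `PRICES_PER_MILLION` and `cost_breakdown` are hypothetical names, not part of any provider SDK or of Narev's API.

```python
# USD per 1M tokens as (input, output), copied from the table above.
# A static snapshot for illustration, not live rates.
PRICES_PER_MILLION = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet": (3.00, 15.00),
}


def cost_breakdown(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price / 1_000_000,
            output_tokens * output_price / 1_000_000)


if __name__ == "__main__":
    for n_input in (10_000, 100_000):
        in_cost, out_cost = cost_breakdown("GPT-4o", n_input, 500)
        print(f"{n_input:>7} input tokens: input ${in_cost:.3f}, "
              f"output ${out_cost:.3f}, total ${in_cost + out_cost:.3f}")
```

Returning the two components separately keeps input and output visible as distinct line items. Input cost scales linearly with token count, so ten times the input tokens means ten times the input cost.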

Use Narev's pricing API to look up current rates across providers.


Tokenminning · Built by Narev