Chapter IV: scale and unit economics
Individual choices add up against physical and economic limits.
Facility expansion moves slowly; demand spikes quickly. Supply-side investment remains necessary, but consuming less per task frees capacity for other work. Inside one company, that freed capacity shows up as margin; across the industry, it eases pressure on grids and queues.
Lower token volume maps to lower compute and lower associated emissions in ways finance and ESG teams can audit — real reduction rather than purchased offsets alone.
For many production LLM programs, recurring inference, not one-off training, now dominates AI opex and grows with every feature and seat. Leaders often cannot state how many tokens each feature consumes per session, which makes optimization a guess. Measurement is a prerequisite, not garnish.
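As a minimal sketch of what such measurement could look like, the snippet below averages token use per session for each feature. The `UsageEvent` record, its field names, and the sample numbers are all hypothetical; a real system would read these values from whatever usage logs the model provider or gateway already emits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One model call. All field names here are hypothetical."""
    feature: str        # product feature that triggered the call
    session_id: str     # user session the call belongs to
    input_tokens: int
    output_tokens: int


def tokens_per_session_by_feature(events):
    """Return {feature: mean total tokens per distinct session}."""
    totals = defaultdict(int)      # feature -> total tokens across all calls
    sessions = defaultdict(set)    # feature -> distinct session ids seen
    for e in events:
        totals[e.feature] += e.input_tokens + e.output_tokens
        sessions[e.feature].add(e.session_id)
    return {f: totals[f] / len(sessions[f]) for f in totals}


# Made-up sample data, for illustration only.
events = [
    UsageEvent("search", "s1", 1200, 300),
    UsageEvent("search", "s1", 800, 200),
    UsageEvent("search", "s2", 1000, 500),
    UsageEvent("summarize", "s1", 4000, 600),
]
report = tokens_per_session_by_feature(events)
```

With this invented sample, `report` attributes 2,000 tokens per session to `search` and 4,600 to `summarize`, which is the kind of per-feature figure the paragraph above says leaders often lack.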
When two rivals ship similar user-visible quality but one spends materially less per interaction, unit economics diverge. As model capability converges on routine tasks, efficiency can matter as much as headline benchmark scores.
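The divergence can be made concrete with a small worked calculation. Everything below is an assumption chosen for illustration: the per-million-token prices, the token counts for each rival, and the monthly interaction volume are invented, not measured.

```python
def cost_per_interaction(input_tokens, output_tokens,
                         price_in_per_m, price_out_per_m):
    """Cost in USD of one interaction, given prices per million tokens."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000


# Hypothetical prices in USD per million tokens.
PRICE_IN = 2.0
PRICE_OUT = 8.0

# Two rivals with similar user-visible quality but different token budgets
# (invented figures).
rival_a = cost_per_interaction(6000, 1000, PRICE_IN, PRICE_OUT)
rival_b = cost_per_interaction(3000, 500, PRICE_IN, PRICE_OUT)

# Hypothetical volume: ten million interactions per month.
interactions_per_month = 10_000_000
monthly_gap = (rival_a - rival_b) * interactions_per_month
```

Under these assumed numbers, rival A pays $0.02 per interaction and rival B pays $0.01, a gap of roughly $100,000 per month at the assumed volume. The absolute figures are arbitrary; the point is that a per-interaction difference scales linearly with usage.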