Article III: Context Window Sovereignty
The Law of State
Infinite context is a provider marketing feature, not an architectural permission. Unchecked context growth is the primary mechanism by which agentic workflows bankrupt startups.
Section 1: State Over History
Agentic workflows are forbidden from blindly appending raw tool-call outputs to the context window.
Each agent turn does not need the full transcript of everything that ever happened. It needs the current state of the task. Appending raw API responses, full web page scrapes, or complete file contents to context is state pollution.
Enforcement:
- Tool outputs pass through a truncation or summarization gate before entering context
- Context size is monitored per turn; turns that grow context by more than a defined threshold trigger review alerts
- "Send everything" retrieval strategies require explicit budget allocation
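The first two enforcement points can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names `gate_tool_output` and `ContextMonitor`, both thresholds, and the four-characters-per-token estimate are all assumptions made for the example. A real system would use its provider's tokenizer and its own limits.

```python
from dataclasses import dataclass, field

MAX_TOOL_OUTPUT_TOKENS = 500    # hypothetical per-output cap
MAX_TURN_GROWTH_TOKENS = 2_000  # hypothetical per-turn growth threshold
TRUNCATION_MARKER = "\n[truncated by context gate]"


def estimate_tokens(text: str) -> int:
    """Rough estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return (len(text) + 3) // 4


def gate_tool_output(raw: str, max_tokens: int = MAX_TOOL_OUTPUT_TOKENS) -> str:
    """Return the tool output unchanged if it fits the cap, else a truncated copy."""
    if estimate_tokens(raw) <= max_tokens:
        return raw
    # Reserve room for the marker so the gated text still fits the cap.
    keep_chars = max(max_tokens * 4 - len(TRUNCATION_MARKER), 0)
    return raw[:keep_chars] + TRUNCATION_MARKER


@dataclass
class ContextMonitor:
    """Tracks context size per turn and records an alert on excessive growth."""
    threshold: int = MAX_TURN_GROWTH_TOKENS
    previous_size: int = 0
    alerts: list[str] = field(default_factory=list)

    def record_turn(self, turn: int, context: str) -> None:
        size = estimate_tokens(context)
        growth = size - self.previous_size
        if growth > self.threshold:
            self.alerts.append(f"turn {turn}: context grew by {growth} tokens")
        self.previous_size = size
```

Truncation is the bluntest possible gate. It is shown here because it is deterministic and cheap; a summarization gate would slot into the same position in the pipeline.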
Section 2: The Compression Requirement
Before passing context to the next loop iteration, the system must compress raw API responses into strict state summaries.
Compression is not optional summarization — it is a mandatory pipeline stage. The summary must preserve task-relevant facts while discarding verbatim reproduction of source material.
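One way to make the stage mandatory rather than advisory is to give the summary a fixed schema and reject any summary that copies long runs of source text. The sketch below assumes a hypothetical `StateSummary` type and a hypothetical `contains_verbatim` check with an arbitrary 80-character window. How the summary fields get filled (a model call, a rule-based extractor) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class StateSummary:
    """Fixed-shape state passed to the next loop iteration (hypothetical schema)."""
    status: str
    key_findings: list[str]
    next_action: str
    tokens_used: int
    token_budget: int

    def render(self) -> str:
        """Render the summary as the compact text that enters the context window."""
        lines = [self.status]
        lines += [f"Key finding: {finding}" for finding in self.key_findings]
        lines.append(f"Next action: {self.next_action}")
        lines.append(
            f"Tokens used this session: {self.tokens_used:,} / {self.token_budget:,} budget."
        )
        return "\n".join(lines)


def contains_verbatim(summary: str, raw: str, window: int = 80) -> bool:
    """True if any `window`-character run of `raw` appears unchanged in `summary`.

    Sources shorter than the window are never flagged, so short facts may be quoted.
    """
    if len(raw) < window:
        return False
    return any(raw[i:i + window] in summary for i in range(len(raw) - window + 1))
```

A pipeline built this way would call `contains_verbatim(summary.render(), raw_response)` for each raw response and refuse to advance the loop if any check returns `True`.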
A valid state summary:
Research complete. Found 3 relevant sources.
Key finding: API pricing changed March 2025, cache tokens now 50% off.
Next action: Draft comparison table.
Tokens used this session: 12,400 / 50,000 budget.

An invalid context append:
[Full 8,000-token web page content pasted here]
[Full JSON response from previous tool call]
[Complete conversation history from turn 1]

Section 3: The Separation of Memory
Working memory (what the model needs now) must be strictly isolated from episodic memory (the database logs of what happened).
| Memory Type | Storage | Context Window |
|---|---|---|
| Working memory | In-context, compressed state | Yes — strictly budgeted |
| Episodic memory | Database, vector store, event log | No — retrieved selectively |
Conflating the two — dumping database logs into context because "the model might need it" — is how 500-token tasks become 50,000-token tasks. Episodic memory is queried, not injected.
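"Queried, not injected" can be illustrated with a small event log that only returns rows matching a query, and only until a token budget is spent. The sketch uses Python's standard `sqlite3` module with an in-memory database. The `EpisodicLog` class, its table layout, the keyword match, and the four-characters-per-token estimate are assumptions for the example; a production system might use a vector store or a different retrieval strategy.

```python
import sqlite3


def estimate_tokens(text: str) -> int:
    """Rough estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return (len(text) + 3) // 4


class EpisodicLog:
    """Episodic memory kept outside the context window (hypothetical layout)."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE events (turn INTEGER, kind TEXT, body TEXT)")

    def append(self, turn: int, kind: str, body: str) -> None:
        """Log an event in full; nothing here touches the context window."""
        self.db.execute("INSERT INTO events VALUES (?, ?, ?)", (turn, kind, body))

    def query(self, keyword: str, token_budget: int) -> list[str]:
        """Return the most recent matching events that fit within the budget."""
        rows = self.db.execute(
            "SELECT turn, body FROM events WHERE body LIKE ? ORDER BY turn DESC",
            (f"%{keyword}%",),
        )
        picked: list[str] = []
        used = 0
        for turn, body in rows:
            cost = estimate_tokens(body)
            if used + cost > token_budget:
                break  # budget spent; older matches stay in the database
            picked.append(f"[turn {turn}] {body}")
            used += cost
        return picked
```

The full history stays available for audit and replay, while each turn pays only for the rows it asked for.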