Article III: Context Window Sovereignty

The Law of State

Infinite context is a provider marketing feature, not an architectural permission. Unchecked context growth is the primary mechanism by which agentic workflows bankrupt startups.

Section 1: State Over History

Agentic workflows are forbidden from blindly appending raw tool-call outputs to the context window.

Each agent turn does not need the full transcript of everything that ever happened. It needs the current state of the task. Appending raw API responses, full web page scrapes, or complete file contents to context is state pollution.
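The cost of state pollution can be estimated with simple arithmetic. The sketch below assumes a stateless API where the full context is resent as input on every turn, and it ignores output tokens and any prompt-cache discounts. The function name and all numbers are hypothetical, chosen only to illustrate the shape of the growth.

```python
def cumulative_input_tokens(turns: int, base: int, added_per_turn: int) -> int:
    """Total input tokens sent across a session when context is resent every turn."""
    total = 0
    context = base
    for _ in range(turns):
        total += context            # the whole context is sent as input this turn
        context += added_per_turn   # raw output appended, carried into later turns
    return total


# Hypothetical session: 20 turns, 500-token task state.
append_everything = cumulative_input_tokens(turns=20, base=500, added_per_turn=2_000)
compressed_state = cumulative_input_tokens(turns=20, base=500, added_per_turn=0)
```

Under these assumptions the append-everything session sends 390,000 input tokens and the fixed-state session sends 10,000. Every appended token is sent again on each later turn, so total cost grows roughly with the square of the number of turns.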

Enforcement:

  • Tool outputs pass through a truncation or summarization gate before entering context
  • Context size is monitored per turn; turns that grow context by more than a defined threshold trigger review alerts
  • "Send everything" retrieval strategies require explicit budget allocation
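The first two enforcement rules can be sketched as a small gate plus a per-turn monitor. This is a minimal illustration, not a prescribed implementation: the limits, the `gate_tool_output` and `ContextMonitor` names, and the four-characters-per-token estimate are all assumptions. A real system would use the provider's tokenizer and its own alerting channel.

```python
from dataclasses import dataclass, field

# Hypothetical limits; tune these for your own workload.
MAX_TOOL_OUTPUT_CHARS = 2_000
MAX_GROWTH_TOKENS_PER_TURN = 1_500


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return (len(text) + 3) // 4


def gate_tool_output(raw: str, limit: int = MAX_TOOL_OUTPUT_CHARS) -> str:
    """Truncate a raw tool output before it is allowed into context."""
    if len(raw) <= limit:
        return raw
    dropped = len(raw) - limit
    return raw[:limit] + f"\n[truncated {dropped} chars]"


@dataclass
class ContextMonitor:
    """Tracks context size per turn and records turns that grow too fast."""
    threshold: int = MAX_GROWTH_TOKENS_PER_TURN
    last_size: int = 0
    alerts: list[str] = field(default_factory=list)

    def record_turn(self, turn: int, context: str) -> None:
        size = estimate_tokens(context)
        growth = size - self.last_size
        if growth > self.threshold:
            self.alerts.append(f"turn {turn}: context grew by {growth} tokens")
        self.last_size = size


# Demo: a small output and a 50,000-character scrape both pass through the gate.
context = ""
monitor = ContextMonitor()
for turn, raw in enumerate(["ok", "x" * 50_000], start=1):
    context += gate_tool_output(raw) + "\n"
    monitor.record_turn(turn, context)
```

With the gate in place, the large scrape is cut to the limit and no alert is recorded. Appending the same scrape ungated would exceed the growth threshold and add an entry to `monitor.alerts`.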

Section 2: The Compression Requirement

Before passing context to the next loop iteration, the system must compress raw API responses into strict state summaries.

Compression is not optional summarization; it is a mandatory pipeline stage. The summary must preserve task-relevant facts and discard verbatim source material.

A valid state summary:

Research complete. Found 3 relevant sources.
Key finding: API pricing changed March 2025, cache tokens now 50% off.
Next action: Draft comparison table.
Tokens used this session: 12,400 / 50,000 budget.

An invalid context append:

[Full 8,000-token web page content pasted here]
[Full JSON response from previous tool call]
[Complete conversation history from turn 1]
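One way to make compression a pipeline stage rather than a habit is to give the state summary a fixed shape and reject anything oversized. The sketch below is an assumption-laden illustration: `StateSummary`, `compression_stage`, the 600-character cap, and `stub_summarizer` are hypothetical names and values. In practice the summarizer would likely be a call to a small, cheap model; here it is a hard-coded stand-in so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

MAX_SUMMARY_CHARS = 600  # hypothetical cap on what may enter the next iteration


@dataclass
class StateSummary:
    status: str
    key_findings: list[str]
    next_action: str
    tokens_used: int
    token_budget: int

    def render(self) -> str:
        """Render the summary in the compact form shown above."""
        lines = [self.status]
        lines += [f"Key finding: {finding}" for finding in self.key_findings]
        lines.append(f"Next action: {self.next_action}")
        lines.append(
            f"Tokens used this session: {self.tokens_used:,} / {self.token_budget:,} budget."
        )
        return "\n".join(lines)


def compression_stage(
    raw_responses: list[str],
    summarize: Callable[[list[str]], StateSummary],
) -> str:
    """Raw responses go in; only a size-checked state summary comes out."""
    text = summarize(raw_responses).render()
    if len(text) > MAX_SUMMARY_CHARS:
        raise ValueError(f"state summary is {len(text)} chars, over the cap")
    return text


def stub_summarizer(raw_responses: list[str]) -> StateSummary:
    """Stand-in for a real summarizer; returns fixed illustrative values."""
    return StateSummary(
        status=f"Research complete. Found {len(raw_responses)} relevant sources.",
        key_findings=["API pricing changed March 2025, cache tokens now 50% off."],
        next_action="Draft comparison table.",
        tokens_used=12_400,
        token_budget=50_000,
    )


raw_pages = ["<html>" + "lorem " * 2_000 + "</html>"] * 3
next_context = compression_stage(raw_pages, stub_summarizer)
```

The raw pages never reach `next_context`; only the rendered summary does, and a summarizer that returns too much fails loudly instead of silently inflating the window.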

Section 3: The Separation of Memory

Working memory (what the model needs now) must be strictly isolated from episodic memory (the database logs of what happened).

Memory Type     | Storage                           | Context Window
Working memory  | In-context, compressed state      | Yes, strictly budgeted
Episodic memory | Database, vector store, event log | No, retrieved selectively

Conflating the two — dumping database logs into context because "the model might need it" — is how 500-token tasks become 50,000-token tasks. Episodic memory is queried, not injected.
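The "queried, not injected" rule can be illustrated with a small event log kept outside the context window. This sketch uses an in-memory SQLite database as a stand-in for whatever store a real system uses. The table schema, the `log_event`, `recall`, and `build_context` helpers, and the keyword match are all hypothetical; a production system might use a vector store or structured filters instead.

```python
import sqlite3

# Episodic memory: lives in a database, never in the prompt by default.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (turn INTEGER, kind TEXT, detail TEXT)")


def log_event(turn: int, kind: str, detail: str) -> None:
    """Record what happened; this does not touch the context window."""
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (turn, kind, detail))


def recall(keyword: str, limit: int = 3) -> list[str]:
    """Selectively retrieve at most `limit` recent events matching a keyword."""
    rows = db.execute(
        "SELECT turn, detail FROM events WHERE detail LIKE ? "
        "ORDER BY turn DESC LIMIT ?",
        (f"%{keyword}%", limit),
    ).fetchall()
    return [f"turn {turn}: {detail}" for turn, detail in rows]


def build_context(working_state: str, keyword: str) -> str:
    """Working memory plus a bounded, targeted slice of episodic memory."""
    return "\n".join([working_state, "Recalled:", *recall(keyword)])


# Demo: 101 logged events, of which only one is relevant to the current step.
for turn in range(1, 101):
    log_event(turn, "tool", f"fetched page {turn}")
log_event(101, "tool", "pricing page: cache tokens discounted")

context = build_context("Next action: Draft comparison table.", "pricing")
```

The full log stays in the database. The context receives the working state and the single matching event, and the `limit` parameter bounds how much episodic memory any one query can pull in.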




Tokenminning · Built by Narev