Articles
Article II: The Routing Mandate

Article II: The Routing Mandate

The Law of Model Selection

Frontier models are a privileged resource, not a default endpoint. Treating GPT-4-class models as model: "default" is the architectural equivalent of using a cargo plane for grocery delivery.

Section 1: The Baseline Assumption

All tasks default to the smallest capable model.

The routing layer — not the application developer — owns the initial model selection. Application code requests a capability (e.g., classify-intent, generate-summary), and the router maps it to the cheapest model that meets the SLA.

Prohibited patterns:

  • Hardcoding gpt-4 in application code without a routing abstraction
  • Using frontier models for classification, extraction, or formatting tasks
  • "We'll switch to a cheaper model later" as a launch strategy

Section 2: The Proof of Complexity

Routing to a frontier model requires programmatic justification — a classifier flag, complexity score, or escalation signal indicating that cheaper models have been attempted or assessed as insufficient.

Acceptable justification signals:

  • A complexity classifier returned score > threshold
  • A cheaper model attempt failed quality checks (with the failure logged)
  • The user explicitly selected a "high quality" tier (with corresponding billing)

Unacceptable justification:

  • Developer preference
  • "The prompt is long"
  • Absence of a routing layer entirely

Every frontier model call must be auditable. If you cannot produce the justification log for a given inference, the call should not have happened.

Section 3: The Fallback Protocol

If a fast or cheap model fails, the system cascades upward — but never defaults upward.

Escalation is a response to failure, not a starting position. The cascade must be:

  1. Attempt cheapest capable model
  2. On quality failure, log the failure and escalate one tier
  3. Repeat until success or hard ceiling reached
  4. If ceiling reached, degrade gracefully (see Article IV) — do not silently burn budget on frontier models

Defaulting upward — starting with the most expensive model "to be safe" — is a constitutional violation regardless of outcome quality.


Previous: Article I: Immutable Metering · Next: Article III: Context Window Sovereignty


Tokenminning · Built by Narev