Prompt hygiene

Prompt hygiene

Every token in a system prompt is paid on every request. Treat prompts as production configuration — not draft documents — and remove everything that does not change model behavior.

This implements Article V: prompt schema standards.

Expected impact

On high-volume templates, prompt hygiene typically delivers:

ChangeTypical savings
Concise vs verbose instructions~20–25% cost reduction
Schema enforcement vs prose formattingFewer output tokens + higher reliability
Scaffolding removal50–200 tokens saved per request

A prompt that grows 200 tokens does not sound expensive. At 10 million requests per month, it is a line item.

Schema over prose

Natural language formatting instructions waste tokens and fail unpredictably. Enforce structure with JSON Schema, XML, or function signatures instead.

Prohibited:

Please respond with a JSON object containing "title" and "summary" fields.
Make sure the title is concise and the summary is no more than 100 words.

Required:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "summary_response",
      "schema": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "summary": { "type": "string", "maxLength": 500 }
        },
        "required": ["title", "summary"]
      }
    }
  }
}

Schema-enforced outputs can be validated in CI. Prose instructions cannot.

Common scaffolding waste

Audit every system prompt for tokens that add cost without adding signal:

  • "You are a helpful assistant" — politeness tax, delete it
  • "Take a deep breath and think step by step" — unmeasurable, delete it
  • // TODO: update this when we switch models — development debris, delete it
  • Repeated instructions already enforced by schema — redundant, delete it
  • Few-shot examples that could be replaced by schema constraints — evaluate removal
  • Formatting instructions duplicated across system and user messages — consolidate

Concise vs verbose prompts

Benchmarks on real workloads consistently show that shorter prompts maintain quality while reducing cost:

MetricVerbose promptConcise prompt
CostBaseline~23% lower
LatencyBaseline~8% faster
Accuracy100%100% (when task-appropriate)

The goal is not minimalism for its own sake. Remove tokens that do not change behavior. Keep tokens that define task constraints, edge cases, and quality requirements.

CI token regression

Prompt changes are code changes. Treat them accordingly:

  • Token counting in CI: measure input token count of every prompt template on every PR
  • Regression threshold: >5% increase blocks merge without FinOps sign-off (per Article V)
  • Cost projection: estimate monthly cost impact based on current request volume
  • Version history: prompt templates are versioned artifacts, not string literals scattered across files

What to measure in CI

prompt_template_v3.input_tokens = 1,847
prompt_template_v2.input_tokens = 1,620
delta = +14.0%  → BLOCK (requires review)
projected_monthly_impact = +$2,340 at 10M req/mo

Before and after example

Before (142 tokens of scaffolding):

You are a helpful, friendly assistant specialized in email classification.
Please carefully read each email and classify it into one of the categories below.
Take your time and think through your reasoning step by step.
Return your answer as a JSON object with "category" and "confidence" fields.
Categories: billing, support, sales, spam, other.

After (38 tokens, schema-enforced):

Classify the email into: billing, support, sales, spam, other.

Plus a response_format JSON Schema block (not counted against instruction tokens in the same way, and enforceable in CI).

When prompt hygiene is not enough

Prompt hygiene optimizes the static portion of your requests. If costs are still dominated by:


Tokenminning · Built by Narev