See a bad LLM output. Write one rule. From now on, that error is caught, the model gets targeted feedback, and it self-corrects — on every future call, without touching your prompt.
tier: "hot", score: 25
score >= 70 || tier !== "hot"
Caught. Repaired. Automatically.
Same error. Always fixed.
System stronger. Add more rules.
npm install llm-contract zod
No framework. No provider lock-in. One dependency.
A lead scoring call. The model returns valid JSON — but the domain logic is wrong. llm-contract catches it and gets the model to fix it. No prompt changes.
Your prompt
"Score this lead based on their activity. Return company, score (0-100), tier (hot/warm/cold), whether they're qualified, and the recommended next action."
LLM responds
gpt-4o-mini
{
"company": "TechVentures Inc",
"score": 25,
"tier": "hot",
"qualified": true,
"signals": ["visited pricing", "downloaded whitepaper"],
"nextAction": "close"
}
Valid JSON. Correct types. Schema passes. But a score of 25 is not a "hot" lead — and you don't "close" an unqualified one.
llm-contract catches the error
Your invariant runs on the parsed response:
(d) => d.tier !== "hot" || d.score >= 70 || `tier is "hot" but score is ${d.score}`
tier is "hot" but score is 25 — invariant failed
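An invariant is just a function over the parsed data that returns `true` when satisfied, or a violation string when not. As a standalone sketch (the `Lead` type here is illustrative, not part of llm-contract's API):

```typescript
// An invariant returns `true` when satisfied, or a violation string when not.
// `Lead` is an illustrative type, not part of llm-contract's API.
type Lead = { tier: string; score: number };

const tierMatchesScore = (d: Lead) =>
  d.tier !== "hot" || d.score >= 70 || `tier is "hot" but score is ${d.score}`;

tierMatchesScore({ tier: "hot", score: 25 }); // 'tier is "hot" but score is 25'
tierMatchesScore({ tier: "hot", score: 85 }); // true
```

The returned string doubles as the repair message, which is why a specific, human-readable violation pays off.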
llm-contract sends a new request automatically
Your original prompt stays the same. This repair message is appended:
"Your response had valid types but violated schema constraints:
tier is "hot" but score is 25 (minimum 70 for hot)"
The model knows exactly what's wrong. Not "try again" — the specific business rule it violated.
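Mechanically, the repair rides along as one extra conversation turn while the original prompt stays byte-for-byte the same. An illustration (message text paraphrased from the example above, not the library's exact wording):

```typescript
// Illustrative only: the original prompt is untouched; the repair
// message is appended as one extra user turn on the retry.
const original = [
  { role: "system", content: "Score this lead based on their activity." },
  { role: "user", content: "Lead: TechVentures Inc, visited pricing, downloaded whitepaper" },
];

const repair = {
  role: "user",
  content:
    "Your response had valid types but violated schema constraints:\n" +
    'tier is "hot" but score is 25 (minimum 70 for hot)',
};

const retryMessages = [...original, repair]; // 3 messages, prompt unchanged
```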
LLM responds
gpt-4o-mini
{
"company": "TechVentures Inc",
"score": 25,
"tier": "cold",
"qualified": false,
"signals": ["visited pricing", "downloaded whitepaper"],
"nextAction": "nurture"
}
llm-contract returns
All invariants pass. result.data returned — fully typed, validated.
Notice what happened
The invariant only checked one thing: tier vs. score. But the model didn't just fix the tier — it re-reasoned the entire response:
tier
"hot" → "cold"
qualified
true → false
nextAction
"close" → "nurture"
One invariant. Three fields corrected. The targeted feedback gave the model a reason to re-think the whole response — not just patch a single value.
llm-contract classifies every failure and generates a specific repair message — out of the box, before you write a single invariant.
Model returned nothing
→ "You returned an empty response. Please respond with a JSON object matching the requested schema."
Model said "I'm sorry, I can't..." instead of returning data
→ "This is a structured data task. Your response should be a JSON object only, with no refusal or commentary."
Model returned prose, explanation, or commentary — no JSON at all
→ "Your response contained no JSON. Respond with ONLY a valid JSON object, no explanation, no commentary, no markdown."
JSON was cut off mid-response (token limit, network, etc.)
→ "Your previous response was cut off and the JSON is incomplete. Please provide a complete, shorter JSON response."
Response had JSON-like structure but couldn't be parsed
→ "Your response contained malformed JSON that could not be parsed. Please respond with strictly valid JSON."
Valid JSON, but doesn't match the Zod schema — wrong types, missing fields, bad enums, range violations
→ "Your previous response had validation errors:
- Expected 'high' | 'medium' | 'low', received 'urgent'
Please correct these issues and respond with valid JSON only."
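The pre-invariant categories can be detected with fairly simple checks on the raw response. A rough illustration of that kind of classification (this is a sketch, not llm-contract's actual implementation):

```typescript
// Illustrative sketch of failure classification, not llm-contract internals:
// map each raw response to a category; `null` means it parsed and moves on
// to schema and invariant checks.
type Category = "empty" | "refusal" | "no_json" | "truncated" | "malformed";

function classify(raw: string): Category | null {
  const t = raw.trim();
  if (t === "") return "empty";
  if (/^i'?m sorry|^i can(?:no|')t/i.test(t)) return "refusal";
  if (!t.includes("{")) return "no_json";
  try {
    JSON.parse(t.slice(t.indexOf("{")));
    return null; // parsed fine; schema/invariant checks come next
  } catch {
    // Heuristic: more opening than closing braces suggests a cut-off response.
    return t.split("{").length > t.split("}").length ? "truncated" : "malformed";
  }
}

classify("");                     // "empty"
classify('{"company": "TechVen'); // "truncated"
```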
Schema passes, types are correct — but your domain rules caught something the schema can't express. This is where the real power is.
Every invariant you add is another class of error the system catches and fixes automatically:
cross-field logic
- tier is "hot" but score is 25 (minimum 70 for hot)
business rule
- cannot block with confidence 0.3 (minimum 0.7)
consistency check
- action is "allow" but categories are non-empty — contradictory
domain constraint
- summary too long: 247 chars (max 200)
completeness rule
- must have at least one tag
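All five examples above fit the same one-line shape. A sketch over a hypothetical payload (the `Review` type and field names are illustrative, chosen to match the messages above):

```typescript
// Hypothetical payload type; field names chosen to match the examples above.
type Review = {
  tier: string; score: number;
  action: string; confidence: number;
  categories: string[]; summary: string; tags: string[];
};

const invariants: Array<(d: Review) => true | string> = [
  // cross-field logic
  (d) => d.tier !== "hot" || d.score >= 70 || `tier is "hot" but score is ${d.score} (minimum 70 for hot)`,
  // business rule
  (d) => d.action !== "block" || d.confidence >= 0.7 || `cannot block with confidence ${d.confidence} (minimum 0.7)`,
  // consistency check
  (d) => d.action !== "allow" || d.categories.length === 0 || `action is "allow" but categories are non-empty — contradictory`,
  // domain constraint
  (d) => d.summary.length <= 200 || `summary too long: ${d.summary.length} chars (max 200)`,
  // completeness rule
  (d) => d.tags.length > 0 || "must have at least one tag",
];

// A payload violating four of the five rules:
const bad: Review = {
  tier: "hot", score: 25,
  action: "block", confidence: 0.3,
  categories: [], summary: "x".repeat(247), tags: [],
};
const violations = invariants
  .map((inv) => inv(bad))
  .filter((r): r is string => r !== true); // 4 violation strings
```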
There's no limit to how many invariants you define. Each one you add is another error that becomes a permanent, automatic improvement. The more you write, the stronger the system gets.
Every repair is generated automatically. The first five categories work out of the box — no configuration needed. Invariant errors include the exact violation string you defined, so the model gets domain-specific feedback.
LLMs are good — but not perfect. When a structured response comes back wrong, there's no good path from error to fix.
Prompt engineering
Slow, hard to test, and risky — a change that fixes one output can break three others.
Waiting for the next model
Maybe it fixes your issue. Maybe it doesn't. You have no control and no timeline.
Disabling features
Some teams just turn off what breaks. That's not a fix — it's giving up surface area.
No signal
You don't know which errors repeat, which fixes worked, or whether a model upgrade actually helped.
llm-contract flips this. Every error becomes a categorical fix that fires automatically. You get observability data on what fails and how often. And when you do change your prompt or upgrade your model, you have proof of whether it actually worked.
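As a sketch of the observability side, a per-category failure tally could be fed from an onAttempt-style hook. The attempt shape below is an assumption for illustration, not onAttempt's documented signature:

```typescript
// Illustrative only: count which failure categories recur, so prompt or
// model changes can be judged against real numbers. The attempt shape
// here is an assumption, not onAttempt's documented signature.
const failureCounts = new Map<string, number>();

function recordAttempt(attempt: { category?: string }) {
  if (!attempt.category) return; // successful attempt: nothing to count
  failureCounts.set(attempt.category, (failureCounts.get(attempt.category) ?? 0) + 1);
}

recordAttempt({ category: "invariant" });
recordAttempt({ category: "truncated" });
recordAttempt({ category: "invariant" });
recordAttempt({}); // success
// failureCounts: { "invariant" => 2, "truncated" => 1 }
```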
The problem isn't bad code — it's having no system for turning errors into improvements.
llm-contract wraps your existing LLM call with a deterministic contract:
The model is probabilistic. The boundary is not.
import { enforce } from "llm-contract";
import { z } from "zod";

const result = await enforce(
  // 1. What the response must look like
  z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    confidence: z.number().min(0).max(1),
  }),
  // 2. Your existing LLM call, unchanged (openai is your initialized client)
  async (attempt) => {
    const res = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: attempt.prompt }, // schema-derived format instructions
        { role: "user", content: "Analyze: 'I love this product'" },
        ...attempt.fixes, // empty on the first call; repair messages on retries
      ],
    });
    return res.choices[0].message.content;
  },
  // 3. Domain rules the schema can't express
  {
    invariants: [
      (data) => data.confidence > 0.5 || `confidence too low: ${data.confidence}`,
    ],
  },
);

if (result.ok) {
  result.data; // { sentiment: "positive", confidence: 0.95 } — fully typed
}
if (result.ok) {
result.data; // { sentiment: "positive", confidence: 0.95 } — fully typed
}
attempt.prompt — auto-generated from the schema, no hand-written format instructions
attempt.fixes — on retry, the exact violation fed back to the model
result.data — typed, validated at runtime, no casts
LLMs correct themselves reliably when you tell them exactly what went wrong. Not "return valid JSON" — the specific constraint that failed:
tier is "hot" but score is 25 (minimum 70 for hot)
cannot block with confidence 0.3 (minimum 0.7)
Expected "high" | "medium" | "low", received "urgent"
Each message becomes the repair directive. The model doesn't change. Your prompt doesn't change. The boundary enforces correctness — on every call, for every error in that class, permanently.
One invariant. Every future occurrence of that error. Automatic repair. Zero prompt changes.
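The whole boundary can be pictured as a small loop: call the model, validate, feed violations back, retry. A simplified sketch under assumed shapes (this is not llm-contract's actual implementation):

```typescript
// Simplified sketch of the enforcement loop, not llm-contract's actual
// implementation: call the model, validate, feed violations back, retry.
type Msg = { role: "user" | "system"; content: string };
type Outcome<T> = { ok: true; data: T } | { ok: false; errors: string[] };

async function enforceSketch<T>(
  parse: (raw: string) => T, // throws when the schema is violated
  invariants: Array<(d: T) => true | string>,
  call: (attempt: { fixes: Msg[] }) => Promise<string>,
  maxAttempts = 3,
): Promise<Outcome<T>> {
  let fixes: Msg[] = [];
  let errors: string[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    try {
      const data = parse(await call({ fixes }));
      errors = invariants
        .map((inv) => inv(data))
        .filter((r): r is string => r !== true);
      if (errors.length === 0) return { ok: true, data };
    } catch (e) {
      errors = [String(e)]; // parse/schema failure becomes the feedback
    }
    // Targeted feedback for the next attempt; the original prompt is untouched.
    fixes = [{ role: "user", content: `Your response violated: ${errors.join("; ")}` }];
  }
  return { ok: false, errors };
}
```

The model function never changes between attempts; only the `fixes` array grows, which is what keeps the prompt stable while the error class gets repaired.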
Use llm-contract when LLM output feeds downstream logic and bad structure has cost:
Classification & routing
Ticket triage, intent detection, content categorization
Extraction
Forms, invoices, entities, document parsing
Scoring & normalization
Lead scoring, sentiment, risk assessment
Moderation & policy
Content review, compliance labeling
Not designed for free-form chat, creative writing, or tasks where structure doesn't matter.
7 failure categories, each with a specific repair message — built in, zero configuration.
clean, check, classify, fix, select, prompt — each usable independently.
Schema rules Zod can't express. How they work, how to write them, how they compound.
Wire onAttempt into Langfuse, OpenTelemetry, or any tracing backend.
Full function signatures, options, and return types.
Before/after comparisons for sentiment, invoices, classification, and more.