See a bad LLM output. Write one rule. From now on, that error is caught, the model gets targeted feedback, and it self-corrects — on every future call, without touching your prompt.
tier: "hot", score: 25
score >= 70 || tier !== "hot"
Caught. Repaired. Automatically.
Same error. Always fixed.
System stronger. Add more rules.
npm install llm-contract zod
No framework. No provider lock-in. One dependency.
A lead scoring call. The model returns valid JSON — but the domain logic is wrong. llm-contract catches it and gets the model to fix it. No prompt changes.
Your prompt
"Score this lead based on their activity. Return company, score (0-100), tier (hot/warm/cold), whether they're qualified, and the recommended next action."
LLM responds
gpt-4o-mini
{
"company": "TechVentures Inc",
"score": 25,
"tier": "hot",
"qualified": true,
"signals": ["visited pricing", "downloaded whitepaper"],
"nextAction": "close"
}
Valid JSON. Correct types. Schema passes. But a score of 25 is not a "hot" lead — and you don't "close" an unqualified one.
llm-contract catches the error
Your invariant runs on the parsed response:
(d) => d.tier !== "hot" || d.score >= 70 || `tier is "hot" but score is ${d.score}`
tier is "hot" but score is 25 — invariant failed
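An invariant is just a function over the parsed data that returns `true` when satisfied, or a violation string when not. As a standalone sketch (the `Lead` type here is illustrative, not part of llm-contract's API):

```typescript
// An invariant returns `true` when satisfied, or a violation string when not.
// `Lead` is an illustrative type, not part of llm-contract's API.
type Lead = { tier: string; score: number };

const tierMatchesScore = (d: Lead) =>
  d.tier !== "hot" || d.score >= 70 || `tier is "hot" but score is ${d.score}`;

tierMatchesScore({ tier: "hot", score: 25 }); // 'tier is "hot" but score is 25'
tierMatchesScore({ tier: "hot", score: 85 }); // true
```

The returned string doubles as the repair message, which is why a specific, human-readable violation pays off.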
llm-contract sends a new request automatically
Your original prompt stays the same. This repair message is appended:
"Your response had valid types but violated schema constraints:
tier is "hot" but score is 25 (minimum 70 for hot)"
The model knows exactly what's wrong. Not "try again" — the specific business rule it violated.
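Mechanically, the repair rides along as one extra conversation turn while the original prompt stays byte-for-byte the same. An illustration (message text paraphrased from the example above, not the library's exact wording):

```typescript
// Illustrative only: the original prompt is untouched; the repair
// message is appended as one extra user turn on the retry.
const original = [
  { role: "system", content: "Score this lead based on their activity." },
  { role: "user", content: "Lead: TechVentures Inc, visited pricing, downloaded whitepaper" },
];

const repair = {
  role: "user",
  content:
    "Your response had valid types but violated schema constraints:\n" +
    'tier is "hot" but score is 25 (minimum 70 for hot)',
};

const retryMessages = [...original, repair]; // 3 messages, prompt unchanged
```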
LLM responds
gpt-4o-mini
{
"company": "TechVentures Inc",
"score": 25,
"tier": "cold",
"qualified": false,
"signals": ["visited pricing", "downloaded whitepaper"],
"nextAction": "nurture"
}
llm-contract returns
All invariants pass. result.data returned — fully typed, validated.
Notice what happened
The invariant only checked one thing: tier vs. score. But the model didn't just fix the tier — it re-reasoned the entire response:
tier
"hot" → "cold"
qualified
true → false
nextAction
"close" → "nurture"
One invariant. Three fields corrected. The targeted feedback gave the model a reason to re-think the whole response — not just patch a single value.
llm-contract classifies every failure and generates a specific repair message — out of the box, before you write a single invariant.
Model returned nothing
→ "You returned an empty response. Please respond with a JSON object matching the requested schema."
Model said "I'm sorry, I can't..." instead of returning data
→ "This is a structured data task. Your response should be a JSON object only, with no refusal or commentary."
Model returned prose, explanation, or commentary — no JSON at all
→ "Your response contained no JSON. Respond with ONLY a valid JSON object, no explanation, no commentary, no markdown."
JSON was cut off mid-response (token limit, network, etc.)
→ "Your previous response was cut off and the JSON is incomplete. Please provide a complete, shorter JSON response."
Response had JSON-like structure but couldn't be parsed
→ "Your response contained malformed JSON that could not be parsed. Please respond with strictly valid JSON."
Valid JSON, but doesn't match the Zod schema — wrong types, missing fields, bad enums, range violations
→ "Your previous response had validation errors:
- Expected 'high' | 'medium' | 'low', received 'urgent'
Please correct these issues and respond with valid JSON only."
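The pre-invariant categories can be detected with fairly simple checks on the raw response. A rough illustration of that kind of classification (this is a sketch, not llm-contract's actual implementation):

```typescript
// Illustrative sketch of failure classification, not llm-contract internals:
// map each raw response to a category; `null` means it parsed and moves on
// to schema and invariant checks.
type Category = "empty" | "refusal" | "no_json" | "truncated" | "malformed";

function classify(raw: string): Category | null {
  const t = raw.trim();
  if (t === "") return "empty";
  if (/^i'?m sorry|^i can(?:no|')t/i.test(t)) return "refusal";
  if (!t.includes("{")) return "no_json";
  try {
    JSON.parse(t.slice(t.indexOf("{")));
    return null; // parsed fine; schema/invariant checks come next
  } catch {
    // Heuristic: more opening than closing braces suggests a cut-off response.
    return t.split("{").length > t.split("}").length ? "truncated" : "malformed";
  }
}

classify("");                     // "empty"
classify('{"company": "TechVen'); // "truncated"
```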
Schema passes, types are correct — but your domain rules caught something the schema can't express. This is where the real power is.
Every invariant you add is another class of error the system catches and fixes automatically:
cross-field logic
- tier is "hot" but score is 25 (minimum 70 for hot)
business rule
- cannot block with confidence 0.3 (minimum 0.7)
consistency check
- action is "allow" but categories are non-empty — contradictory
domain constraint
- summary too long: 247 chars (max 200)
completeness rule
- must have at least one tag
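All five examples above fit the same one-line shape. A sketch over a hypothetical payload (the `Review` type and field names are illustrative, chosen to match the messages above):

```typescript
// Hypothetical payload type; field names chosen to match the examples above.
type Review = {
  tier: string; score: number;
  action: string; confidence: number;
  categories: string[]; summary: string; tags: string[];
};

const invariants: Array<(d: Review) => true | string> = [
  // cross-field logic
  (d) => d.tier !== "hot" || d.score >= 70 || `tier is "hot" but score is ${d.score} (minimum 70 for hot)`,
  // business rule
  (d) => d.action !== "block" || d.confidence >= 0.7 || `cannot block with confidence ${d.confidence} (minimum 0.7)`,
  // consistency check
  (d) => d.action !== "allow" || d.categories.length === 0 || `action is "allow" but categories are non-empty — contradictory`,
  // domain constraint
  (d) => d.summary.length <= 200 || `summary too long: ${d.summary.length} chars (max 200)`,
  // completeness rule
  (d) => d.tags.length > 0 || "must have at least one tag",
];

// A payload violating four of the five rules:
const bad: Review = {
  tier: "hot", score: 25,
  action: "block", confidence: 0.3,
  categories: [], summary: "x".repeat(247), tags: [],
};
const violations = invariants
  .map((inv) => inv(bad))
  .filter((r): r is string => r !== true); // 4 violation strings
```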
There's no limit to how many invariants you define. Each one you add is another error that becomes a permanent, automatic improvement. The more you write, the stronger the system gets.
Every repair is generated automatically. The first five categories work out of the box — no configuration needed. Invariant errors include the exact violation string you defined, so the model gets domain-specific feedback.
LLMs are good — but not perfect. When a structured response comes back wrong, there's no good path from error to fix.
Prompt engineering
Slow, hard to test, and risky — a change that fixes one output can break three others.
Waiting for the next model
Maybe it fixes your issue. Maybe it doesn't. You have no control and no timeline.
Disabling features
Some teams just turn off what breaks. That's not a fix — it's giving up surface area.
No signal
You don't know which errors repeat, which fixes worked, or whether a model upgrade actually helped.
llm-contract flips this. Every error becomes a categorical fix that fires automatically. You get observability data on what fails and how often. And when you do change your prompt or upgrade your model, you have proof of whether it actually worked.
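As a sketch of the observability side, a per-category failure tally could be fed from an onAttempt-style hook. The attempt shape below is an assumption for illustration, not onAttempt's documented signature:

```typescript
// Illustrative only: count which failure categories recur, so prompt or
// model changes can be judged against real numbers. The attempt shape
// here is an assumption, not onAttempt's documented signature.
const failureCounts = new Map<string, number>();

function recordAttempt(attempt: { category?: string }) {
  if (!attempt.category) return; // successful attempt: nothing to count
  failureCounts.set(attempt.category, (failureCounts.get(attempt.category) ?? 0) + 1);
}

recordAttempt({ category: "invariant" });
recordAttempt({ category: "truncated" });
recordAttempt({ category: "invariant" });
recordAttempt({}); // success
// failureCounts: { "invariant" => 2, "truncated" => 1 }
```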
The problem isn't bad code — it's having no system for turning errors into improvements.
llm-contract wraps your existing LLM call with a deterministic contract:
The model is probabilistic. The boundary is not.
import { enforce } from "llm-contract";
import { z } from "zod";

const result = await enforce(
  // 1. What the response must look like
  z.object({
    sentiment: z.enum(["positive", "negative", "neutral"]),
    confidence: z.number().min(0).max(1),
  }),
  // 2. Your existing LLM call, unchanged (openai is your initialized client)
  async (attempt) => {
    const res = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: attempt.prompt }, // schema-derived format instructions
        { role: "user", content: "Analyze: 'I love this product'" },
        ...attempt.fixes, // empty on the first call; repair messages on retries
      ],
    });
    return res.choices[0].message.content;
  },
  // 3. Domain rules the schema can't express
  {
    invariants: [
      (data) => data.confidence > 0.5 || `confidence too low: ${data.confidence}`,
    ],
  },
);

if (result.ok) {
  result.data; // { sentiment: "positive", confidence: 0.95 } — fully typed
}
if (result.ok) {
result.data; // { sentiment: "positive", confidence: 0.95 } — fully typed
}
attempt.prompt — auto-generated from the schema, no hand-written format instructions
attempt.fixes — on retry, the exact violation fed back to the model
result.data — typed, validated at runtime, no casts
LLMs correct themselves reliably when you tell them exactly what went wrong. Not "return valid JSON" — the specific constraint that failed:
tier is "hot" but score is 25 (minimum 70 for hot)
cannot block with confidence 0.3 (minimum 0.7)
Expected "high" | "medium" | "low", received "urgent"
Each message becomes the repair directive. The model doesn't change. Your prompt doesn't change. The boundary enforces correctness — on every call, for every error in that class, permanently.
One invariant. Every future occurrence of that error. Automatic repair. Zero prompt changes.
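The whole boundary can be pictured as a small loop: call the model, validate, feed violations back, retry. A simplified sketch under assumed shapes (this is not llm-contract's actual implementation):

```typescript
// Simplified sketch of the enforcement loop, not llm-contract's actual
// implementation: call the model, validate, feed violations back, retry.
type Msg = { role: "user" | "system"; content: string };
type Outcome<T> = { ok: true; data: T } | { ok: false; errors: string[] };

async function enforceSketch<T>(
  parse: (raw: string) => T, // throws when the schema is violated
  invariants: Array<(d: T) => true | string>,
  call: (attempt: { fixes: Msg[] }) => Promise<string>,
  maxAttempts = 3,
): Promise<Outcome<T>> {
  let fixes: Msg[] = [];
  let errors: string[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    try {
      const data = parse(await call({ fixes }));
      errors = invariants
        .map((inv) => inv(data))
        .filter((r): r is string => r !== true);
      if (errors.length === 0) return { ok: true, data };
    } catch (e) {
      errors = [String(e)]; // parse/schema failure becomes the feedback
    }
    // Targeted feedback for the next attempt; the original prompt is untouched.
    fixes = [{ role: "user", content: `Your response violated: ${errors.join("; ")}` }];
  }
  return { ok: false, errors };
}
```

The model function never changes between attempts; only the `fixes` array grows, which is what keeps the prompt stable while the error class gets repaired.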
Use llm-contract when LLM output feeds downstream logic and bad structure has cost:
Classification & routing
Ticket triage, intent detection, content categorization
Extraction
Forms, invoices, entities, document parsing
Scoring & normalization
Lead scoring, sentiment, risk assessment
Moderation & policy
Content review, compliance labeling
Not designed for free-form chat, creative writing, or tasks where structure doesn't matter.
7 failure categories, each with a specific repair message — built in, zero configuration.
clean, check, classify, fix, select, prompt — each usable independently.
Schema rules Zod can't express. How they work, how to write them, how they compound.
Wire onAttempt into Langfuse, OpenTelemetry, or any tracing backend.
Full function signatures, options, and return types.
Before/after comparisons for sentiment, invoices, classification, and more.