When output is easy, ensuring quality becomes the hard work.

Product development in a box. Engineered by AI, Imagined by Humans.

Drift.

The gap between what you wanted and what you got.

AI makes output easy. But most people use AI like a chatbox: dump context in, don't set clear evaluation criteria, accept the first result. No iteration, no review.

“I know what I want the AI to do, but I can't get it to actually do it consistently.”

You end up debugging the AI's output instead of building your product. For a solo founder, that time tax is fatal.

Not an AI tool. A quality layer.

Think of it as CI/CD for AI-generated content. You wouldn't ship code without tests. You don't ship outputs without passing the rubric.

Cold Anvil wraps your generation in a rigorous quality pipeline: structural gates catch the obvious, model-as-judge review catches the subtle, and automated rewriting fixes what can be fixed. You define what “good” looks like. The pipeline enforces it.

How It Works

Shape Inputs

The crazy idea that might just work. The business plan you wrote on a coffee shop napkin. The unicorn you and your mates came up with in the pub.

Frame What Good Looks Like

What smells right? What feels right? The thing AI can help you reach but can't define on its own. How will you measure it?

Refine Outputs

Keep AI on the right track. Catching drift shouldn't be a human job: let AI police its own output against what you think good looks like.

Built for builders.

“I don't have a team. I don't have months. I have a deadline of weeks and the embryo of an idea.”

Cold Anvil is for the solo founder, the indie hacker, the small team that needs to ship without hiring an agency. You bring the vision. We bring the engineering.

Force multiplier. Not sunk cost.

What you get

Config Packs

Define your prompts, rubrics, gates, and batches. Portable, declarative, versioned. It's not a prompt — it's a quality spec.
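Cold Anvil's actual pack schema isn't shown here; purely as illustration, a quality spec could be a small declarative document along these lines (every field name below is an assumption, sketched as a Python dict):

```python
# A hypothetical config pack. All field names are illustrative,
# not Cold Anvil's actual schema.
config_pack = {
    "version": "1.0",  # versioned: packs evolve with your product
    "prompt": "Write the landing-page hero copy for {product}.",
    "rubric": [
        {"criterion": "clarity", "weight": 0.4, "min_score": 4},
        {"criterion": "matches brand voice", "weight": 0.6, "min_score": 3},
    ],
    "gates": {
        "max_words": 120,
        "required_sections": ["headline", "subhead", "cta"],
        "banned_phrases": ["lorem ipsum", "in today's fast-paced world"],
    },
    "batch": {"variants": 5},
}
```

The point of the shape: the prompt is one field among several, and the rubric and gates travel with it, so "good" is defined once and enforced everywhere the pack is used.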

Quality Pipeline

Three-phase review: distribute, evaluate against rubrics, rewrite what fails. Up to 9 attempts before anything reaches you.
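The loop above can be sketched roughly like this — a minimal illustration, where `run_gates`, `judge`, and `rewrite` are hypothetical stand-ins for the three phases, not Cold Anvil's API:

```python
# Minimal sketch of a bounded quality loop: cheap checks first,
# model review second, rewrite on failure, give up after 9 tries.
MAX_ATTEMPTS = 9

def refine(draft, rubric, run_gates, judge, rewrite):
    """Iterate until the draft passes gates and rubric review, or give up."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        problems = run_gates(draft)          # deterministic checks first
        if not problems:
            problems = judge(draft, rubric)  # model-as-judge critique
        if not problems:
            return draft, attempt            # passed: surface to the user
        draft = rewrite(draft, problems)     # fix what can be fixed, retry
    return None, MAX_ATTEMPTS                # still failing: flag, don't ship
```

Note the ordering: nothing reaches the (slower, costlier) judge until the deterministic gates pass, and nothing reaches you until both do.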

Structural Gates

Deterministic checks that run instantly. No AI-speak, no placeholder text, required sections present, word counts met. The obvious stuff, automated.
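To make "deterministic checks" concrete, here is what such gates can look like — the specific patterns and function names below are assumptions for illustration, not Cold Anvil's internals:

```python
import re

# Illustrative structural gates: pure string checks, no model calls.
AI_TELLS = re.compile(r"\b(delve|tapestry|as an AI)\b", re.IGNORECASE)
PLACEHOLDER = re.compile(r"\[(TODO|TBD|INSERT [^\]]*)\]|lorem ipsum",
                         re.IGNORECASE)

def run_gates(text, required_sections, min_words):
    """Return a list of gate failures; an empty list means the text passes."""
    failures = []
    if AI_TELLS.search(text):
        failures.append("ai-speak detected")
    if PLACEHOLDER.search(text):
        failures.append("placeholder text present")
    for section in required_sections:
        if section.lower() not in text.lower():
            failures.append(f"missing section: {section}")
    if len(text.split()) < min_words:
        failures.append("below minimum word count")
    return failures
```

Because these are plain string operations they cost effectively nothing to run, which is why they sit in front of every model call.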

Model-as-Judge Review

A separate model critiques every output against your rubrics. Adversarial by design — it finds problems before scoring.

Iteration Loops

Tell us what's not right. We update the rubrics, re-run the flagged components, and show you what changed. Your feedback drives quality.

Full Observability

See every score, every gate result, every reviewer comment. Know exactly why something passed or failed. No black boxes.

Quality is the independent variable.

We never optimise for speed or cost at the expense of quality. That's the deal.

Ready to build something?

Start for free. Shape your idea. See what comes out. Upgrade when you're ready to forge at scale.

Start Building — No Card Required