Approach

We ship intelligence you can check.

Not a benchmark to trust — a result you can re-run. Everything we build is graded by an objective oracle, not another model’s opinion.

What we deliver

Systems where a verifier — not a model’s say-so — decides what ships.

PNRD#1 proves the principle on mathematics: an open model lifted to 47% on miniF2F, every proof checked by the Lean kernel. We build the same verifier-decides mechanism into everything else, including multi-agent generation, where conflicts are resolved against ground truth (does the code actually support the claim?) rather than by an arbiter model’s opinion. One thread: replace judgment with verification.

Checked, not claimed.

Most AI results ask you to trust a number. Ours hand you the receipt. When the gate is a mechanical check rather than a model’s opinion, correctness stops being a promise and becomes something you can re-run.
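
A minimal sketch of such a gate, under stated assumptions: `gate`, `Verdict`, and the toy `check_product` oracle are hypothetical names for illustration, not part of PNRD#1. The check stands in for any mechanical oracle (a proof kernel, a test suite). A candidate ships only if the check passes; otherwise the result is reported as blocked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Verdict:
    shipped: Optional[str]  # the candidate that passed, or None if blocked
    attempts: int           # how many candidates were checked


def gate(candidates: Iterable[str], check: Callable[[str], bool]) -> Verdict:
    """Ship the first candidate the oracle accepts; otherwise report blocked."""
    attempts = 0
    for candidate in candidates:
        attempts += 1
        if check(candidate):
            return Verdict(shipped=candidate, attempts=attempts)
    return Verdict(shipped=None, attempts=attempts)


# Toy oracle (hypothetical): accept only the decimal form of 6 * 7.
def check_product(answer: str) -> bool:
    return answer.strip() == str(6 * 7)


passed = gate(["41", "42", "43"], check_product)
blocked = gate(["41", "43"], check_product)
print(passed)
print(blocked)
```

Anyone holding the same candidates and the same check can re-run `gate` and get the same verdict, including the blocked one.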

A harness is only as good as its oracle.

With a sound oracle, test-time compute buys large, trustworthy gains. With a weak or absent one, the very same machinery is neutral — or worse. We measured both, and we publish both.

✓ Sound oracle

Lean kernel · miniF2F
24.6% → 47.1% (+22.5 pts)

✗ Weak / no oracle

self-graded / hidden tests
≈ 0 — neutral to negative
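
A toy illustration of why the oracle matters, under loudly stated assumptions: the candidates, their `confidence` values, and both selector functions below are made up for this sketch and are not the PNRD#1 harness or its data. Sampling more candidates only helps if the selector can tell a correct answer from a confident wrong one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    answer: int
    confidence: float  # the model's own score for its answer (made-up values)


def is_correct(c: Candidate) -> bool:
    # Sound oracle for the toy task "what is 12 * 12?": a mechanical check.
    return c.answer == 12 * 12


def select_with_oracle(pool: List[Candidate]) -> Optional[Candidate]:
    # Return a candidate that passes the check, or None (blocked).
    return next((c for c in pool if is_correct(c)), None)


def select_self_graded(pool: List[Candidate]) -> Candidate:
    # Weak oracle: trust whichever candidate the model is most confident about.
    return max(pool, key=lambda c: c.confidence)


# Hypothetical pool of sampled answers.
pool = [
    Candidate(answer=124, confidence=0.95),  # confident and wrong
    Candidate(answer=144, confidence=0.60),  # correct, but less confident
    Candidate(answer=142, confidence=0.30),
]

by_oracle = select_with_oracle(pool)
by_self = select_self_graded(pool)
print(by_oracle.answer, by_self.answer)  # prints: 144 124
```

In this toy pool the extra samples do contain the right answer, but only the mechanical check finds it; the self-graded selector ships the confident wrong one.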

Blocked beats faked.

An honest “it didn’t pass” is worth more than a confident wrong answer. We report KILL verdicts next to the wins, because a lab that only shows you its hits isn’t showing you anything.

The proof of the approach is PNRD#1.

Read PNRD#1 →
View the raw data