I'm a bioinformatics postdoc. I sent CRISPR knockout screens to space — first of their kind. The cells didn't grow enough to give us hits. That's most of science.
I know graduate students who never graduated, postdocs who never moved on, because their results said no. This tool is for them.
When biology becomes a search problem — and AI is making it one — the cost is paid in failed trials. The map of where not to go is the map worth building.
Before Opus 4.7 proposes a single candidate, the kill-tests are locked. The gate is deterministic Python — pre-registered, SHA-tagged, not run by the model. Opus cannot rationalize past it.
Five tests. All five must pass. Any one failure rejects the candidate, regardless of how compelling the biology sounds.
The judgment function lives outside the model. That is the architecture. And because the judgment is outside the model, a survivor is evidence, not rationalization.
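One way to make "pre-registered, SHA-tagged" concrete is to hash a canonical serialization of the thresholds before any candidate exists, then refuse to run if the hash ever changes. The sketch below is an assumption about how such a tag could be computed, not the project's actual code: the key names and the functions `sha_tag` and `verify` are hypothetical, and only the threshold values come from this page.

```python
import hashlib
import json

# Hypothetical key names; the values are the five thresholds stated on this page.
THRESHOLDS = {
    "perm_p_bh_max": 0.05,
    "ci_lo_min": 0.60,
    "delta_baseline_min": 0.05,
    "delta_confound_min": 0.03,
    "decoy_p_max": 0.05,
}


def sha_tag(thresholds: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON form of the thresholds."""
    # sort_keys + fixed separators make the serialization order-independent.
    canonical = json.dumps(thresholds, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Computed once, at pre-registration time, and stored alongside the results.
TAG = sha_tag(THRESHOLDS)


def verify(thresholds: dict, expected_tag: str) -> bool:
    """True only if the thresholds still hash to the pre-registered tag."""
    return sha_tag(thresholds) == expected_tag
```

Any later edit to a threshold, however small, produces a different digest, so a moved bar is detectable without trusting whoever ran the analysis.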
| Test | Threshold | Req. |
|---|---|---|
| Permutation null: is this better than shuffled random labels? | p < 0.05 (BH-corrected) | ✓ |
| Bootstrap stability: is this consistently good, not just lucky? | CI_LO > 0.60 | ✓ |
| Single-gene baseline: does the combination beat the best single gene? | Δbaseline > 0.05 | ✓ |
| Covariate confound: is it biology, not patient age or batch effects? | Δ > 0.03 (incremental over covariates-only model) | ✓ |
| Decoy-feature null: could a random gene do just as well? | decoy_p < 0.05 | ✓ |
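The all-five-must-pass rule can be sketched as a pure function over precomputed statistics. This is a minimal illustration under stated assumptions, not the project's gate: the `Metrics` fields, the check names, and `gate` itself are hypothetical, while the comparison thresholds are the five listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Precomputed statistics for one candidate (field names are hypothetical)."""
    perm_p_bh: float       # BH-corrected permutation p-value
    ci_lo: float           # lower bound of the bootstrap confidence interval
    delta_baseline: float  # candidate AUROC minus best single-gene AUROC
    delta_confound: float  # gain over a covariates-only model
    decoy_p: float         # decoy-feature null p-value


def gate(m: Metrics) -> tuple[bool, list[str]]:
    """Return (passed, failed_checks); a candidate passes only if nothing fails."""
    checks = {
        "permutation_null": m.perm_p_bh < 0.05,
        "bootstrap_stability": m.ci_lo > 0.60,
        "single_gene_baseline": m.delta_baseline > 0.05,
        "covariate_confound": m.delta_confound > 0.03,
        "decoy_feature_null": m.decoy_p < 0.05,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)


# Illustrative inputs only: every value here is made up for the example.
strong = Metrics(perm_p_bh=0.01, ci_lo=0.65, delta_baseline=0.06,
                 delta_confound=0.04, decoy_p=0.01)
weak = Metrics(perm_p_bh=0.01, ci_lo=0.65, delta_baseline=0.02,
               delta_confound=0.04, decoy_p=0.01)
print(gate(strong))
print(gate(weak))
```

Because the function takes numbers and returns a verdict, nothing a model writes about the candidate can enter the decision; the list of failed checks is what makes a rejection informative rather than silent.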
The same gate. The same candidates. The same adversarial prompts. One model holds the dual stance — Adversary and Proposer simultaneously. One collapses into permanent rejection.
Opus ran without extended thinking · wins anyway · calibration, not compute
This is not a benchmark. This is the condition under which the discovery loop is even possible: the loop that recovers published biology from unconstrained search without being told where to look. Sonnet cannot hold the Skeptic stance that makes the gate work. Before Opus 4.7, the architecture that found the ccA/ccB axis did not exist.
| Metric | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 |
|---|---|---|---|
| Pathway mentions per run | 5.3 | 1.3 | — |
| Caveat rate | 100% | 0% | 0% |
| Testable prediction rate | 100% | 0% | 0% |
| Citations per run | 12 | 0 | — |
| Session | Disease | Lesson written to Memory |
|---|---|---|
| PhL-3 | ccRCC | CA9 saturation pattern; CUBN single-marker dominance rule. (PhL-7 session start confirmed 3 prior lessons in store) |
| PhL-7 | ccRCC + DIPG + IPF | Compound form necessary; individual markers fail. MCP PubMed: "TOP2A AND EPAS1 AND renal cell carcinoma" → 0 results — pair not in literature at query time |
| PhL-10 | LUAD (lung cancer) | SFTPC (lung surfactant gene) 0.998 tissue-of-origin saturation — structurally identical to CA9 in kidney cancer; threshold adherence meta-rule |
| PhL-12 | PRAD (prostate cancer) | KLK3/KLK2 saturation — Proposer read the lung cancer (LUAD) lesson and predicted prostate saturation before seeing prostate data |
| 8 lessons total · accumulated server-side · survives harness restarts · verified from phl12_memory_chain_deepen/SUMMARY.md | | |
CA9, the go-to kidney cancer gene in every review for two decades, was too dominant — it already reaches AUROC 0.965 as a single gene. The gate's rule: if one gene does it, a compound law isn't a discovery.
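The single-gene baseline rule reduces to a difference of two AUROCs. The sketch below is an assumed, dependency-free version of that comparison; `auroc` and `delta_baseline` are hypothetical names, and it assumes higher scores indicate the positive class.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def delta_baseline(compound_scores, single_gene_scores, labels):
    """Compound AUROC minus the best single-gene AUROC.

    single_gene_scores maps a gene name to its per-sample values.
    """
    best_single = max(auroc(v, labels) for v in single_gene_scores.values())
    return auroc(compound_scores, labels) - best_single
```

The arithmetic also explains why tumor vs. normal could never pass: with a single gene already at 0.965, clearing Δbaseline > 0.05 would require a compound AUROC above 1.015, and AUROC cannot exceed 1.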
On four tasks — tumor vs. normal, cancer stage, 5-year survival, metastasis (11-gene panel) — not one of 100+ candidates survived. The gate was working.
One task unlocked when the panel expanded from 11 to 45 genes: 9 of 30 passed. Same classification gate. Same thresholds. The bar never moved.
| Task | Panel | PySR | Opus | Total | PASS | Dominant gene (AUROC) |
|---|---|---|---|---|---|---|
| Tumor vs Normal | 11g | 26 | 7 | 33 | 0 | CA9 = 0.965 (saturates alone) |
| Stage I–II vs III–IV | 11g | 27 | 7 | 34 | 0 | CUBN = 0.610 (low ceiling) |
| 5-yr Survival | 11g | 29 | 7 | 36 | 0 | CUBN = 0.696 (Δbaseline fails) |
| 5-yr Survival | 45g | 29 | — | 29 | 0 | CUBN = 0.696 (still dominates) |
| Metastasis M0/M1 | 11g | 30 | 7 | 37 | 0 | MKI67 = 0.645 (CI_LO fails) |
| Metastasis M0/M1 | 45g | 30 | — | 30 | 9 ✓ | MKI67 = 0.645 (TOP2A−EPAS1 beats it) |
| LUAD control | 11g | — | 4 | 4 | 0 | cross-disease negative |
| Total | | | | 203 | 9 | |
One survivor passed all five gate legs on the metastasis task with the 45-gene expanded panel. A two-gene compact law that the gate — not the model — chose.
It rediscovered the published ccA/ccB renal cell carcinoma subtype axis. In 2010, a multi-center collaboration found this axis after years of work. Lacuna found it unconstrained from a 45-gene search, without being told where to look.
We did not plant it. PySR found it; the gate accepted it; Opus interpreted it after the verdict was locked. The finding is not the biology — the biology was published in 2010. The finding is that a pre-registered gate, searching 45 genes it had never seen ranked, arrived at the same answer a multi-center collaboration took years to establish.
A methodology that recovers known truth without being told where to look is a methodology you can trust when the answer has never been published.
| Task | Dominant gene | 11-gene | 45-gene | Best Δ |
|---|---|---|---|---|
| Tumor vs Normal | CA9 = 0.965 | 0 / 26 | — | +0.029 |
| Stage I–II vs III–IV | CUBN = 0.610 | 0 / 27 | — | +0.029 |
| 5-year Survival | CUBN = 0.696 | 0 / 29 | 0 / 29 | +0.019 |
| Metastasis M0 vs M1 | MKI67 = 0.645 | 0 / 30 | 9 / 30 ✓ | +0.069 |
All 990 two-gene differences were ranked by sign-invariant AUROC on the same 45-gene panel. The top three all encode the same signal: proliferation − HIF-2α. The gate did not find one law — it found a family.
Every pair in the tight set uses EPAS1 on one side. Every pair contrasts a cell-cycle proliferation marker against the HIF-2α differentiation program. The biology is reproducible across gene choices — the axis, not the gene pair, is the discovery.
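The pair count and ranking described above can be reproduced in a few lines: 45 genes give C(45, 2) = 990 unordered pairs, and a sign-invariant AUROC treats `a − b` and `b − a` as the same candidate. This is a sketch under assumptions, not the project's search code; `sign_invariant_auroc` and `rank_pairs` are hypothetical names and the toy expression values are made up.

```python
from itertools import combinations
from math import comb


def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sign_invariant_auroc(scores, labels):
    """AUROC that ignores direction: a perfectly inverted score also rates 1.0."""
    a = auroc(scores, labels)
    return max(a, 1.0 - a)


def rank_pairs(expr, labels):
    """Rank every unordered gene pair by the sign-invariant AUROC of its difference.

    expr maps a gene name to its per-sample expression values.
    """
    ranked = []
    for g1, g2 in combinations(sorted(expr), 2):
        diff = [x - y for x, y in zip(expr[g1], expr[g2])]
        ranked.append((sign_invariant_auroc(diff, labels), g1, g2))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked


# Number of two-gene differences on a 45-gene panel.
N_PAIRS = comb(45, 2)

# Toy data only: three made-up genes, four made-up samples.
toy_labels = [1, 1, 0, 0]
toy_expr = {
    "gene_a": [5.0, 6.0, 1.0, 2.0],
    "gene_b": [1.0, 2.0, 5.0, 6.0],
    "gene_c": [3.0, 1.0, 2.0, 4.0],
}
toy_ranked = rank_pairs(toy_expr, toy_labels)
```

Ranking the whole family rather than reporting only the winner is what lets the axis, rather than any one gene pair, show up as the repeated pattern at the top of the list.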
The two-gene law replicated on an independent Phase-2 immunotherapy trial cohort (IMmotion150) with the same pre-registered survival gate. HR 1.36, log-rank p=0.0003.
High-score patients had a median PFS of 12.88 months vs 5.35 months for low-score patients, a 7.53-month gap in an independent cohort we never saw during discovery.
Then Opus proposed a three-gene extension: TOP2A − (EPAS1 + SLC22A8). We ran the same external survival replay gate. It failed. p=0.117. The gate reported FAIL on our own best output. That is the point.
A Claude Code Routine triggered an autonomous session yesterday. Two equations, same classification gate, same pre-registered thresholds. The HIF-axis compound — CA9 alone already reaches AUROC 0.965, compound adds Δ=+0.015 — refused. CDK1−EPAS1 on metastasis — Δ=+0.062 — accepted. No human decision after the API fire call.
The gate proved it rejects AND accepts, in one session. That is not the interesting question anymore.
The interesting question is: new Routine per disease, or update the existing Routine's Instructions? Answer: new Routine. The existing Routine's Instructions are the provenance record for PhL-8d — editing them retroactively breaks the audit chain. New question, new Routine. The gate and the Skills are shared.
Pediatric brainstem glioma. H3 K27M-mutant. Median overall survival ~11 months. No disease-modifying therapy for a decade — then dordaviprone was approved in August 2025 after a 10-year graveyard of failures.
The same failure-first architecture that rejected the initial kidney-cancer layer is the loop now pointed at brainstem glioma. Fifteen candidates are currently in the queue.
The kidney cancer result is a positive-control case. A gate that recovered the published ccA/ccB axis from 45 genes — blind, without being told where to look — is now being tested on diseases where no answer has been published yet. The published law is the tip. The graveyard is the artifact.
For every scientist whose null results pushed the field forward but never made it into a paper.
| Context | Result | Verdict |
|---|---|---|
| Discovery · kidney cancer (505 patients · found unconstrained, not planted) | AUROC 0.726 · Δbaseline +0.069 | ✓ |
| External immunotherapy trial (263 patients · Phase-2 RCT · different cohort) | p=0.0003 · HR 1.36 · C-index 0.601 | ✓ |
| Third dataset · GSE53757 (different platform, different preprocessing) | AUROC 0.714 | ✓ |
| Negative control · tumor vs. normal (one gene already saturates; compound not needed) | single gene wins; no compound needed | informative FAIL ✓ |
| Cross-disease control · breast cancer (kidney-specific signal should not transfer) | signal absent, as expected | pre-reg FAIL ✓ |
| CPTAC-3 · cross-platform metastasis (n=155 · proteogenomics · honest negative) | direction p=0.006 · ci_lower=0.542 · Δbase=−0.007 | honest FAIL |