01 · Origin

I sent CRISPR knockout screens to space. They didn't grow.

I'm a bioinformatics postdoc. I sent CRISPR knockout screens to space — first of their kind. The cells didn't grow enough to give us hits. That's most of science.

I know graduate students who never graduated, postdocs who never moved on, because their results said no. This tool is for them.

When biology becomes a search problem — and AI is making it one — the cost is paid in failed trials. The map of where not to go is the map worth building.

Built with Claude Opus 4.7 · Cerebral Valley × Anthropic · April 2026
github.com/jang1563/lacuna-falsification  · ▶ Demo video
Lacuna
Discovery engine · how it works
Opus 4.7 · Skeptic
Adversary
Writes 5 kill tests — before any data, before any law
Opus 4.7 · Proposer
Proposer
Law families — written after kill tests are locked
PySR
Searcher
Symbolic regression · 203 candidate equations
Python Gate
↖ tests written by Adversary
5 pre-registered tests
deterministic · BH-FDR
203 enter
194 / 203 rejected ✗
9/30 repaired pass ✓
"Cannot be rationalized past."
Opus 4.7
Interpreter
Explains only what the gate kept — verdict locked before this runs
"A map of where not to go."
02 · The Gate

A loop that cannot reject is not a loop.

Before Opus 4.7 proposes a single candidate, the kill-tests are locked. The gate is deterministic Python — pre-registered, SHA-tagged, not run by the model. Opus cannot rationalize past it.

Five tests. All five must pass. Any one failure rejects the candidate, regardless of how compelling the biology sounds.

The judgment function lives outside the model. That is the architecture. And because the judgment is outside the model, a survivor is evidence — not rationalisation.

prereg SHA 60d3952 · gate src/pipeline/ · "Cannot be rationalised past — the bite of pre-registration."
5-test BH-FDR gate · pre-registered · deterministic Python
TestThresholdReq.
Permutation null
Is this better than shuffled random labels?
p < 0.05 (BH-corrected)
Bootstrap stability
Is this consistently good, not just lucky?
CI_LO > 0.60
Single-gene baseline
Does the combination beat the best single gene?
Δbaseline > 0.05
Covariate confound
Is it biology, not patient age or batch effects?
Δ > 0.03 (incremental over covariates-only model)
Decoy-feature null
Could a random gene do just as well?
decoy_p < 0.05
03 · Capability overhang

Swap the model.
The loop breaks.

The same gate. The same candidates. The same adversarial prompts. One model holds the dual stance — Adversary and Proposer simultaneously. One collapses into permanent rejection.

0
Sonnet 4.6 · 60 evals
0
Opus 4.7 · 60 evals

Opus ran without extended thinking · wins anyway · calibration, not compute

This is not a benchmark. This is the condition under which the discovery loop is even possible — the loop that recovers published biology from unconstrained search without being told where to look. Sonnet cannot hold the Skeptic stance that makes the gate work. Before Opus 4.7, the architecture that found the ccA/ccB axis does not exist.

E2 ablation · 180 API calls · 3 models × 6 candidates × 10 repeats
prompts/skeptic_review.md · same gate metrics, same thresholds
3 models · Skeptic verdict distribution · 60 gate-PASS candidates each
ModelVerdict mixPASS / 60Dissent rate
Opus 4.7
no extended thinking
10 66.7%
Haiku 4.5
with extended thinking
14 53.3%
Sonnet 4.6
with extended thinking
0 100%
Sonnet 4.6 dissents on 100% of gate-PASS candidates — cannot distinguish pass from fail.
Haiku 4.5 passes 14 but with lower dissent (53%) — insufficient adversarial calibration.
Opus 4.7 holds dual stance: 66.7% dissent on passes (appropriate skepticism), still validates 10.
Interpreter Role: Structural Comparison
PhL-19 · same TOP2A−EPAS1 survivor · same prompt · n=5 runs per model
MetricOpus 4.7Sonnet 4.6Haiku 4.5
Pathway mentions per run
5.3 1.3
Caveat rate
100% 0% 0%
Testable prediction rate
100% 0% 0%
Citations per run
12 0
Opus 4.7 interpretation (TOP2A−EPAS1)
Mechanism: EPAS1 (HIF-2α) marks the well-differentiated, hypoxia-adapted ccRCC program. TOP2A marks cells in active DNA replication. When proliferation runs ahead of HIF-2α adaptation, the tumor has shed its differentiated ccRCC phenotype — the aggressive ccB state. This reproduces the Brannon 2010 ccA/ccB axis from first principles, without being seeded with it.

Testable prediction: BRD4 inhibition (which suppresses TOP2A transcription) should selectively affect TOP2A-high / EPAS1-low ccRCC cell lines, with no differential effect in EPAS1-high lines.

Honest caveat: The biology is the published 2010 axis — this is methodology validation, not biological discovery. A 2-gene logistic regression with interaction term reaches AUROC 0.722 on the same pair, so the compound's advantage is compactness plus pre-registered falsification, not AUROC ceiling.
Sonnet 4.6 interpretation (same prompt)
"TOP2A and EPAS1 are involved in cell proliferation and hypoxia response respectively. Their differential expression may reflect tumor aggressiveness. The model achieves AUROC 0.726 which represents good discriminatory performance…"

No caveats. No testable prediction. No citations. Longer prose — lower scientific signal.
The gap is structural, not stylistic. Sonnet produces plausible-sounding interpretations with more words, but zero caveats and zero testable predictions. Opus 4.7 writes the "what this is NOT" paragraph and the downstream experiment that makes the interpretation scientifically useful rather than rhetorically polished. The Interpreter role requires Opus 4.7 for the same reason the Skeptic role does: a model that only affirms is not doing scientific work.
Managed Agents Memory Chain
PhL-3 → PhL-7 → PhL-10 → PhL-12 · 4 sessions · 8 lessons · cross-disease rejection pattern transfer
Session Disease Lesson written to Memory
PhL-3 ccRCC CA9 saturation pattern; CUBN single-marker dominance rule. (PhL-7 session start confirmed 3 prior lessons in store)
PhL-7 ccRCC + DIPG + IPF Compound form necessary; individual markers fail. MCP PubMed: "TOP2A AND EPAS1 AND renal cell carcinoma" → 0 results — pair not in literature at query time
PhL-10 LUAD (lung cancer) SFTPC (lung surfactant gene) 0.998 tissue-of-origin saturation — structurally identical to CA9 in kidney cancer; threshold adherence meta-rule
PhL-12 PRAD (prostate cancer) KLK3/KLK2 saturation — Proposer read the lung cancer (LUAD) lesson and predicted prostate saturation before seeing prostate data
8 lessons total · accumulated server-side · survives harness restarts · verified from phl12_memory_chain_deepen/SUMMARY.md
Cross-disease transfer · PhL-10 → PhL-12
The PhL-12 Proposer was asked about prostate cancer. It had never seen PRAD data. It read the LUAD SFTPC saturation lesson from PhL-10 and correctly inferred that KLK3 would serve the same tissue-of-origin saturation role in PRAD. This is cross-substrate reasoning from a persistent server-side Memory store that survived harness restarts between sessions.
Honest scope: The Memory lessons are design guardrails, not discovery drivers — they prevent re-proposing known-failed patterns. The PRAD cross-transfer shows the lessons are read and applied correctly, but this is one illustrative case, not a controlled experiment on causal impact. The architectural claim is sound: 8 lessons accumulated server-side across 4 sessions, independently of harness state.
Why the Thinking Mode Gap Matters
180 API calls · 3 models × 6 candidates × 10 repeats each · same gate thresholds
Extended thinking (deeper reasoning mode)
Opus 4.7OFF — API returned error on attempt
Haiku 4.5ON ✓
Sonnet 4.6ON ✓
Skeptic performance on gate-passed candidates
Opus 4.710 validated · 66.7% raised objections ✓
Haiku 4.514 validated · only 53.3% objections
Sonnet 4.60 validated · objected to everything ✗
The disadvantage goes against Opus. Extended thinking is supposed to improve reasoning — Haiku and Sonnet had it; Opus did not. Haiku validated 14 candidates (more than Opus's 10), but only objected 53% of the time — it's too agreeable. Sonnet objected to everything — the loop never starts. Opus is the only model that holds both roles: skeptical enough to reject bad candidates, flexible enough to let good ones through.
What "dissent rate" means
The fraction of gate-PASS candidates where the Adversary role also raises a meaningful objection. Higher = tighter skeptical calibration. A model that never disagrees lets everything through (loop fails). A model that always disagrees rejects everything (loop stalls). Opus hits the productive middle: strict enough to be meaningful, flexible enough to keep discovery alive.
Within-Opus: 4.6 vs 4.7 direct (60 calls each)
Abstention-calibration rate53.3% → 66.7% (+13.3pp)
4.7 on clean survivorsPASS 10/10 — zero over-commitment ✓
4.7 on 5-gene stress-testNEEDS_MORE_TESTS 10/10 — abstains correctly ✓
4.6 on stress-testPASS 2/10 — over-commits ✗
Gap is in graded abstention, not binary correctness — both models score 0% strict miscalibration. 4.7 knows when to say "I need more tests." 4.6 over-commits under identical prompt and gate metrics.
04 · Rejection

194 of 203 refused. 9 survivors from the 45-gene sub-layer.

CA9, the go-to kidney cancer gene in every review for two decades, was too dominant — it already reaches AUROC 0.965 as a single gene. The gate's rule: if one gene does it, a compound law isn't a discovery.

On four tasks — tumor vs. normal, cancer stage, 5-year survival, metastasis (11-gene panel) — not one of 100+ candidates survived. The gate was working.

One task unlocked when the panel expanded from 11 to 45 genes: 9 of 30 passed. Same classification gate. Same thresholds. The bar never moved.

Hover any cell below to see the gate detail for that candidate.

203
initial
rejected
9 / 30
repaired-panel
pass
11-gene initial layer: 100% rejected (panel absence) · 45-gene repair sub-layer: 9/30 pass · combined: 194/203 rejected · TCGA-KIRC n=505.
203 KIRC evaluations · 194 rejected · 9 survivors all from the 45-gene sub-layer
waiting…
rejected
survivor (gate PASS) ✓
Task Landscape — Why Most Tasks Hit Zero
194 of 203 rejected · 9 survivors all from 45-gene metastasis_expanded sub-layer · five-test classification gate applied uniformly
TaskPanelPySROpusTotalPASSDominant gene (AUROC)
Tumor vs Normal
11g267330CA9 = 0.965 — saturates alone
Stage I–II vs III–IV
11g277340CUBN = 0.610 — low ceiling
5-yr Survival
11g297360CUBN = 0.696 — Δbaseline fails
5-yr Survival
45g29290CUBN = 0.696 — still dominates
Metastasis M0/M1
11g307370MKI67 = 0.645 — CI_LO fails
Metastasis M0/M1
45g30309 ✓MKI67 = 0.645 — TOP2A−EPAS1 beats it
LUAD control
11g440cross-disease negative
Total2039
Most common failure modes
Δbaseline < 0.050~68% of rejections
CI_LO < 0.600~19% of rejections
perm_p ≥ 0.050~8% of rejections
decoy_p ≥ 0.050~5% of rejections
Why the 45g panel unlocked Metastasis
Expanding from 11 to 45 genes added cell-cycle markers (TOP2A, CDK1) and angiogenesis regulators (ANGPTL4). These enable compound laws that genuinely exceed the single-gene ceiling. The gate found the signal when it was there — the architecture didn't change.
Hover any cell in the grid above to see gate-log detail for that candidate position.
05 · The Survivor

TOP2A − EPAS1.

One survivor passed all five gate legs on the metastasis task with the 45-gene expanded panel. A two-gene compact law that the gate — not the model — chose.

It rediscovered the published ccA/ccB renal cell carcinoma subtype axis. In 2010, a multi-center collaboration found this axis after years of work. Lacuna found it unconstrained from a 45-gene search, without being told where to look.

We did not plant it. PySR found it; the gate accepted it; Opus interpreted it after the verdict was locked. The finding is not the biology — the biology was published in 2010. The finding is that a pre-registered gate, searching 45 genes it had never seen ranked, arrived at the same answer a multi-center collaboration took years to establish.

A methodology that recovers known truth without being told where to look is a methodology you can trust when the answer has never been published.

Methodology validation
Known result · unconstrained search · gate passed. The known result being old is the evidence the method works.
AUROC 0.726 (95% CI LO 0.665) · Δbaseline +0.069 · TCGA-KIRC n=505 · ccA/ccB axis: Brannon et al. 2010, Brooks et al. 2014 (ClearCode34).
TOP2AEPAS1
AUROC 0.726 ✓ CI_LO 0.665 ✓ Δbaseline +0.069 perm p<0.001 ✓ decoy p<0.001 ✓
Task landscape · 11-gene vs 45-gene panel · same classification gate
TaskDominant gene11-gene45-geneBest Δ
Tumor vs NormalCA9 = 0.9650 / 26+0.029
Stage I–II vs III–IVCUBN = 0.6100 / 27+0.029
5-year SurvivalCUBN = 0.6960 / 290 / 29+0.019
Metastasis M0 vs M1MKI67 = 0.6450 / 309 / 30 ✓+0.069
Proliferation > HIF-2α → metastatic.
Rediscovered: published ccA / ccB ccRCC subtype axis (Brannon 2010 · ClearCode34 2014).
We did not plant it — and that is the point.
Known since 2010 · found again from 45 genes · gate passed → method validated
TOP2Atopoisomerase IIα · marks actively dividing (aggressive) cells
EPAS1HIF-2α · marks well-differentiated, hypoxia-adapted ccRCC
Survivor Evidence Dossier
TOP2A − EPAS1 · metastasis M0 vs M1 · TCGA-KIRC n=505
Shuffle test — better than chance?p < 0.001 · score 0.726 vs random baseline 0.578 ✓
Stability across patient subsamples95% CI lower bound 0.665 > 0.600 ✓
Beats the best single gene?+0.069 over MKI67 alone (was 0.645) — threshold > 0.050 ✓
Not explained by age or batch effectsConfound leg skipped — no non-degenerate covariates on metastasis task ✓
Can a random gene do the same?decoy_p < 0.001 · law beats all 100 random features ✓
All 5 gate tests PASS. Verdict locked before Interpreter ran. Pre-registered SHA d2352a9.
13 extended tests beyond the gate · 12 PASS · 1 honest FAIL
Precision-Recall AUC (rare events)0.321 · 2.05× better than random on rare metastases ✓
Calibration — probabilities well-tuned?Brier 0.122 · slope 0.979 ≈ 1.0 (perfect) ✓
Is this the only good equation?Rank 1 of 990 variants tested · only 3 equivalent pairs found ✓
Survival gap (effect size)Cohen's d = 0.856 · large effect, high vs low score patients ✓
Metastasis risk multiplierHigh score → 2.07× more likely to have metastasis ✓
Clinical significance (strict bar)HONEST FAIL — gap real but does not clear strictest threshold ✗
The clinical significance test applies a strict multi-comparison penalty (Bonferroni ×5). The survival gap is real and large (Cohen's d = 0.856) but does not clear this strictest bar. We report this honestly — we do not soften it.
Did the model just recall this from training data? (n=10 runs, no data shown)
Recalled from memory
IGFBP3
Named 8 / 10 times when asked without data.
Not in 45-gene panel · never gate-selected
Found by gate from data
TOP2A − EPAS1
Recalled from memory: 0 / 10 times.
Gate-selected · AUROC 0.726 ✓
IGFBP3 is a well-known kidney cancer gene from published literature. Without the gate, the model confidently proposes it — and would be wrong. The gate forces discovery from data, not from memory. The fact that TOP2A−EPAS1 was absent from the model's top-of-mind responses is evidence the gate, not the model's priors, found it.
Honest limitations — we report these, not soften them
Is the subtraction form special?AUROC 0.722 with interaction term vs 0.726 — simple subtraction works just as well
Individual gene effects?0/45 survive strict variable selection — it's the ratio that matters, not genes alone
New biology?The biology is not new — ccA/ccB axis was published 2010. That is what makes it a validation. A gate that finds known truth from unconstrained search is a gate you can trust with unknown truth. The known result being old is the evidence the method works.
Wet-lab confirmation?None yet — computational only. No cell line or patient tissue experiments.
Other cancers?Discovered in kidney cancer, replicated in kidney cancer. Cross-disease transfer not tested.
PhI-1: Does Opus know when it's wrong — before testing?
After the 9 survivors were accepted, Opus 4.7 was asked — in a fresh session, no access to falsification reports — to generate 4 new skeletons and write one ex-ante kill test per skeleton before the gate ran. Result: 0 / 4 passed. The model predicted failure on every skeleton where failure occurred.
Skeleton #1 (VEGFA-centric)Kill test: "VEGFA likely redundant given CA9 + EPAS1 already in panel — delta_baseline will not clear 0.05" · Gate: FAIL delta_baseline ✓
Skeleton #4 (CCNB1 anchor)Kill test: "CCNB1 alone insufficient; MKI67 already dominates the proliferation hierarchy" · Gate: FAIL delta_baseline ✓
Skeletons #2, #3Same concordance: kill test named the failure leg before the gate ran ✓
Concordance4 / 4 kill-test predictions matched gate verdicts exactly
The finding is not the 0 survivors. The finding is that the model's internal failure model is accurate — it predicts the specific failure mechanism before seeing the result. This is distinct from generic self-critique: smaller models write kill tests with zero numeric thresholds. Opus 4.7 produces quantitatively specific ex-ante failure predictions. Honest caveat: PhI-1 was post-hoc on 4 skeletons from the same session — independent novel skeletons are the proper follow-on.
05b · Survivor Family

One law. Three faces.
Same biology.

All 990 two-gene differences were ranked by sign-invariant AUROC on the same 45-gene panel. The top three all encode the same signal: proliferation − HIF-2α. The gate did not find one law — it found a family.

Every pair in the tight set uses EPAS1 on one side. Every pair contrasts a cell-cycle proliferation marker against the HIF-2α differentiation program. The biology is reproducible across gene choices — the axis, not the gene pair, is the discovery.

What "tight Rashomon set" means
All two-gene laws within AUROC ε=0.02 of the rank-1 survivor. If only 1 law qualified, the finding could be a lucky gene pair. Finding 3 — all the same biology — means the gate found an axis. 100% of the tight set encodes proliferation − EPAS1.
990 = C(45,2) two-gene difference pairs evaluated · pre-registered rashomon_set/SUMMARY.md · ε=0.02 tight-set threshold
1 / 990
TOP2A−EPAS1 rank within all 2-gene difference pairs
Rank #1
TOP2AEPAS1
AUROC 0.7275
topoisomerase IIα − HIF-2α
Rank #2 (ε=0.008)
CDK1EPAS1
AUROC 0.7192
cyclin-dependent kinase 1 − HIF-2α
Rank #3 (ε=0.018)
MKI67EPAS1
AUROC 0.7100
Ki-67 proliferation marker − HIF-2α
All three: proliferation marker − EPAS1 (HIF-2α)
100% of the tight set encodes the same biology · tight set = 3 pairs; loose set (ε=0.05) = 19 pairs · the axis is robust
06 · External replication + own-kill

Then we killed our own extension.

The two-gene law replicated on an independent Phase-2 immunotherapy trial cohort (IMmotion150) with the same pre-registered survival gate. HR 1.36, log-rank p=0.0003.

High-score patients survived 12.88 months median PFS vs 5.35 months for low-score — a 7.53-month gap in an independent cohort we never saw during discovery.

Then Opus proposed a three-gene extension: TOP2A − (EPAS1 + SLC22A8). We ran the same external survival replay gate. It failed. p=0.117. The gate reported FAIL on our own best output. That is the point.

IMmotion150: McDermott et al. 2018 (PMID 29867230), Phase-2 RCT n=263 · H1-loop extension: separate pre-registered gate, commit 60d3952.
IMmotion150 Kaplan-Meier curve — TOP2A−EPAS1 median split, HR 1.36, p=0.0003
1.36
Cox HR per z-score
0.0003
Log-rank p
0.601
Harrell C-index
7.53 mo
Median PFS gap
✗   Three-gene extension (TOP2A − (EPAS1 + SLC22A8)) · same external survival gate · KILLED
log-rank p = 0.117 · pre-registered · commit 60d3952
Context isolation audit — IPF · $58.28 · 32 min
Skeptic ran in a separate context window — never seeing the Advocate's reasoning. Caught 2 fabricated prior-trial claims: RAINIER periostin sub-group (stated as untested; already prespecified in original protocol) and Raghu 2017 (LOXL2-stratified arm already reported). A single-context pipeline would have rationalized both. lock SHA 88eaca3
Replication Chain
6 independent contexts · 3 PASS · 1 informative FAIL · 1 pre-reg FAIL · 1 honest FAIL · click each node for detail
Discovery · TCGA-KIRC PASS ✓
505 patients · found unconstrained from 45-gene search
AUROC 0.726 · CI_LO 0.665 · Δbaseline +0.069 · perm_p < 0.001 · decoy_p < 0.001
Law found by PySR without seed. Gate accepted. Opus 4.7 interpreted after verdict locked.
Pre-registered SHA: d2352a9
External · IMmotion150 Phase-2 RCT PASS ✓
263 patients · independent immunotherapy trial · never seen during discovery
HR 1.36 (Cox per z-score) · log-rank p = 0.0003 · C-index 0.601
Median PFS: high score 12.88 mo vs low score 5.35 mo (7.53-month gap)
Treatment-arm confound check: HR 1.361 (unadjusted) → 1.365 after controlling for immunotherapy vs VEGF-inhibitor arm — treatment-arm-adjusted prognostic signal persists
Same pre-registered survival gate applied without modification · McDermott et al. 2018
Third dataset · GSE53757 PASS ✓
Different platform · different preprocessing pipeline
AUROC 0.714 on stage 1-2 vs 3-4 task (secondary endpoint, n=72 tumor)
Stage proxy for metastasis axis. Signal transfers across platforms without retraining.
Note: primary T-vs-N endpoint FAILED — CA9 alone 0.9954, same saturation seen in TCGA. Informative fail: gate correctly refuses compound claim on saturated tasks.
Negative control · Tumor vs Normal informative FAIL ✓
CA9 already saturates at AUROC 0.965 — compound law not needed
CA9 alone achieves AUROC 0.965. Any compound law has Δbaseline < 0.001.
Gate correctly rejects all 33 candidates on this task. This is the gate working as designed — it doesn't discover where there's nothing to discover.
Own-kill · SLC22A8 3-gene extension pre-reg FAIL ✓
Opus proposed the extension · same external survival gate · commit 60d3952
Opus 4.7 proposed adding SLC22A8 to TOP2A−EPAS1. The external survival replay gate ran identically.
log-rank p = 0.117 (threshold < 0.05) · Cox HR p = 0.074 · C-index 0.566
All 3 survival tests FAIL. Gate reported FAIL on our own best candidate. Pre-registered.
Honest FAIL · CPTAC-3 metastasis honest FAIL
n=155 (M1=20) · proteogenomics platform · not pre-registered
Direction preserved (p=0.006) but gate refuses: ci_lower=0.542 < 0.60 · Δbase=−0.007 (TOP2A alone 0.691 > compound 0.683).
Cross-platform replication not confirmed. Gate correctly refuses weak signal. This is the gate doing its job — not a failure of the methodology.
07 · Live Oracle

The gate ran.
Now: new Routine, or update?

A Claude Code Routine triggered an autonomous session yesterday. Two equations, same classification gate, same pre-registered thresholds. The HIF-axis compound — CA9 alone already reaches AUROC 0.965, compound adds Δ=+0.015 — refused. CDK1−EPAS1 on metastasis — Δ=+0.062 — accepted. No human decision after the API fire call.

The gate proved it rejects AND accepts, in one session. That is not the interesting question anymore.

The interesting question is: new Routine per disease, or update the existing Routine's Instructions? Answer: new Routine. The existing Routine's Instructions are the provenance record for PhL-8d — editing them retroactively breaks the audit chain. New question, new Routine. The gate and the Skills are shared.

Pattern confirmed · 4 diseases · same classification-gate family
ccRCC Stage: 23/28 PASS · CXCR4/EPAS1 · Δ=+0.051. Colon MSI-high: 15/22 PASS · SLC2A1+Warburg · Δ=+0.100. LGG Grade II/III: 2/25 PASS · TWIST1×MKI67 interaction · AUROC 0.840 · Δ=+0.051. Liver HCC: 0/26 (designed negative — gate refuses). 2 live Routine sessions (ccRCC) · 2 offline gate runs (COAD, LGG) · same Skills, same classification-gate family. DIPG is next in queue.
PhL-8d · 2026-04-26 · autonomous session ~6 min · no human decision after API fire · same 45-gene panel · same BH-FDR gate
PhL-8d · Dual Verdict Oracle · 2026-04-26
CA9 − AGXT GATE FAIL
task: tumor_vs_normal · same pre-registered gate
delta_baseline +0.015 (threshold > 0.05)
CA9 alone AUROC 0.965 compound cannot clear +0.05 bar
fail_reason: delta_baseline
CDK1 − EPAS1 GATE PASS
task: metastasis_expanded · same pre-registered gate
delta_baseline +0.062 (> 0.05 ✓)
perm_p 0.000 (< 0.05 ✓)
ci_lower 0.662 (> 0.60 ✓)
fail_reason: none · proliferation-minus-HIF-2α cluster
Dual Verdict Summary Same gate · same thresholds · same session — one equation rejected for content, one accepted for content. The gate discriminates; it does not optimise for acceptance.
Multi-disease expansion · same classification-gate family · new Routine per disease
ccRCC Staging
CXCR4 / EPAS1 · stage_expanded (n=512)
23 / 28
Δ +0.051
New Routine
Colon MSI-high
SLC2A1 + Warburg · coad_msi
15 / 22
Δ +0.100
New Routine
LGG (brain tumor) Grade II vs III
TWIST1×MKI67+VIM − CDH2/NES · n=384
2 / 25
Δ +0.051 · AUROC 0.840
New Routine
Liver HCC
TTR / ALB saturate (0.985) — designed negative
0 / 26
gate refuses
New Routine
08 · Trajectory

DIPG. 250+ failed trials. One approval.

Pediatric brainstem glioma. H3 K27M-mutant. Median overall survival ~11 months. No disease-modifying therapy for a decade — then dordaviprone was approved in August 2025 after a 10-year graveyard of failures.

The same failure-first architecture that rejected the initial kidney-cancer layer is the loop pointed at brainstem. Fifteen candidates are currently in queue.

The kidney cancer result is a positive-control case. A gate that recovered the published ccA/ccB axis from 45 genes — blind, without being told where to look — is now being tested on diseases where no answer has been published yet. The published law is the tip. The graveyard is the artifact.

For every scientist whose null results pushed the field forward but never made it into a paper.

Pre-registered falsification gate
same five tests, same BH-FDR, locked before any LLM saw the data
Same architecture as the ccRCC discovery
the loop that rejected the initial kidney layer is pointed at brainstem
15 candidates currently in queue
commit 3938fef · queue/h3k27m_2026q2.yaml · n=47 (HERBY) · n=88 (PNOC-022)
For research use only. Not diagnostic. Candidate laws require independent wet-lab and clinical validation before any clinical application.
DIPG clinical trial graveyard · 2005–2024
1 approved ↗
each dot = one failed trial  ·  outline = dordaviprone (first approval, Aug 2025)
kidney cancer law · validated in 6 independent contexts
ContextResultVerdict
Discovery · kidney cancer
505 patients · found unconstrained, not planted
AUROC 0.726 · Δbaseline +0.069
External immunotherapy trial
263 patients · Phase-2 RCT · different cohort
p=0.0003 · HR 1.36 · C-index 0.601
Third dataset · GSE53757
different platform, different preprocessing
AUROC 0.714
Negative control · tumor vs. normal
one gene already saturates — compound not needed
single gene wins; no compound needed informative FAIL ✓
Cross-disease control · breast cancer
kidney-specific signal should not transfer
signal absent, as expected pre-reg FAIL ✓
CPTAC-3 · cross-platform metastasis
n=155 · proteogenomics · honest negative
direction p=0.006 · ci_lower=0.542 · Δbase=−0.007 honest FAIL