
Lacuna

falsification-first biological law discovery

github.com/jang1563/lacuna-falsification

For every scientist whose null results pushed the field forward
but never made it into a paper.

When biology becomes a search problem
and AI is making it one —
the cost is paid in failed trials.

The map of where not to go
is the map worth building.

To trust that map: a gate that finds truth it was never told.

Opus proposes. Python gates.
Routine fires. What's next?

Claude Code Routine · API-triggered · live
MA · Session A · Proposer · Opus 4.7
Local Python · PySR symbolic search
Python 5-test gate · pre-registered · no LLM decides pass / fail
MA · Session B · Skeptic · Opus 4.7
MA · Session C · Interpreter · Opus 4.7
224 FAIL → rejection log committed · 9 PASS → Skeptic review
3 context-isolated Opus sessions
Deterministic Python gate
New Routine per disease · same Skills · same classification-gate family · 4 diseases confirmed
FAIL CA9−AGXT Δ=0.015 · PASS CDK1−EPAS1 Δ=0.062 · 23/28 Stage CXCR4/EPAS1 · 15/22 Colon MSI · 2/25 LGG Grade II/III · AUROC 0.840 · static evidence in results/live_evidence/
Managed Agents Memory · rejection lessons accumulate across sessions · persists server-side
Session 1 Kidney · one gene dominated
Session 2 Kidney · compound collapsed
Session 3 Lung · surfactant gene 99.8%
Session 4 ✓ Prostate · predicted PSA dominance
Prostate cancer proposer cited the lung cancer single-gene saturation lesson — without seeing prostate data — and correctly predicted the dominant PSA-gene pattern. 8 lessons stored across 4 sessions.

Rejection

The gate refused.

The TCGA-KIRC gate rejected 194 of 203 candidate evaluations. The 9 survivors all came from the 45-gene metastasis_expanded sub-layer, after the loop diagnosed panel absence on the original 11-gene layer. CA9 alone saturates tumor-normal AUROC at 0.965, so CA9-dominated compounds are refused.
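The saturation refusal described above can be sketched as a baseline-improvement check. This is a minimal illustration, not the repository's gate: the function names, the synthetic data, and the sign-invariant orientation are assumptions, and the 0.05 threshold is taken from the "delta_baseline will not clear 0.05" kill test quoted later in this story.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    # ties count as half a correct ranking
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def delta_baseline(candidate, single_genes, labels):
    """Candidate AUROC minus the best single-gene AUROC (both sign-invariant)."""
    def oriented(x):
        a = auroc(x, labels)
        return max(a, 1.0 - a)
    best_single = max(oriented(g) for g in single_genes.values())
    return oriented(candidate) - best_single

DELTA_MIN = 0.05  # threshold quoted in the story; the real gate may differ

# Synthetic data (hypothetical): one gene nearly saturates the task.
rng = np.random.default_rng(0)
labels = np.repeat([True, False], 100)
genes = {
    "dominant": 3.0 * labels + rng.normal(size=200),
    "noise": rng.normal(size=200),
}
compound = genes["dominant"] - genes["noise"]
delta = delta_baseline(compound, genes, labels)
print(f"delta_baseline = {delta:+.3f} -> {'PASS' if delta > DELTA_MIN else 'FAIL'}")
```

When one gene already separates the classes almost perfectly, a compound built on it has little headroom to add, so this leg of the gate fails it.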

224 refused · 9 repair pass
"Even the law Opus most wanted to keep — refused."
Figure: 203 KIRC evaluations · 194 rejected · 9 survivors, all from the 45-gene sub-layer (legend: rejected · survivor = gate PASS)

Survivor

Expand the panel.
Same gate.

45-gene HIF / Warburg / proliferation panel. Same pre-registered thresholds. Nine laws pass on metastasis. The simplest is a law that was already published — in 2010:

TOP2A − EPAS1
cell-division gene  −  oxygen-response gene
AUROC 0.726 Δbaseline +0.069 CI lower 0.665 perm p < 0.001
When cell division outruns oxygen response — it predicts spread.
The published kidney cancer molecular subtype axis.
We did not plant it.
Known truth · unconstrained search · method validated

A gate that recovers what was already published can be trusted for what has not been published yet.
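Two of the numbers reported for the survivor, the permutation p-value and the lower confidence bound, can be estimated as in the sketch below. It is a minimal version under stated assumptions: the resampling counts, the percentile bootstrap, the seeds, and the synthetic data are illustrative choices, not the pre-registered settings.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def permutation_p(scores, labels, n_perm=1000, seed=0):
    """One-sided p-value: how often shuffled labels match the observed AUROC."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = auroc(scores, labels)
    hits = sum(auroc(scores, rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0

def bootstrap_ci_lower(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Lower end of a percentile bootstrap confidence interval for AUROC."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if labels[idx].all() or not labels[idx].any():
            continue  # skip resamples that contain a single class
        stats.append(auroc(scores[idx], labels[idx]))
    return float(np.quantile(stats, alpha / 2))

# Synthetic two-class score (hypothetical data).
rng = np.random.default_rng(1)
labels = np.repeat([True, False], 40)
scores = 2.0 * labels + rng.normal(size=80)
p = permutation_p(scores, labels)
lower = bootstrap_ci_lower(scores, labels)
print(f"AUROC {auroc(scores, labels):.3f} · CI lower {lower:.3f} · perm p {p:.4f}")
```

With 1,000 permutations the smallest reportable p-value is 1/1001, which is consistent with a figure stated as "p < 0.001".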

Task landscape · 11-gene vs 45-gene panel
Task | Dominant gene (AUROC) | 11-gene | 45-gene | Best Δ
Tumor vs Normal | CA9 = 0.965 | 0 / 26 | | +0.029
Stage I-II vs III-IV | CUBN = 0.610 | 0 / 27 | | +0.029
5-yr Survival | CUBN = 0.696 | 0 / 29 | 0 / 29 | +0.019
Metastasis M0 vs M1 | MKI67 = 0.645 | 0 / 30 | 9 / 30 ✓ | +0.069

Same 5-test gate · same BH-FDR · same Python code · four biologically distinct tasks.
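The BH-FDR step named above is the Benjamini-Hochberg procedure. A minimal sketch follows; the 0.05 FDR level and the example p-values are assumptions, since the story does not state the level used.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # the i-th smallest p-value is compared against alpha * i / m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())  # largest rank that clears its threshold
        reject[order[: k + 1]] = True        # reject that rank and every smaller p-value
    return reject

# Hypothetical p-values for ten candidate laws.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
survivors = benjamini_hochberg(p_values)
print(int(survivors.sum()), "of", len(p_values), "candidates survive")
```

Because the cutoff depends on how many candidates were tested, evaluating more candidates raises the bar each one has to clear.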

Near-equivalent formulas (within 2% accuracy) · 3 of 990 two-gene pairs · all: proliferation − oxygen-response gene
#1 / 990 · TOP2A − EPAS1 · 0.7275 · cell-division enzyme − oxygen-response factor
#2 / 990 · CDK1 − EPAS1 · 0.7192 · cell-cycle kinase − oxygen-response factor
#3 / 990 · MKI67 − EPAS1 · 0.7100 · proliferation marker − oxygen-response factor
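The 990 in the ranking above is the number of unordered gene pairs on a 45-gene panel, C(45, 2) = 990. A sketch of that exhaustive scan, assuming each pair is scored as a simple difference and ranked by sign-invariant AUROC; the gene names and data are synthetic placeholders, not the real panel.

```python
from itertools import combinations

import numpy as np

def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def rank_pair_differences(expr, labels):
    """Score every unordered gene pair as a difference; best AUROC first."""
    results = []
    for a, b in combinations(sorted(expr), 2):
        auc = auroc(expr[a] - expr[b], labels)
        # a - b and b - a are the same law up to sign: keep the orientation >= 0.5
        if auc >= 0.5:
            results.append((auc, f"{a} - {b}"))
        else:
            results.append((1.0 - auc, f"{b} - {a}"))
    return sorted(results, reverse=True)

# Synthetic 45-gene panel (hypothetical): G00 rises and G01 falls in positives.
rng = np.random.default_rng(0)
labels = np.repeat([True, False], 100)
expr = {f"G{i:02d}": rng.normal(size=200) for i in range(45)}
expr["G00"] += 1.5 * labels
expr["G01"] -= 1.5 * labels

ranked = rank_pair_differences(expr, labels)
print(len(ranked), "pairs scanned; top:", ranked[0][1])
```

A scan like this is what makes a "near-equivalent formulas" claim checkable: every competing pair is scored the same way, so the size of the gap between ranks is visible.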

External Replay

Cross-cohort survival.
Pre-registered gate.
log-rank · Cox · C-index all pass.

IMmotion150 — independent Phase-2 immunotherapy trial. n = 263 metastatic kidney cancer (ccRCC). Different cohort, different preprocessing, different endpoint (PFS — progression-free survival). Same two-gene score. Same direction.

✗   Our own three-gene extension · same external survival gate · KILLED
p = 0.117 · pre-registered · commit 47a6bd5
The loop is not designed to confirm. It rejects — including itself.
Figure: IMmotion150 KM curve, TOP2A − EPAS1 median split (HR 1.36, p = 0.0003)
Cox HR per z-score: 1.36 · Log-rank p: 0.0003 · Harrell C-index: 0.601 · Median PFS gap: 7.53 mo
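Harrell's C-index, one of the three survival checks listed above, measures how often the patient with the higher score progresses first. The sketch below is a plain O(n²) version that skips tied event times; the toy data and function names are illustrative and are not the IMmotion150 analysis.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Share of comparable patient pairs where higher risk means earlier event.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up time than patient j.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparison
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable

def median_split(score):
    """True for patients whose score is above the cohort median."""
    score = np.asarray(score, dtype=float)
    return score > np.median(score)

# Toy cohort (hypothetical): months to progression, event flag, two-gene score.
time = [2.0, 4.0, 6.0, 9.0, 12.0, 15.0]
event = [1, 1, 0, 1, 1, 0]
score = [1.8, 1.1, 0.9, 0.2, -0.5, -1.0]
print("C-index:", harrell_c_index(time, event, score))
print("high-score group:", median_split(score).tolist())
```

A C-index of 0.5 is chance and 1.0 is perfect ordering, so the 0.601 reported here is a modest but directionally consistent signal.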

Trajectory · DIPG active

250+
failed DIPG clinical trials

H3 K27M diffuse midline glioma is a pediatric brainstem cancer that was universally fatal until August 2025. The same failure-first architecture that rejected the initial kidney-cancer layer is now pointed at the brainstem.

15 candidates · pre-registered queue
Timeline
Decades · 250+ failed clinical trials · no improvement over radiation alone
August 2025 · Dordaviprone, first approval · itself a 10-year graveyard rescue · FDA accelerated
Now · 15 candidates in queue · HERBY (n=47) · PNOC-022 (n=88) · pre-registered gate · same architecture

The published law is the tip.

The graveyard is the artifact.

Proven on known truth · trusted for unknown truth

For every scientist whose result said no.

── extended evidence ──

IPF · Context Isolation

Separate context.
Separate truth.

$58.28 · 32 minutes · SHA 88eaca3

The Skeptic ran in a separate context window — it never received the Advocate's reasoning tokens. Context isolation as live audit layer.

Caught: two prior-trial claims the Advocate stated as established fact — both empirically false per the original trial publications.
Two fabricated claims · Skeptic verdict
Advocate stated as fact (tralokinumab / RAINIER)
"No IPF IL-13 trial prespecified a Th2 stratifier — RAINIER enrolled unselected IPF, leaving the Th2-high endotype untested."
Skeptic verdict: FABRICATED
RAINIER itself prespecified a periostin sub-group (canonical IL-13-induced Th2 marker). The "never prespecified" premise is empirically false per the original trial protocol.
Advocate stated as fact (simtuzumab / Raghu 2017)
"LOXL2-high IPF patients were never tested in the simtuzumab trial — stratified rescue remains viable."
Skeptic verdict: FABRICATED
Raghu 2017 already prespecified LOXL2-stratified co-primary endpoints with the highest-tertile arm. The "never-tested" narrative is falsified by the originating trial itself.

Capability Overhang · E2 Ablation

Swap the model.
The loop breaks.

180 API calls. Three models, same 6 candidates, same gate metrics, same prompts/skeptic_review.md. One stance collapses completely — with extended thinking.

Sonnet (with thinking) · 0 / 60 PASS
Opus (no thinking) · 10 / 60 PASS

Opus ran without extended thinking · wins anyway · calibration, not compute

3 models × 6 candidates × 10 repeats · Skeptic verdict distribution
Model | Thinking | PASS / 60 | Dissent on PASS
Opus 4.7 | no thinking | 10 | 66.7%
Haiku 4.5 | with thinking | 14 | 53.3%
Sonnet 4.6 | with thinking | 0 | 100%
Sonnet dissents on 100% of gate-PASS candidates · cannot distinguish pass from fail · thinking budget is irrelevant
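The call count and the dissent percentages can be checked with a few lines of arithmetic. The slide does not define "Dissent on PASS". The figures are consistent with one reading, treated here as an assumption: 3 of the 6 candidates were gate-PASS, giving 30 Skeptic reviews of gate-PASS candidates per model, and every Skeptic PASS verdict fell on one of those.

```python
MODELS = 3
CANDIDATES = 6
REPEATS = 10
GATE_PASS_CANDIDATES = 3  # ASSUMPTION: inferred from the percentages, not stated

total_calls = MODELS * CANDIDATES * REPEATS
reviews_per_model = CANDIDATES * REPEATS
gate_pass_reviews = GATE_PASS_CANDIDATES * REPEATS

# Skeptic PASS verdicts per model, as reported in the table.
skeptic_pass = {"Opus 4.7": 10, "Haiku 4.5": 14, "Sonnet 4.6": 0}

def dissent_on_pass(n_skeptic_pass, n_gate_pass_reviews):
    """Percent of gate-PASS reviews where the Skeptic did not return PASS."""
    return 100.0 * (1.0 - n_skeptic_pass / n_gate_pass_reviews)

dissent = {
    model: round(dissent_on_pass(n, gate_pass_reviews), 1)
    for model, n in skeptic_pass.items()
}

print(total_calls, "API calls,", reviews_per_model, "reviews per model")
for model, pct in dissent.items():
    print(f"{model}: {skeptic_pass[model]} / {reviews_per_model} PASS, dissent {pct}%")
```

Under that reading the three reported percentages are reproduced from the PASS counts alone, which is a useful consistency check on the table.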
PhI-1 · meta-calibration · Opus predicted 4/4 own rejections before testing
Skeleton #1: "VEGFA redundant with EPAS1 — delta_baseline will not clear 0.05"
FAIL: delta_baseline ✓
Skeleton #4: "CCNB1 alone insufficient; MKI67 already dominates proliferation hierarchy"
FAIL: delta_baseline ✓
Skeletons #2, #3: same pattern — kill test named the failure leg before the gate ran
4/4 ✓
The model's failure model is accurate. It knows when it's wrong before seeing the results.