PHASE 14 COMPLETE — 2017 TESTS PASSING

Your Own
AI Science Team

23 specialized LLM agents and 5 hybrid engines on a 3-tier architecture supporting the entire biology research lifecycle. From literature review to manuscript writing and grant proposals — all in one dashboard.

23 LLM Agents
11 Workflows
2017 Tests
3-Tier Architecture
Weill Cornell Medicine
JangKeun Kim
v1.0 — 2026
SCROLL

3-Tier Agent Architecture

From strategic decision-making to domain expert analysis and independent quality assurance — a hierarchical agent structure reflecting the complexity of biology research

Tier 1 · Strategic
Research Director Opus · Sonnet
Knowledge Manager Sonnet
Project Manager Haiku
Tier 2 · Domain Experts
Genomics & Epigenomics Sonnet
Transcriptomics & Single-Cell Sonnet
Proteomics & Metabolomics Sonnet
Biostatistics Sonnet
Machine Learning & DL Sonnet
Systems Biology Sonnet
Structural Biology Sonnet
Experimental Designer Sonnet
Integrative Biologist Sonnet
Scientific Communication Sonnet
Grant Writing & Funding Opus
Data Engineering Haiku
Tier 3 · QA Layer
Statistical Rigor Sonnet
Biological Plausibility Sonnet
Reproducibility & Standards Haiku
Engines · Hybrid
Ambiguity Resolution Engine Code+LLM
Negative Results Module Code+LLM
Next.js 15
Dashboard
FastAPI
REST + SSE
Anthropic SDK
+ Instructor
Claude
Opus · Sonnet · Haiku

Use Cases

Your AI team is with you at every stage of research

📚

W1: Literature Review Workflow

Automated search across PubMed, Semantic Scholar, and bioRxiv → systematic synthesis → evidence-level scoring in one go

1
SCOPE & DECOMPOSE — Research Director decomposes research questions into sub-queries
2
SEARCH — Knowledge Manager searches PubMed + Semantic Scholar + bioRxiv simultaneously
3
SCREEN & EXTRACT — PRISMA-compliant screening, verbatim quote extraction
4
SYNTHESIZE — Opus model integrates key findings, Citation Validator verifies references
5
REPORT — RCMXT scores + PRISMA Flow + SessionManifest + BibTeX export
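The five steps above reduce to a simple staged-pipeline shape. A minimal sketch of that shape (step logic, names, and fields are illustrative, not the project's actual API):

```python
# Toy W1 pipeline: decompose -> search 3 sources -> screen -> synthesize -> report.
# Everything here is a placeholder for the real agent calls.

def run_w1(question: str) -> dict:
    sub_queries = [f"{question} mechanism", f"{question} clinical"]    # SCOPE & DECOMPOSE
    hits = [{"q": q, "source": s} for q in sub_queries
            for s in ("pubmed", "semantic_scholar", "biorxiv")]        # SEARCH (3 sources)
    included = [h for h in hits if h["source"] != "biorxiv"]           # SCREEN (toy criterion)
    synthesis = {"n_included": len(included)}                          # SYNTHESIZE
    return {"synthesis": synthesis,                                    # REPORT
            "prisma": {"identified": len(hits), "included": len(included)}}

result = run_w1("TP53 in NSCLC")
```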

Direct Query Mode

Simple questions answered in under 30 seconds. Complex queries automatically routed to the appropriate workflow

1
CLASSIFY — Research Director (Sonnet) classifies the query type
2
simple_query — Knowledge Manager lookup, then 1 specialist answers immediately
3
needs_workflow — Auto-routed to the appropriate W1-W6 pipeline
4
SSE real-time streaming response, cost under $0.50
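The classify-then-route step can be sketched as a tiny dispatcher. The keyword markers below are invented for illustration; the real classifier is an LLM call, not string matching:

```python
# Toy router mirroring the Direct Query flow: classify the query, then
# either answer directly or hand off to a W1-W6 workflow.

def route(query: str) -> str:
    complex_markers = ("systematic review", "hypothesis", "grant", "analyze")
    if any(m in query.lower() for m in complex_markers):
        return "needs_workflow"   # auto-routed to the appropriate W1-W6 pipeline
    return "simple_query"         # one specialist answers immediately

verdict = route("What does TP53 encode?")
```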
🧪

Lab Knowledge Base

Systematically record lab negative results and protocols, with automatic cross-referencing during literature reviews

1
Record — Store failed experiments, conditions, and outcomes in structured format
2
Classify — Auto-classify failure_category (technical, biological, design, etc.)
3
Cross-reference — Lab KB auto-searched during W1 NEGATIVE CHECK step
4
Verification tracking — Manage reliability via verified_by, verification_status
🎛️

Mission Control Dashboard

Monitor all agent states in real-time and directly intervene in workflows through a unified control panel

1
Agent Grid — Real-time display of 18 agents' idle/busy/unavailable status
2
Workflow Cards — Step-by-step progress and budget consumption for active workflows
3
Activity Feed — SSE-based real-time event stream
4
Intervention — Direct commands: PAUSE, ADD_PAPER, EXCLUDE_PAPER, MODIFY_QUERY

RCMXT Evidence Confidence Vector

In biology, truth is not binary. Every claim receives a 5-axis confidence vector.

R
Reproducibility
Reproducibility from independent sources
C
Context
Organism & condition specificity
M
Methodology
Methodological robustness & sample size
X
Cross-validation
Multi-omics cross-validation
T
Temporal
Temporal stability & recency
"TP53 mutation → apoptosis resistance in NSCLC"

R: 0.85 — Reproduced in 4 independent studies
C: 0.70 — NSCLC-specific, differs in other cancer types
M: 0.92 — n=500+ cohort, WGS + functional assay
X: 0.60 — Confirmed by Genomics + Proteomics, Metabolomics pending
T: 0.78 — Consistent reports 2019-2024, includes 2024 meta-analysis

Composite: 0.77 · Citations verified: 100% (12/12)
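The composite in this example is consistent with a plain unweighted mean of the five axes; whether the actual scorer weights axes differently is not stated here:

```python
# The example's five axis scores; an unweighted mean reproduces the 0.77 composite.
rcmxt = {"R": 0.85, "C": 0.70, "M": 0.92, "X": 0.60, "T": 0.78}
composite = round(sum(rcmxt.values()) / len(rcmxt), 2)
```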

6 Research Workflows

From Direct Query to Ambiguity Resolution — pipelines covering the entire research process

W1Literature Review
Systematic literature review — auto-generated PRISMA flow, evidence synthesis, RCMXT scoring. Includes human checkpoint.
SCOPE → SEARCH → SCREEN → EXTRACT → NEG CHECK → SYNTHESIZE → VALIDATE → REPORT
Est. cost: $2–5 · Duration: 3–8 min
W2Hypothesis Generation
7 expert agents generate hypotheses in parallel → QA debate → RCMXT profiling. Up to 3 iterations.
CONTEXT → GENERATE ×7 → NEG FILTER → DEBATE → RANK → EVOLVE
Est. cost: $3–8 · Duration: 5–15 min
W3Data Analysis
Data QC → analysis plan → execution → statistical validation → biological plausibility. Includes code generation.
INGEST → QC → PLAN → EXECUTE → VALIDATE → INTERPRET
Est. cost: $2–6 · Duration: 5–20 min
W4Manuscript Writing
Manuscript drafting — structure design, section writing, figure planning, stat review, up to 5 revision cycles.
OUTLINE → DRAFT → FIGURES → STAT REVIEW → REVISION ×5
Est. cost: $5–15 · Duration: 10–30 min
W5Grant Proposal
NIH/NSF grant proposal — Specific Aims, Research Strategy, preliminary data, budget, Mock Review.
OPPORTUNITY → AIMS → STRATEGY → PRELIM DATA → BUDGET → MOCK REVIEW
Est. cost: $8–20 · Uses Opus model
W6Ambiguity Resolution
Resolving conflicting evidence — contradiction classification, negative result mining, resolution hypotheses, discriminating experiment design.
IDENTIFY → LANDSCAPE → CLASSIFY → MINE NEG → RESOLVE → EXPERIMENT
Est. cost: $3–10 · Uses Ambiguity Engine

Core Features

Epistemology-driven features built for biology research

🔬

Citation Validation

Deterministically cross-verify all citations in synthesis against search results. Extracts DOI/PMID patterns and computes verification rate.

DETERMINISTIC
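Deterministic citation checking of this kind can be sketched as pattern extraction plus set intersection. The regexes below are simplified stand-ins for the real DOI/PMID grammars, and the function name is illustrative:

```python
import re

# Pull DOI/PMID-like tokens out of a synthesis and verify each against
# the identifiers actually seen during search.
DOI = re.compile(r"10\.\d{4,9}/[^\s,;)]+")
PMID = re.compile(r"PMID:\s*(\d+)")

def verification_rate(synthesis: str, known_ids: set[str]) -> float:
    cited = set(DOI.findall(synthesis)) | set(PMID.findall(synthesis))
    if not cited:
        return 1.0  # nothing cited, nothing to flag
    return len(cited & known_ids) / len(cited)

rate = verification_rate("See 10.1038/s41586-020-2649-2 and PMID: 12345.",
                         {"10.1038/s41586-020-2649-2", "12345"})
```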
📋

Session Manifest

Auto-generated reproducibility metadata per workflow run. Records model versions, token counts, costs, search queries, and temperature settings.

REPRODUCIBILITY
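A manifest carrying the metadata listed above might be serialized like this; the keys are inferred from the description, not the project's actual schema:

```python
import datetime
import json

# Hypothetical session-manifest builder: one JSON document per workflow run.
def build_manifest(run: dict) -> str:
    manifest = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_versions": run["models"],
        "tokens": run["tokens"],
        "cost_usd": run["cost_usd"],
        "search_queries": run["queries"],
        "temperature": run["temperature"],
    }
    return json.dumps(manifest, indent=2)

doc = build_manifest({"models": ["claude-sonnet"], "tokens": 48210,
                      "cost_usd": 2.41, "queries": ["TP53 NSCLC"],
                      "temperature": 0.2})
```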
📊

PRISMA Flow

Auto-generates PRISMA flow diagram data during W1 literature review. Tracks the full pipeline: identified → screened → assessed → included.

SYSTEMATIC REVIEW
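The identified → screened → assessed → included pipeline is a chain of subtractions, which is all a PRISMA flow diagram's counts encode. A sketch with invented numbers:

```python
# Toy PRISMA flow arithmetic: each stage's count feeds the next.
def prisma_counts(identified: int, duplicates: int,
                  screened_out: int, excluded: int) -> dict:
    screened = identified - duplicates
    assessed = screened - screened_out
    included = assessed - excluded
    return {"identified": identified, "screened": screened,
            "assessed": assessed, "included": included}

flow = prisma_counts(412, 37, 290, 51)
```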

Negative Results Module

Integrates negative results to correct publication bias. Auto cross-references Lab KB failure data during literature reviews.

BIAS CORRECTION
🌊

SSE Real-time Feed

Real-time event stream powered by sse-starlette. Instantly reflects workflow progress and agent state changes on the dashboard.

REAL-TIME
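Under the hood, sse-starlette emits the standard server-sent-events wire format: `event:` and `data:` lines terminated by a blank line. A library-independent sketch of that framing (event name and payload are illustrative):

```python
import json

# One SSE frame as it travels over the wire to the dashboard.
def sse_frame(event: str, payload: dict) -> str:
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse_frame("workflow_progress", {"workflow": "W1", "step": "SEARCH"})
```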
💰

Cost Tracking

Per-agent, per-workflow cost tracking. 85% cost reduction via prompt caching. Auto-pauses on budget overrun.

COST-AWARE
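Budget-aware tracking with auto-pause can be sketched as an accumulator with a threshold check; class and field names here are assumptions, not the real CostTracker:

```python
# Accumulate per-agent spend; flip to paused once the budget is exceeded.
class CostTracker:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent: dict[str, float] = {}
        self.paused = False

    def record(self, agent: str, cost_usd: float) -> None:
        self.spent[agent] = self.spent.get(agent, 0.0) + cost_usd
        if sum(self.spent.values()) > self.budget_usd:
            self.paused = True  # auto-pause on budget overrun

tracker = CostTracker(budget_usd=5.0)
tracker.record("knowledge_manager", 3.2)
tracker.record("research_director", 2.5)
```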
🔄

Human-in-the-Loop

Pauses at Human Checkpoints within workflows. Structured intervention: ADD_PAPER, EXCLUDE_PAPER, MODIFY_QUERY, and more.

INTERVENTION
🛡️

Provenance Tagging

Provenance tags on all data. Prevents circular reasoning amplification. Distinguishes source_type: primary_literature, preprint, internal_synthesis.

DATA INTEGRITY
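The anti-circularity idea is that internally generated text must never be re-ingested as external evidence. A minimal sketch using the three `source_type` values named above (the gating function is hypothetical):

```python
from enum import Enum

# Every record carries a provenance tag.
class SourceType(Enum):
    PRIMARY_LITERATURE = "primary_literature"
    PREPRINT = "preprint"
    INTERNAL_SYNTHESIS = "internal_synthesis"

def admissible_as_evidence(source_type: SourceType) -> bool:
    # Internal syntheses are excluded to prevent circular reasoning amplification.
    return source_type is not SourceType.INTERNAL_SYNTHESIS

ok = admissible_as_evidence(SourceType.PREPRINT)
```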
🔐

Security Architecture

Bearer token + SSE query param auth, per-IP rate limiting (60 rpm), Circuit Breaker pattern, CORS controls.

HARDENED

Development Progress

Phase 1 complete, Phase 2 in progress

Phase 1 · Week 1-2
Foundation & Core Agents
Project structure, LLMLayer, MockLayer, 3 core agents (Research Director, Knowledge Manager, Project Manager), Workflow Engine, SSE Hub
FastAPI · SQLite WAL · ChromaDB · Pydantic Models
Phase 1 · Week 3-4
Integrations & Domain Teams
PubMed (Biopython) + Semantic Scholar integration, CostTracker, T02 Transcriptomics, T10 Data Engineering, Lab KB, W1 pipeline
PubMed API · Semantic Scholar · T02 Team · T10 Team · W1 Pipeline
Phase 1 · Week 5
Dashboard & Security
Next.js 15 dashboard MVP (Mission Control, Lab KB, Settings, Query), Bearer auth, Rate Limiting, 106 fuzzing tests, a11y audit
Next.js 15 · shadcn/ui · Zustand · 106 Fuzzing Tests · 334 Unit Tests
Phase 1b · Tier 1 Reproducibility
RCMXT & Citation Validation
RCMXT scoring engine (deterministic heuristics), Citation Validator, SessionManifest auto-generation, W1 pipeline execution, Direct Query UI
RCMXT Scorer · Citation Validator · SessionManifest · PRISMA Flow · 469 Tests Total
Phase 2 · Week 6-11
Full Workflows & Scale
Full W2-W6 implementation, remaining 12 agent teams, Redis + Celery async task queue, React Flow workflow visualization
Redis + Celery · React Flow · W2-W6 · All 18 Agents
Phase 3 · Week 12-15
Advanced Engines & Sandbox
Full Ambiguity Resolution Engine, Negative Results Module expansion, Docker code sandbox, HPC runner
Ambiguity Engine · NR Module Full · Code Sandbox · HPC
Phase 4 · Week 16-18
Production Deployment
Vercel deployment, NextAuth.js authentication, PostgreSQL migration, E2E tests, Langfuse monitoring
Vercel · PostgreSQL · NextAuth · E2E Tests

Tech Stack

An optimally curated combination of proven tools

⚛️
Next.js 15
Frontend Framework
🎨
Tailwind CSS 4
Styling
🧩
shadcn/ui
Component Library
🐻
Zustand
State Management
FastAPI
Backend API
🤖
Anthropic SDK
LLM Client
📐
Instructor
Structured Output
🧬
Biopython
PubMed Access
🔮
ChromaDB
Vector Database
🗃️
SQLite WAL
State Database
📡
sse-starlette
Real-time Events
🐳
Docker Compose
Deployment