PHASE 14 COMPLETE — 2017 TESTS PASSING

Your Own
AI Science Team

23 specialized LLM agents and 5 hybrid engines on a 3-tier architecture supporting the entire biology research lifecycle. From literature review to manuscript writing and grant proposals — all in one dashboard.

23 LLM Agents
11 Workflows
2017 Tests
3-Tier Architecture
Weill Cornell Medicine
JangKeun Kim
v1.0 — 2026
SCROLL

3-Tier Agent Architecture

From strategic decision-making to domain expert analysis and independent quality assurance — a hierarchical agent structure reflecting the complexity of biology research

Tier 1 · Strategic
Research Director Opus · Sonnet
Knowledge Manager Sonnet
Project Manager Haiku
Tier 2 · Domain Experts
Genomics & Epigenomics Sonnet
Transcriptomics & Single-Cell Sonnet
Proteomics & Metabolomics Sonnet
Biostatistics Sonnet
Machine Learning & DL Sonnet
Systems Biology Sonnet
Structural Biology Sonnet
Experimental Designer Sonnet
Integrative Biologist Sonnet
Scientific Communication Sonnet
Grant Writing & Funding Opus
Data Engineering Haiku
Tier 3 · QA Layer
Statistical Rigor Sonnet
Biological Plausibility Sonnet
Reproducibility & Standards Haiku
Engines · Hybrid
Ambiguity Resolution Engine Code+LLM
Negative Results Module Code+LLM
Next.js 15
Dashboard
FastAPI
REST + SSE
Anthropic SDK
+ Instructor
Claude
Opus · Sonnet · Haiku

Use Cases

Your AI team is with you at every stage of research

📚

W1: Literature Review Workflow

Automated search across PubMed, Semantic Scholar, and bioRxiv → systematic synthesis → evidence-level scoring in one go

1
SCOPE & DECOMPOSE — Research Director decomposes research questions into sub-queries
2
SEARCH — Knowledge Manager searches PubMed + Semantic Scholar + bioRxiv simultaneously
3
SCREEN & EXTRACT — PRISMA-compliant screening, verbatim quote extraction
4
SYNTHESIZE — Opus model integrates key findings, Citation Validator verifies references
5
REPORT — RCMXT scores + PRISMA Flow + SessionManifest + BibTeX export
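The five steps above reduce to a simple staged-pipeline shape. A minimal sketch of that shape (step logic, names, and fields are illustrative, not the project's actual API):

```python
# Toy W1 pipeline: decompose -> search 3 sources -> screen -> synthesize -> report.
# Everything here is a placeholder for the real agent calls.

def run_w1(question: str) -> dict:
    sub_queries = [f"{question} mechanism", f"{question} clinical"]    # SCOPE & DECOMPOSE
    hits = [{"q": q, "source": s} for q in sub_queries
            for s in ("pubmed", "semantic_scholar", "biorxiv")]        # SEARCH (3 sources)
    included = [h for h in hits if h["source"] != "biorxiv"]           # SCREEN (toy criterion)
    synthesis = {"n_included": len(included)}                          # SYNTHESIZE
    return {"synthesis": synthesis,                                    # REPORT
            "prisma": {"identified": len(hits), "included": len(included)}}

result = run_w1("TP53 in NSCLC")
```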

Direct Query Mode

Simple questions answered in under 30 seconds. Complex queries automatically routed to the appropriate workflow

1
CLASSIFY — Research Director (Sonnet) classifies the query type
2
simple_query — Knowledge Manager lookup, then 1 specialist answers immediately
3
needs_workflow — Auto-routed to the appropriate W1-W6 pipeline
4
SSE real-time streaming response, cost under $0.50
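The classify-then-route step can be sketched as a tiny dispatcher. The keyword markers below are invented for illustration; the real classifier is an LLM call, not string matching:

```python
# Toy router mirroring the Direct Query flow: classify the query, then
# either answer directly or hand off to a W1-W6 workflow.

def route(query: str) -> str:
    complex_markers = ("systematic review", "hypothesis", "grant", "analyze")
    if any(m in query.lower() for m in complex_markers):
        return "needs_workflow"   # auto-routed to the appropriate W1-W6 pipeline
    return "simple_query"         # one specialist answers immediately

verdict = route("What does TP53 encode?")
```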
🧪

Lab Knowledge Base

Systematically record lab negative results and protocols, with automatic cross-referencing during literature reviews

1
Record — Store failed experiments, conditions, and outcomes in structured format
2
Classify — Auto-classify failure_category (technical, biological, design, etc.)
3
Cross-reference — Lab KB auto-searched during W1 NEGATIVE CHECK step
4
Verification tracking — Manage reliability via verified_by, verification_status
🎛️

Mission Control Dashboard

Monitor all agent states in real-time and directly intervene in workflows through a unified control panel

1
Agent Grid — Real-time display of 18 agents' idle/busy/unavailable status
2
Workflow Cards — Step-by-step progress and budget consumption for active workflows
3
Activity Feed — SSE-based real-time event stream
4
Intervention — Direct commands: PAUSE, ADD_PAPER, EXCLUDE_PAPER, MODIFY_QUERY

RCMXT Evidence Confidence Vector

In biology, truth is not binary. Every claim receives a 5-axis confidence vector.

R
Reproducibility
Reproducibility from independent sources
C
Context
Organism & condition specificity
M
Methodology
Methodological robustness & sample size
X
Cross-validation
Multi-omics cross-validation
T
Temporal
Temporal stability & recency
"TP53 mutation → apoptosis resistance in NSCLC"

R: 0.85 — Reproduced in 4 independent studies
C: 0.70 — NSCLC-specific, differs in other cancer types
M: 0.92 — n=500+ cohort, WGS + functional assay
X: 0.60 — Confirmed by Genomics + Proteomics, Metabolomics pending
T: 0.78 — Consistent reports 2019-2024, includes 2024 meta-analysis

Composite: 0.77 · Citations verified: 100% (12/12)
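The composite in this example is consistent with a plain unweighted mean of the five axes; whether the actual scorer weights axes differently is not stated here:

```python
# The example's five axis scores; an unweighted mean reproduces the 0.77 composite.
rcmxt = {"R": 0.85, "C": 0.70, "M": 0.92, "X": 0.60, "T": 0.78}
composite = round(sum(rcmxt.values()) / len(rcmxt), 2)
```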

6 Research Workflows

From Direct Query to Ambiguity Resolution — pipelines covering the entire research process

W1Literature Review
Systematic literature review — auto-generated PRISMA flow, evidence synthesis, RCMXT scoring. Includes human checkpoint.
SCOPE → SEARCH → SCREEN → EXTRACT → NEG CHECK → SYNTHESIZE → VALIDATE → REPORT
Est. cost: $2–5 · Duration: 3–8 min
W2Hypothesis Generation
7 expert agents generate hypotheses in parallel → QA debate → RCMXT profiling. Up to 3 iterations.
CONTEXT → GENERATE ×7 → NEG FILTER → DEBATE → RANK → EVOLVE
Est. cost: $3–8 · Duration: 5–15 min
W3Data Analysis
Data QC → analysis plan → execution → statistical validation → biological plausibility. Includes code generation.
INGEST → QC → PLAN → EXECUTE → VALIDATE → INTERPRET
Est. cost: $2–6 · Duration: 5–20 min
W4Manuscript Writing
Manuscript drafting — structure design, section writing, figure planning, stat review, up to 5 revision cycles.
OUTLINE → DRAFT → FIGURES → STAT REVIEW → REVISION ×5
Est. cost: $5–15 · Duration: 10–30 min
W5Grant Proposal
NIH/NSF grant proposal — Specific Aims, Research Strategy, preliminary data, budget, Mock Review.
OPPORTUNITY → AIMS → STRATEGY → PRELIM DATA → BUDGET → MOCK REVIEW
Est. cost: $8–20 · Uses Opus model
W6Ambiguity Resolution
Resolving conflicting evidence — contradiction classification, negative result mining, resolution hypotheses, discriminating experiment design.
IDENTIFY → LANDSCAPE → CLASSIFY → MINE NEG → RESOLVE → EXPERIMENT
Est. cost: $3–10 · Uses Ambiguity Engine

Core Features

Epistemology-driven features built for biology research

🔬

Citation Validation

Deterministically cross-verify all citations in synthesis against search results. Extracts DOI/PMID patterns and computes verification rate.

DETERMINISTIC
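Deterministic citation checking of this kind can be sketched as pattern extraction plus set intersection. The regexes below are simplified stand-ins for the real DOI/PMID grammars, and the function name is illustrative:

```python
import re

# Pull DOI/PMID-like tokens out of a synthesis and verify each against
# the identifiers actually seen during search.
DOI = re.compile(r"10\.\d{4,9}/[^\s,;)]+")
PMID = re.compile(r"PMID:\s*(\d+)")

def verification_rate(synthesis: str, known_ids: set[str]) -> float:
    cited = set(DOI.findall(synthesis)) | set(PMID.findall(synthesis))
    if not cited:
        return 1.0  # nothing cited, nothing to flag
    return len(cited & known_ids) / len(cited)

rate = verification_rate("See 10.1038/s41586-020-2649-2 and PMID: 12345.",
                         {"10.1038/s41586-020-2649-2", "12345"})
```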
📋

Session Manifest

Auto-generated reproducibility metadata per workflow run. Records model versions, token counts, costs, search queries, and temperature settings.

REPRODUCIBILITY
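A manifest carrying the metadata listed above might be serialized like this; the keys are inferred from the description, not the project's actual schema:

```python
import datetime
import json

# Hypothetical session-manifest builder: one JSON document per workflow run.
def build_manifest(run: dict) -> str:
    manifest = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_versions": run["models"],
        "tokens": run["tokens"],
        "cost_usd": run["cost_usd"],
        "search_queries": run["queries"],
        "temperature": run["temperature"],
    }
    return json.dumps(manifest, indent=2)

doc = build_manifest({"models": ["claude-sonnet"], "tokens": 48210,
                      "cost_usd": 2.41, "queries": ["TP53 NSCLC"],
                      "temperature": 0.2})
```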
📊

PRISMA Flow

Auto-generates PRISMA flow diagram data during W1 literature review. Tracks the full pipeline: identified → screened → assessed → included.

SYSTEMATIC REVIEW
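The identified → screened → assessed → included pipeline is a chain of subtractions, which is all a PRISMA flow diagram's counts encode. A sketch with invented numbers:

```python
# Toy PRISMA flow arithmetic: each stage's count feeds the next.
def prisma_counts(identified: int, duplicates: int,
                  screened_out: int, excluded: int) -> dict:
    screened = identified - duplicates
    assessed = screened - screened_out
    included = assessed - excluded
    return {"identified": identified, "screened": screened,
            "assessed": assessed, "included": included}

flow = prisma_counts(412, 37, 290, 51)
```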

Negative Results Module

Integrates negative results to correct publication bias. Auto cross-references Lab KB failure data during literature reviews.

BIAS CORRECTION
🌊

SSE Real-time Feed

Real-time event stream powered by sse-starlette. Instantly reflects workflow progress and agent state changes on the dashboard.

REAL-TIME
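Under the hood, sse-starlette emits the standard server-sent-events wire format: `event:` and `data:` lines terminated by a blank line. A library-independent sketch of that framing (event name and payload are illustrative):

```python
import json

# One SSE frame as it travels over the wire to the dashboard.
def sse_frame(event: str, payload: dict) -> str:
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

frame = sse_frame("workflow_progress", {"workflow": "W1", "step": "SEARCH"})
```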
💰

Cost Tracking

Per-agent, per-workflow cost tracking. 85% cost reduction via prompt caching. Auto-pauses on budget overrun.

COST-AWARE
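Budget-aware tracking with auto-pause can be sketched as an accumulator with a threshold check; class and field names here are assumptions, not the real CostTracker:

```python
# Accumulate per-agent spend; flip to paused once the budget is exceeded.
class CostTracker:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent: dict[str, float] = {}
        self.paused = False

    def record(self, agent: str, cost_usd: float) -> None:
        self.spent[agent] = self.spent.get(agent, 0.0) + cost_usd
        if sum(self.spent.values()) > self.budget_usd:
            self.paused = True  # auto-pause on budget overrun

tracker = CostTracker(budget_usd=5.0)
tracker.record("knowledge_manager", 3.2)
tracker.record("research_director", 2.5)
```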
🔄

Human-in-the-Loop

Pauses at Human Checkpoints within workflows. Structured intervention: ADD_PAPER, EXCLUDE_PAPER, MODIFY_QUERY, and more.

INTERVENTION
🛡️

Provenance Tagging

Provenance tags on all data. Prevents circular reasoning amplification. Distinguishes source_type: primary_literature, preprint, internal_synthesis.

DATA INTEGRITY
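The anti-circularity idea is that internally generated text must never be re-ingested as external evidence. A minimal sketch using the three `source_type` values named above (the gating function is hypothetical):

```python
from enum import Enum

# Every record carries a provenance tag.
class SourceType(Enum):
    PRIMARY_LITERATURE = "primary_literature"
    PREPRINT = "preprint"
    INTERNAL_SYNTHESIS = "internal_synthesis"

def admissible_as_evidence(source_type: SourceType) -> bool:
    # Internal syntheses are excluded to prevent circular reasoning amplification.
    return source_type is not SourceType.INTERNAL_SYNTHESIS

ok = admissible_as_evidence(SourceType.PREPRINT)
```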
🔐

Security Architecture

Bearer token + SSE query param auth, per-IP rate limiting (60 rpm), Circuit Breaker pattern, CORS controls.

HARDENED

Development Progress

Phase 1 complete, Phase 2 in progress

Phase 1 · Week 1-2
Foundation & Core Agents
Project structure, LLMLayer, MockLayer, 3 core agents (Research Director, Knowledge Manager, Project Manager), Workflow Engine, SSE Hub
FastAPI · SQLite WAL · ChromaDB · Pydantic Models
Phase 1 · Week 3-4
Integrations & Domain Teams
PubMed (Biopython) + Semantic Scholar integration, CostTracker, T02 Transcriptomics, T10 Data Engineering, Lab KB, W1 pipeline
PubMed API · Semantic Scholar · T02 Team · T10 Team · W1 Pipeline
Phase 1 · Week 5
Dashboard & Security
Next.js 15 dashboard MVP (Mission Control, Lab KB, Settings, Query), Bearer auth, Rate Limiting, 106 fuzzing tests, a11y audit
Next.js 15 · shadcn/ui · Zustand · 106 Fuzzing Tests · 334 Unit Tests
Phase 1b · Tier 1 Reproducibility
RCMXT & Citation Validation
RCMXT scoring engine (deterministic heuristics), Citation Validator, SessionManifest auto-generation, W1 pipeline execution, Direct Query UI
RCMXT Scorer · Citation Validator · SessionManifest · PRISMA Flow · 469 Tests Total
Phase 2 · Week 6-11
Full Workflows & Scale
Full W2-W6 implementation, remaining 12 agent teams, Redis + Celery async task queue, React Flow workflow visualization
Redis + Celery · React Flow · W2-W6 · All 18 Agents
Phase 3 · Week 12-15
Advanced Engines & Sandbox
Full Ambiguity Resolution Engine, Negative Results Module expansion, Docker code sandbox, HPC runner
Ambiguity Engine · NR Module Full · Code Sandbox · HPC
Phase 4 · Week 16-18
Production Deployment
Vercel deployment, NextAuth.js authentication, PostgreSQL migration, E2E tests, Langfuse monitoring
Vercel · PostgreSQL · NextAuth · E2E Tests

Tech Stack

An optimally curated combination of proven tools

⚛️
Next.js 15
Frontend Framework
🎨
Tailwind CSS 4
Styling
🧩
shadcn/ui
Component Library
🐻
Zustand
State Management
FastAPI
Backend API
🤖
Anthropic SDK
LLM Client
📐
Instructor
Structured Output
🧬
Biopython
PubMed Access
🔮
ChromaDB
Vector Database
🗃️
SQLite WAL
State Database
📡
sse-starlette
Real-time Events
🐳
Docker Compose
Deployment