BioTeam-AI v0.6

Research Digest
Multi-Source Literature Monitoring

Automatically collect, deduplicate, relevance-score, and AI-summarize the latest papers and code from 6 sources in a single pipeline.

TopicProfile
PubMed bioRxiv arXiv GitHub HuggingFace Semantic Scholar
Dedup
Score
Summarize
DigestReport
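The Dedup stage in the flow above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Entry` record, its fields, and the title-normalization key are assumptions made here, since the page does not say how duplicates are detected.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one collected paper or repository."""
    source: str
    title: str


def dedup_key(entry: Entry) -> str:
    """Lowercase the title and collapse punctuation so near-identical titles collide."""
    return re.sub(r"[^a-z0-9]+", " ", entry.title.lower()).strip()


def deduplicate(entries: list[Entry]) -> list[Entry]:
    """Keep the first occurrence of each normalized title, preserving input order."""
    seen: set[str] = set()
    unique: list[Entry] = []
    for entry in entries:
        key = dedup_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


# The same report found via two sources, plus one unrelated repository.
raw = [
    Entry("arxiv", "Technical Report of HelixFold3 for Biomolecular Structure Prediction"),
    Entry("semantic_scholar", "Technical report of HelixFold3 for biomolecular structure prediction."),
    Entry("github", "google-deepmind/alphafold3"),
]
unique = deduplicate(raw)
print([e.source for e in unique])
```

A real pipeline would more likely prefer stable identifiers (DOI, arXiv ID, repository URL) where they exist and fall back to title matching only when they do not.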

Topic Configuration

Define the research topics you want to monitor. Freely configure search keywords, sources, schedules, and category filters.

🧬 AI in Biology Research
⏱ Daily
machine learning biology foundation models genomics AI drug discovery protein language models
PubMed bioRxiv arXiv GitHub HuggingFace Semantic Scholar
arXiv
cs.AI cs.LG q-bio
bioRxiv
bioinformatics genomics
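As a rough sketch, the topic shown above could be held in a small data class. The field names `name`, `queries`, `sources`, and `schedule` mirror the request body in the API examples on this page; the `categories` field and the lowercase source identifiers for filters are assumptions made for illustration. Only the two queries that appear in the API example are listed.

```python
from dataclasses import dataclass, field


@dataclass
class TopicProfile:
    """Hypothetical in-memory shape of a topic configuration."""
    name: str
    queries: list[str]
    sources: list[str]
    schedule: str = "daily"
    # Assumed field: per-source category filters, keyed by source identifier.
    categories: dict[str, list[str]] = field(default_factory=dict)


topic = TopicProfile(
    name="AI in Biology Research",
    queries=["machine learning biology", "protein language models"],
    sources=["pubmed", "biorxiv", "arxiv", "github", "huggingface", "semantic_scholar"],
    schedule="daily",
    categories={
        "arxiv": ["cs.AI", "cs.LG", "q-bio"],
        "biorxiv": ["bioinformatics", "genomics"],
    },
)

# Sources without an explicit filter are simply absent from the mapping.
unfiltered = [s for s in topic.sources if s not in topic.categories]
print(unfiltered)
```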

Collected Papers & Code

Results collected from 6 sources by the pipeline, deduplicated, and scored for relevance.
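The page does not describe how relevance is scored. As a stand-in, the sketch below uses plain keyword overlap: for each query, the fraction of its terms that occur in the text, taking the best query. The function names and the scoring rule are assumptions for illustration only.

```python
import re


def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(text: str, queries: list[str]) -> float:
    """Return the best per-query term coverage, a value in [0, 1]."""
    words = tokenize(text)
    best = 0.0
    for query in queries:
        terms = tokenize(query)
        if terms:
            best = max(best, len(terms & words) / len(terms))
    return best


queries = ["machine learning biology", "protein language models"]
titles = [
    "CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments",
    "Structure-Based Approaches for Protein-Protein Interaction Prediction "
    "Using Machine Learning and Deep Learning",
]

# Highest-scoring titles first.
ranked = sorted(titles, key=lambda t: relevance(t, queries), reverse=True)
for title in ranked:
    print(f"{relevance(title, queries):.2f}  {title}")
```

Exact-token overlap misses synonyms and abbreviations (the CRISPR-GPT title scores zero here despite being on topic), which is one reason a production scorer would probably use embeddings or an LLM judgment instead.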

arXiv

Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels

Presents an approach to modeling and simulating biology across multiple scales with AI. Proposes multiscale foundation models spanning the molecular, cellular, and organism levels.
PubMed

Structure-Based Approaches for Protein-Protein Interaction Prediction Using Machine Learning and Deep Learning

PPI prediction plays a pivotal role in understanding cellular processes. Structure-based prediction has emerged as a robust alternative to sequence-based methods, offering greater biological accuracy by integrating 3D spatial and biochemical features.
HuggingFace

ESM Cambrian 600M — Protein Language Model by EvolutionaryScale

ESM Cambrian focuses on creating representations of protein biology, scaling up data and training compute to deliver significant performance improvements over ESM2. Part of a family scaling up to 6B parameters.
bioRxiv

CellFM: A Large-scale Foundation Model Pre-trained on Transcriptomics of 100 Million Human Cells

800M parameter foundation model pre-trained on 100M single-cell transcriptomes. Addresses challenges of noise, batch effects, and sparsity to learn unified cellular state representations. Published in Nature Communications.
GitHub

google-deepmind/alphafold3

AlphaFold 3 inference pipeline for predicting accurate structures of biomolecular interactions including proteins, ligands, and nucleic acids. Source code released under CC-BY-NC-SA 4.0 license.
Semantic Scholar

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine

Comprehensive overview of LLM and MLLM development, principles, and applications in medicine. Covers diagnostic assistance, clinical note generation, drug interaction prediction, and medical education across 12+ domains.
PubMed

Versatile Framework for Drug-Target Interaction Prediction by Considering Domain-Specific Features

Predicting drug-target interactions is crucial in drug discovery, but traditional experiments are costly and time-consuming. Deep learning-based framework with domain-specific molecular and protein features for accelerating DTI prediction.
arXiv

Technical Report of HelixFold3 for Biomolecular Structure Prediction

Open-source implementation achieving AlphaFold3-comparable accuracy for biomolecular structure prediction. Covers protein complexes, protein-ligand interactions, and protein-nucleic acid interactions, with full reproducibility.
bioRxiv

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

Combines LLM reasoning with domain expertise for automated CRISPR experiment design. Performs sgRNA design, off-target analysis, and protocol generation via natural language interaction with biology researchers.
GitHub

bytedance/Protenix

Open-source biomolecular structure prediction toolkit. Described as the first fully open-source model (Apache 2.0, 368M parameters) to outperform AlphaFold3 across diverse benchmark sets.
HuggingFace

Geneformer — Foundation Transformer Model for Single-Cell Biology

Foundational transformer model pretrained on ~104M single-cell transcriptomes for context-aware predictions in network biology. Supports zero-shot learning and fine-tuning for gene/cell state classification. Published in Nature.
Semantic Scholar

Scientific Large Language Models: A Survey on Biological & Chemical Domains

Survey of LLM applications in specialized scientific domains, particularly biology and chemistry. Analyzes models for protein, DNA, RNA, and molecular understanding. Published in ACM Computing Surveys.

Digest Report

A weekly research-trends report that DigestAgent (Claude Haiku) generates from the collected papers and code.

📋 AI in Biology Research — Weekly Digest
2024.01.26 — 2025.01.17 · 12 entries from 6 sources
LLM Cost: $0.008
Notable advances in AI+Biology across multiple fronts. In protein structure prediction, AlphaFold3 went open-source on GitHub, and ByteDance's Protenix became the first fully open-source model to outperform it. Single-cell foundation models are scaling rapidly — CellFM (100M cells) and Geneformer (104M transcriptomes) demonstrate the power of large-scale pretraining. CRISPR-GPT showcases LLM agents for automated experiment design, while ESM Cambrian advances protein representation learning.
  1. AlphaFold3 Open-Source Release
     GitHub · Google DeepMind released the AlphaFold3 inference pipeline on GitHub (7.6k stars). It predicts protein, ligand, and nucleic acid complex structures and is distributed under the CC-BY-NC-SA 4.0 license.
  2. CellFM: 100M-Cell Foundation Model
     bioRxiv · 800M-parameter model pre-trained on 100M single-cell transcriptomes. Learns unified cellular state representations; now published in Nature Communications.
  3. Geneformer V2: 104M Transcriptomes for Network Biology
     HuggingFace · Foundational transformer pretrained on ~104M single-cell transcriptomes. Enables zero-shot learning and fine-tuning for gene/cell state classification. Published in Nature.
  4. AI-Driven Digital Organism: Multiscale Foundation Models
     arXiv · Proposes multiscale foundation models for predicting, simulating, and programming biology at the molecular, cellular, and organism levels.
  5. CRISPR-GPT: LLM Agent for Gene-Editing Experiment Design
     bioRxiv · Combines LLM reasoning with CRISPR domain expertise for automated sgRNA design, off-target analysis, and protocol generation via natural language.
PubMed (2)
bioRxiv (2)
arXiv (2)
GitHub (2)
HuggingFace (2)
Semantic Scholar (2)

Pipeline Statistics

Cumulative operational statistics. All data is persisted in SQLite, and DigestScheduler automatically manages collection cycles.
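To make the SQLite persistence concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table name `digest_entries`, its columns, and the `UNIQUE` constraint on `url` are hypothetical; the actual BioTeam-AI schema is not shown on this page.

```python
import sqlite3

# In-memory database for the sketch; a real service would pass a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE digest_entries (
        id     INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        title  TEXT NOT NULL,
        url    TEXT NOT NULL UNIQUE  -- assumed: one row per URL
    )
    """
)

rows = [
    ("github", "google-deepmind/alphafold3", "https://github.com/google-deepmind/alphafold3"),
    ("github", "bytedance/Protenix", "https://github.com/bytedance/Protenix"),
    # Re-fetched on a later cycle; the UNIQUE constraint makes this a no-op.
    ("github", "google-deepmind/alphafold3", "https://github.com/google-deepmind/alphafold3"),
]
conn.executemany(
    "INSERT OR IGNORE INTO digest_entries (source, title, url) VALUES (?, ?, ?)",
    rows,
)
conn.commit()

# Per-source counts, similar in shape to the statistics shown on this page.
counts = dict(conn.execute("SELECT source, COUNT(*) FROM digest_entries GROUP BY source"))
print(counts)
```

Letting the database reject repeated URLs means that re-running a collection cycle does not inflate the cumulative counts.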

Counters: Collected Papers/Code · Reports Generated · Monitored Topics · Avg Fetch Time (sec)
PubMed 54 (22%)
bioRxiv 49 (20%)
arXiv 45 (18%)
GitHub 37 (15%)
HuggingFace 32 (13%)
Semantic Scholar 30 (12%)
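The percentages above are consistent with each source's count divided by the total across all six sources, rounded to the nearest whole percent. The short sketch below recomputes them from the counts listed here; the dictionary keys follow the `entries_by_source` field in the stats response, and the rounding rule is an assumption.

```python
entries_by_source = {
    "pubmed": 54,
    "biorxiv": 49,
    "arxiv": 45,
    "github": 37,
    "huggingface": 32,
    "semantic_scholar": 30,
}

total = sum(entries_by_source.values())

# Share of each source as a whole-number percentage of all entries.
shares = {
    source: round(100 * count / total)
    for source, count in entries_by_source.items()
}

print(total)
for source, pct in shares.items():
    print(f"{source}: {entries_by_source[source]} ({pct}%)")
```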

API Usage Examples

Programmatically manage topics, trigger collection, and retrieve reports via the REST API.

POST /api/v1/digest/topics
Create topic profile
curl -X POST http://localhost:8000/api/v1/digest/topics \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BIOTEAM_API_KEY" \
  -d '{
    "name": "AI in Biology Research",
    "queries": ["machine learning biology", "protein language models"],
    "sources": ["pubmed", "biorxiv", "arxiv", "github", "huggingface"],
    "schedule": "daily"
  }'

# Response
{
  "id": "a1b2c3d4-...",
  "name": "AI in Biology Research",
  "schedule": "daily",
  "is_active": true,
  "created_at": "2026-02-24T09:00:00Z"
}
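The same request can be issued from Python with only the standard library. This is a sketch under assumptions: `build_request` and `send` are hypothetical helper names, and the code takes the endpoint path, headers, and body fields from the curl example above rather than from any client library.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/digest"


def build_request(method, path, api_key, body=None):
    """Build an authenticated request; the body, if any, is sent as JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


api_key = os.environ.get("BIOTEAM_API_KEY", "")
create = build_request(
    "POST",
    "/topics",
    api_key,
    {
        "name": "AI in Biology Research",
        "queries": ["machine learning biology", "protein language models"],
        "sources": ["pubmed", "biorxiv", "arxiv", "github", "huggingface"],
        "schedule": "daily",
    },
)
print(create.get_method(), create.full_url)
# topic = send(create)  # uncomment when the API is reachable
```

Separating request construction from sending keeps the first half usable in tests without a server.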
POST /api/v1/digest/topics/{id}/run
Trigger immediate fetch
curl -X POST http://localhost:8000/api/v1/digest/topics/a1b2c3d4-.../run \
  -H "Authorization: Bearer $BIOTEAM_API_KEY"

# Response — returns immediately, pipeline runs in background
{
  "topic_id": "a1b2c3d4-...",
  "status": "triggered",
  "message": "Digest pipeline started in background"
}
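Because the run endpoint returns before the pipeline finishes, a client that wants the resulting report has to check back later. The sketch below polls until a report with a previously unseen `id` appears. The page does not document a job-status endpoint, so polling the reports list is an assumption, and `wait_for_report` and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_report(
    fetch_reports: Callable[[], list],
    known_ids: set,
    attempts: int = 10,
    delay_sec: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll fetch_reports() until a report with an unseen id shows up.

    Returns the new report, or None if nothing new appears within `attempts`.
    """
    for _ in range(attempts):
        for report in fetch_reports():
            if report["id"] not in known_ids:
                return report
        sleep(delay_sec)
    return None


# Demo with a fake fetcher: two empty polls, then a new report.
# "report-1" is a made-up id used only for this demonstration.
responses = iter([[], [], [{"id": "report-1", "entry_count": 12}]])
report = wait_for_report(
    fetch_reports=lambda: next(responses),
    known_ids=set(),
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(report)
```

In real use, `fetch_reports` would wrap a GET on the reports endpoint for the topic, and `known_ids` would hold the report ids seen before the run was triggered.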
GET /api/v1/digest/reports?topic_id={id}
Retrieve digest reports
curl "http://localhost:8000/api/v1/digest/reports?topic_id=a1b2c3d4-..." \
  -H "Authorization: Bearer $BIOTEAM_API_KEY"

# Response
[{
  "id": "r5e6f7g8-...",
  "topic_id": "a1b2c3d4-...",
  "period_start": "2026-02-17T00:00:00Z",
  "period_end": "2026-02-24T00:00:00Z",
  "entry_count": 12,
  "summary": "Notable advances this week in AI+Biology...",
  "highlights": [{
    "title": "AlphaFold3 Open-Source Release",
    "source": "github",
    "one_liner": "Expected to greatly improve accessibility of structural biology research"
  }],
  "source_breakdown": {"pubmed": 2, "biorxiv": 2, ...},
  "cost": 0.008
}]
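A client might turn a report object like the one above into a short plain-text digest. The sketch below assumes the field names shown in the example response; `format_report` is a hypothetical helper, the sample uses a made-up `id`, and `source_breakdown` is left out because it is elided in the example.

```python
import json

sample_response = json.loads(
    """
    [{
      "id": "example-report",
      "period_start": "2026-02-17T00:00:00Z",
      "period_end": "2026-02-24T00:00:00Z",
      "entry_count": 12,
      "summary": "Notable advances this week in AI+Biology...",
      "highlights": [{
        "title": "AlphaFold3 Open-Source Release",
        "source": "github",
        "one_liner": "Expected to greatly improve accessibility of structural biology research"
      }],
      "cost": 0.008
    }]
    """
)


def format_report(report: dict) -> str:
    """Render one report as a header line, the summary, and numbered highlights."""
    start = report["period_start"][:10]  # keep only the YYYY-MM-DD part
    end = report["period_end"][:10]
    lines = [
        f"{start} to {end} ({report['entry_count']} entries, LLM cost ${report['cost']:.3f})",
        report["summary"],
    ]
    for index, item in enumerate(report["highlights"], start=1):
        lines.append(f"{index}. [{item['source']}] {item['title']}: {item['one_liner']}")
    return "\n".join(lines)


text = format_report(sample_response[0])
print(text)
```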
GET /api/v1/digest/stats
Get statistics
curl http://localhost:8000/api/v1/digest/stats \
  -H "Authorization: Bearer $BIOTEAM_API_KEY"

# Response
{
  "total_topics": 3,
  "total_entries": 247,
  "total_reports": 12,
  "entries_by_source": {
    "pubmed": 54, "biorxiv": 49,
    "arxiv": 45, "github": 37,
    "huggingface": 32, "semantic_scholar": 30
  }
}