A multi-omics AI benchmark for spaceflight biomedical data: 21 ML tasks across 9 modalities, plus a 100-question LLM evaluation, drawn from the Inspiration4, NASA Twins, and JAXA CFE missions.
21 ML Tasks
9 Modalities
100 LLM Questions
7 Baselines
152K+ Total Samples
Baseline Results
Baseline performance across all tasks. The best score in each row is highlighted. Tasks E2 and E3 are reported as supplementary because of extreme imbalance.
Task | Name | Category | Tier | Metric | Random | Majority | LogReg | RF | MLP | XGB | LGBM
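To illustrate why the Random and Majority columns matter on heavily imbalanced tasks, here is a minimal sketch of a tie-aware average precision (a common estimator of AUPRC) in plain Python. It is not the benchmark's own scoring code. The function name `average_precision` and the toy labels are assumptions made for illustration. The point it shows: a scorer that gives every sample the same score earns an average precision equal to the positive-class prevalence, so on an extremely imbalanced task the floor sits close to zero.

```python
def average_precision(y_true, scores):
    """Tie-aware average precision: sum of (recall step) * precision
    over the distinct score thresholds, from highest to lowest.
    Illustrative helper, not the benchmark's official metric code."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Sort samples by descending score so each threshold admits a prefix.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])

    ap = 0.0
    true_pos = 0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        # Consume every sample that shares the current score (one threshold).
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            true_pos += pairs[j][1]
            j += 1
        recall = true_pos / total_pos
        precision = true_pos / j  # j samples are predicted positive so far
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Toy labels (made up): 1 positive among 20 samples, prevalence 0.05.
labels = [1] + [0] * 19

# A constant scorer cannot rank anything, so it scores the prevalence.
constant_ap = average_precision(labels, [0.5] * 20)

# A scorer that ranks the positive first reaches the maximum.
perfect_ap = average_precision(labels, [1.0] + [0.0] * 19)
```

Here `constant_ap` equals the prevalence and `perfect_ap` equals 1.0, which is why a score should be read against the Random and Majority columns rather than against 0.5.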
Performance Analysis
Normalized composite scores, category radar, and difficulty distribution.
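The page does not state how the normalized composite score is computed. One plausible scheme, sketched below purely as an assumption, rescales each task score so that the random baseline maps to 0 and a perfect score maps to 1, then averages across tasks. The function names, the clipping at zero, and the example numbers are all hypothetical and are not taken from the benchmark.

```python
def normalize(score, random_score, best=1.0):
    """Rescale a raw score so random -> 0 and best -> 1 (assumed scheme).
    Scores below the random baseline are clipped to 0."""
    if best <= random_score:
        raise ValueError("best must exceed the random baseline")
    return max(0.0, (score - random_score) / (best - random_score))


def composite(scores, random_scores):
    """Unweighted mean of normalized scores over the tasks in `scores`."""
    if not scores:
        raise ValueError("no tasks to aggregate")
    normalized = [normalize(scores[t], random_scores[t]) for t in scores]
    return sum(normalized) / len(normalized)


# Made-up numbers for two hypothetical tasks, for illustration only.
model_scores = {"task_a": 0.75, "task_b": 0.50}
random_baseline = {"task_a": 0.50, "task_b": 0.25}

composite_score = composite(model_scores, random_baseline)
```

Normalizing against the random baseline keeps tasks with different metrics and chance levels comparable before averaging. A weighted mean, or normalization against the majority baseline, would be equally reasonable variants.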
Insight: Removing effect-size features (fold changes) preserves RF and MLP performance (0.86 AUPRC),
indicating that the task tests genuine distributional signal rather than simple effect-size thresholding.
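The ablation behind this insight can be sketched as a small harness: score a model on all features, drop the effect-size columns, and score it again. This is a minimal sketch under stated assumptions, not the benchmark's pipeline. The column-name markers (`log2fc`, `fold_change`) are a hypothetical naming convention, and `evaluate` is a caller-supplied placeholder for whatever trains a model (for example RF or MLP) and returns its AUPRC.

```python
def drop_effect_size(feature_names, rows, markers=("log2fc", "fold_change")):
    """Remove columns whose name contains an effect-size marker.
    The marker strings are an assumed naming convention."""
    keep = [
        i for i, name in enumerate(feature_names)
        if not any(m in name.lower() for m in markers)
    ]
    kept_names = [feature_names[i] for i in keep]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced_rows


def ablation_report(feature_names, rows, labels, evaluate):
    """Compare a score with and without effect-size features.
    `evaluate(rows, labels) -> float` is a hypothetical callback that
    trains and scores a model; it is not defined by the benchmark."""
    full = evaluate(rows, labels)
    kept_names, reduced_rows = drop_effect_size(feature_names, rows)
    ablated = evaluate(reduced_rows, labels)
    return {
        "full": full,
        "ablated": ablated,
        "delta": ablated - full,
        "kept_features": kept_names,
    }
```

A `delta` near zero matches the behavior described above, where performance survives the removal of fold-change features. A large negative `delta` would instead point to a model that mostly thresholds effect sizes.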