Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals
arXiv cs.LG / 3/12/2026
Key Points
- The paper reports the discovery and extraction of a compact hematopoietic algorithm from the single-cell foundation model scGPT, achieved via mechanistic interpretability.
- It demonstrates that scGPT internally encodes a hematopoietic manifold with developmental branch structure, validated on a non-overlapping Tabula Sapiens panel and transferable to an independent multi-donor immune panel.
- The authors present a general three-stage extraction method—direct operator export from frozen attention weights, a lightweight adaptor, and a task-specific readout—that yields a standalone algorithm without retraining on the target dataset.
- In extensive benchmarks against scVI, Palantir, DPT, CellTypist, PCA, and other baselines, the extracted head achieves superior pseudotime-depth ordering and the top endpoint scores (CD4/CD8 AUROC 0.867, mono/macro AUROC 0.951). It runs 34.5x faster with ~1000x fewer trainable parameters, and it can be compressed from three attention heads to a single head and then to a rank-64 surrogate while preserving performance.
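
For readers unfamiliar with the AUROC figures quoted above, the following is a minimal sketch of how such a score can be computed for a binary endpoint (for example, CD4 versus CD8). It is not the paper's evaluation code: the function name `auroc` and the toy scores and labels are assumptions made purely for illustration. AUROC is the probability that a randomly chosen positive cell receives a higher score than a randomly chosen negative cell, with ties counted as half.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for the positive class and 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the paper): six cells, three per class.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)
print(round(toy_auroc, 3))
```

Under this definition a score of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cells; it is written this way for clarity, and a rank-based formulation would be preferable on large panels.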




