Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models

arXiv cs.AI / 4/17/2026


Key Points

  • The study challenges the common practice in single-cell foundation model benchmarks of treating final-layer embeddings as the optimal feature representation for downstream tasks.
  • It evaluates layer-wise representations from scFoundation (100M parameters) and Tahoe-X1 (1.3B parameters) on trajectory inference and perturbation response prediction, showing that the optimal layer varies by task.
  • For trajectory inference, performance peaks at around 60% of model depth, which the abstract reports as 31% above the final layers, so biological signal quality does not simply increase with depth.
  • For perturbation response prediction, the optimal extraction layer shifts across a 0–96% range depending on T cell activation state, highlighting strong context dependence.
  • First-layer embeddings outperform all deeper layers in quiescent cells, suggesting that “hierarchical abstraction” assumptions may not hold universally.
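
To make the depth and "above final layer" figures in the points above concrete, here is a minimal sketch of the arithmetic. It assumes that depth is measured from the first layer (0%) to the final layer (100%) and that the improvement is a relative gain over the final-layer score. The summary does not state the paper's exact definitions, so both conventions, the helper names, and the toy scores are hypothetical.

```python
def relative_depth(layer_index, num_layers):
    """Map a 1-based layer index to a fraction of depth.

    Assumed convention (not confirmed by the source):
    first layer -> 0.0, final layer -> 1.0.
    """
    if num_layers < 2:
        raise ValueError("need at least two layers to define relative depth")
    return (layer_index - 1) / (num_layers - 1)


def best_layer(scores):
    """Pick the best-scoring layer from per-layer scores (higher is better).

    scores[0] belongs to the first layer, scores[-1] to the final layer.
    Returns (1-based layer index, relative depth, relative gain over final layer).
    """
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    final_score = scores[-1]
    gain = (scores[best_idx] - final_score) / abs(final_score)
    return best_idx + 1, relative_depth(best_idx + 1, len(scores)), gain


# Toy scores for a hypothetical 5-layer model; these are NOT results from the paper.
toy_scores = [0.2, 0.4, 0.8, 0.6, 0.5]
layer, depth, gain = best_layer(toy_scores)
print(f"best layer {layer} at {depth:.0%} depth, {gain:.0%} above final layer")
```

Under this convention, a first-layer optimum corresponds to 0% depth, which is consistent with the quiescent-cell observation sitting at the low end of the reported 0–96% range.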

Abstract

Current single-cell foundation model benchmarks universally extract final layer embeddings, assuming these represent optimal feature spaces. We systematically evaluate layer-wise representations from scFoundation (100M parameters) and Tahoe-X1 (1.3B parameters) across trajectory inference and perturbation response prediction. Our analysis reveals that optimal layers are task-dependent (trajectory peaks at 60% depth, 31% above final layers) and context-dependent (perturbation optima shift 0-96% across T cell activation states). Notably, first-layer embeddings outperform all deeper layers in quiescent cells, challenging assumptions about hierarchical feature abstraction. These findings demonstrate that "where" to extract features matters as much as "what" the model learns, necessitating systematic layer evaluation tailored to biological task and cellular context rather than defaulting to final-layer embeddings.
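
The "systematic layer evaluation" the abstract calls for can be sketched as a loop that scores every layer's embeddings with the same task metric. The example below is only an illustration on synthetic data. The metric (absolute correlation between the first principal component and a known pseudotime) is a simple stand-in for a trajectory-inference score and is not the paper's metric, and all function names and the toy data generator are hypothetical. In practice the per-layer embeddings would come from the model's hidden states rather than from `make_toy_layers`.

```python
import numpy as np


def pc1_pseudotime_score(embedding, pseudotime):
    """Stand-in trajectory metric: |Pearson r| between PC1 and pseudotime."""
    centered = embedding - embedding.mean(axis=0)
    # Rows of vt are principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]
    return abs(np.corrcoef(pc1, pseudotime)[0, 1])


def sweep_layers(embeddings_by_layer, score_fn):
    """Apply the same scoring function to each layer's (cells x dims) embedding."""
    return [float(score_fn(emb)) for emb in embeddings_by_layer]


def make_toy_layers(signal_weights, n_cells=200, n_dims=8, seed=0):
    """Synthetic 'layers': pseudotime signal of varying strength plus unit Gaussian noise."""
    rng = np.random.default_rng(seed)
    pseudotime = np.linspace(0.0, 1.0, n_cells)
    direction = np.ones(n_dims) / np.sqrt(n_dims)
    layers = [
        w * np.outer(pseudotime, direction) + rng.normal(size=(n_cells, n_dims))
        for w in signal_weights
    ]
    return layers, pseudotime


# Toy setup in which the middle layer carries the strongest signal by construction.
layers, pseudotime = make_toy_layers([0.5, 10.0, 0.5])
scores = sweep_layers(layers, lambda emb: pc1_pseudotime_score(emb, pseudotime))
best = int(np.argmax(scores))
print(f"per-layer scores: {np.round(scores, 3)}, best layer index (0-based): {best}")
```

Keeping the scoring function as a parameter means the same sweep can be rerun per task and per cellular context, which is the kind of comparison the abstract argues should replace defaulting to the final layer.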