GeneMamba: An Efficient and Effective Foundation Model on Single Cell Data

arXiv cs.CL / 3/25/2026


Key Points

  • GeneMamba is presented as a scalable foundation model for single-cell RNA sequencing that addresses scRNA-seq challenges like high dimensionality, sparsity, and batch effects.
  • The approach replaces the quadratic-complexity self-attention of transformers with a bidirectional state-space model (Bi-Mamba) that captures gene context in both directions in linear time.
  • GeneMamba is pretrained on nearly 30 million cells and uses biologically informed training objectives, including pathway-aware contrastive loss and rank-based gene encoding.
  • Evaluations on multi-batch integration, cell type annotation, and gene-gene correlation show strong performance, interpretability, and robustness relative to transformer baselines.
  • The authors position GeneMamba as a practical alternative to transformer-based methods for large-scale, biologically grounded single-cell analysis.
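The linear-time, bidirectional idea behind Bi-Mamba can be illustrated with a deliberately simplified sketch. This is not the GeneMamba or Mamba implementation: Mamba's selective scan makes its state-space parameters input-dependent and relies on hardware-aware kernels, whereas here the per-channel parameters `a`, `b`, `c` are fixed. Summing the forward and backward outputs is also an assumption; Bi-Mamba variants may combine the two directions differently. The function names are hypothetical. The point is only that each direction is a single O(L) pass over the gene sequence, with no L×L attention matrix, yet every position still receives context from both sides.

```python
import numpy as np

def diag_ssm_scan(x, a, b, c):
    """Causal scan of a diagonal linear state-space model (simplified, fixed parameters).

    x: (L, D) input sequence; a, b, c: (D,) per-channel parameters.
    Recurrence: h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    One pass over the sequence, so time is O(L * D).
    """
    L, D = x.shape
    h = np.zeros(D)
    y = np.empty_like(x, dtype=float)
    for t in range(L):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def bidirectional_ssm(x, fwd_params, bwd_params):
    """Scan left-to-right and right-to-left, then sum (combination rule is an assumption)."""
    y_fwd = diag_ssm_scan(x, *fwd_params)
    # Reverse the sequence, scan it, and flip the result back into the original order.
    y_bwd = diag_ssm_scan(x[::-1], *bwd_params)[::-1]
    return y_fwd + y_bwd

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))                        # 6 gene tokens, 4 channels
    params = (np.full(4, 0.9), np.ones(4), np.ones(4))
    print(bidirectional_ssm(x, params, params).shape)  # (6, 4)
```

Because the backward scan runs over the reversed sequence, the output at the first position already depends on the last gene token, which a purely causal scan could not provide.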

Abstract

Single-cell RNA sequencing (scRNA-seq) enables high-resolution analysis of cellular heterogeneity, but its complexity, marked by high dimensionality, sparsity, and batch effects, poses major computational challenges. Transformer-based models have made significant advances in this domain but are often limited by their quadratic complexity and suboptimal handling of long-range dependencies. In this work, we introduce GeneMamba, a scalable and efficient foundation model for single-cell transcriptomics built on state space modeling. Leveraging the Bi-Mamba architecture, GeneMamba captures bidirectional gene context with linear-time complexity, offering substantial computational gains over transformer baselines. The model is pretrained on nearly 30 million cells and incorporates biologically informed objectives, including pathway-aware contrastive loss and rank-based gene encoding. We evaluate GeneMamba across diverse tasks, including multi-batch integration, cell type annotation, and gene-gene correlation, demonstrating strong performance, interpretability, and robustness. These results position GeneMamba as a practical and powerful alternative to transformer-based methods, advancing the development of biologically grounded, scalable tools for large-scale single-cell data analysis.
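The abstract does not spell out how rank-based gene encoding works. One common reading, similar to the rank-value encoding used by earlier single-cell foundation models, turns each cell into a sequence of gene tokens ordered by expression. The sketch below assumes that reading and is not GeneMamba's confirmed procedure: it optionally divides each gene's value by a per-gene scale factor (`gene_medians`, a hypothetical corpus statistic), drops unexpressed genes, sorts the rest from highest to lowest, and truncates to a maximum length (`max_len=2048` is an arbitrary placeholder). The function name `rank_encode` is hypothetical.

```python
import numpy as np

def rank_encode(expr, gene_ids, gene_medians=None, max_len=2048):
    """Turn one cell's expression vector into a rank-ordered sequence of gene tokens.

    expr: (G,) expression values for one cell.
    gene_ids: (G,) integer token id for each gene.
    gene_medians: optional (G,) per-gene scale factors (hypothetical corpus statistic).
    max_len: maximum number of tokens kept (placeholder value).
    """
    expr = np.asarray(expr, dtype=float)
    if gene_medians is not None:
        expr = expr / gene_medians              # rescale so highly expressed housekeeping genes do not dominate
    nonzero = np.flatnonzero(expr > 0)          # drop unexpressed genes (scRNA-seq sparsity)
    # Highest expression first; a stable sort keeps tied genes in their input order.
    order = nonzero[np.argsort(-expr[nonzero], kind="stable")]
    return np.asarray(gene_ids)[order][:max_len]

if __name__ == "__main__":
    expr = [0.0, 5.0, 1.0, 0.0, 3.0]
    ids = [10, 11, 12, 13, 14]
    print(rank_encode(expr, ids))  # [11 14 12]
```

Encoding only the order of expressed genes, rather than raw counts, makes the input insensitive to monotone shifts in expression scale between cells, which is one plausible motivation for a rank-based scheme under this assumption.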