SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention

arXiv cs.AI / 4/21/2026

Key Points

  • SAVE introduces a generalizable generative framework for multi-condition single-cell gene expression using conditional Transformers.
  • Instead of treating genes as independent tokens, SAVE groups semantically related genes into blocks and uses Gene Block Attention to capture higher-order dependencies among gene modules.
  • A Flow Matching mechanism and a condition-masking strategy improve flexible simulation and enable generalization to unseen combinations of biological and technical conditions.
  • Across multiple benchmarks (conditional generation, batch effect correction, and perturbation prediction), SAVE achieves better generation fidelity and extrapolative generalization than state-of-the-art methods, particularly in low-resource and combinatorially held-out scenarios.
  • The authors provide public code via GitHub, supporting reproducibility and broader adoption for virtual cell synthesis and biological interpretation.
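
The gene-block idea in the key points can be illustrated with a toy sketch. This is not the authors' implementation: the block assignment, the placeholder gene names, the equal block sizes, and the use of identity query/key/value projections are all assumptions made for illustration. It only shows the general pattern of attending over block-level tokens instead of individual genes.

```python
import math

# Hypothetical block assignment: placeholder gene names, equal-sized blocks.
GENE_BLOCKS = {
    "block_a": ["gene_1", "gene_2"],
    "block_b": ["gene_3", "gene_4"],
    "block_c": ["gene_5", "gene_6"],
}


def block_tokens(expression, gene_blocks):
    """Turn one cell's per-gene values into one vector (token) per block."""
    return [[expression[g] for g in genes] for genes in gene_blocks.values()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens):
    """Scaled dot-product attention weights between every pair of block tokens."""
    dim = len(tokens[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens])
        for q in tokens
    ]


def block_self_attention(tokens):
    """Mix block tokens by their attention weights (identity Q/K/V, an assumption)."""
    weights = attention_weights(tokens)
    dim = len(tokens[0])
    return [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(dim)]
        for row in weights
    ]


# Toy expression profile for a single cell (made-up values).
cell = {"gene_1": 1.0, "gene_2": 0.0, "gene_3": 0.5,
        "gene_4": 0.5, "gene_5": 0.0, "gene_6": 2.0}
tokens = block_tokens(cell, GENE_BLOCKS)
mixed = block_self_attention(tokens)
```

In this sketch the attention runs over three block tokens rather than six gene tokens, so the sequence length shrinks by the block size. A trained model would use learned projections and embeddings in place of the identity mapping used here.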

Abstract

Modeling single-cell gene expression across diverse biological and technical conditions is crucial for characterizing cellular states and simulating unseen scenarios. Existing methods often treat genes as independent tokens, overlooking their high-level biological relationships and leading to poor performance. We introduce SAVE, a unified generative framework based on conditional Transformers for multi-condition single-cell modeling. SAVE leverages a coarse-grained representation by grouping semantically related genes into blocks, capturing higher-order dependencies among gene modules. A Flow Matching mechanism and a condition-masking strategy further enhance flexible simulation and enable generalization to unseen condition combinations. We evaluate SAVE on a range of benchmarks, including conditional generation, batch effect correction, and perturbation prediction. SAVE consistently outperforms state-of-the-art methods in generation fidelity and extrapolative generalization, especially in low-resource or combinatorially held-out settings. Overall, SAVE offers a scalable and generalizable solution for modeling complex single-cell data, with broad utility in virtual cell synthesis and biological interpretation. Our code is publicly available at https://github.com/fdu-wangfeilab/sc-save.
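
To make the Flow Matching and condition-masking ingredients more concrete, here is a minimal sketch of how one training example could be built. The abstract does not specify the probability path, the masking rate, or the condition names, so everything here is an assumption: a straight-line path from noise to data (a common Flow Matching choice), a hypothetical `<mask>` token, and made-up condition labels. A zero vector stands in for the model's velocity prediction.

```python
import random

MASK = "<mask>"  # hypothetical placeholder for a dropped condition


def mask_conditions(conditions, mask_prob, rng):
    """Independently replace each condition label with MASK with probability mask_prob."""
    return {
        name: (MASK if rng.random() < mask_prob else label)
        for name, label in conditions.items()
    }


def flow_matching_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, plus its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def flow_matching_loss(predicted, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # x0: sample from a Gaussian prior
cell = [0.0, 1.5, 3.0, 0.5]                      # x1: toy expression profile
t = rng.random()                                 # time drawn uniformly from [0, 1)
x_t, target_velocity = flow_matching_pair(noise, cell, t)

# Made-up condition labels; some may be masked so the model also sees partial conditions.
conditions = mask_conditions(
    {"cell_type": "type_a", "batch": "batch_1", "perturbation": "control"},
    mask_prob=0.3,
    rng=rng,
)

# A real model would predict the velocity from (x_t, t, conditions); zeros stand in here.
loss = flow_matching_loss([0.0] * len(cell), target_velocity)
```

Randomly masking conditions during training is one plausible way to let a model handle partial or previously unseen condition combinations at generation time, which matches the generalization goal described in the abstract. The actual strategy used by SAVE may differ.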