AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

arXiv cs.AI / 4/22/2026

Key Points

  • AblateCell is a new “reproduce-then-ablate” agent designed to systematically attribute performance gains in AI Virtual Cells by addressing the lack of rigorous ablation and verification on under-standardized biological repositories.
  • It first reproduces published baselines end-to-end by automatically configuring environments, fixing dependency/data issues, and rerunning official evaluations while producing verifiable artifacts.
  • It then performs closed-loop ablation by creating a graph of isolated repository mutations and adaptively selecting experiments that balance performance impact against execution cost.
  • Experiments on three single-cell perturbation prediction repositories (CPA, GEARS, BioLORD) report 88.9% end-to-end workflow success and 93.3% accuracy in recovering ground-truth critical components, enabling scalable verification directly on biological codebases.
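
The reproduction step in the key points (rerun the official evaluation, emit verifiable artifacts) can be illustrated with a small sketch. It is not AblateCell's implementation: the function names `sha256_of` and `run_and_record`, the manifest layout, and the choice of SHA-256 digests as the "verifiable artifact" are all assumptions made for illustration.

```python
import hashlib
import json
import subprocess
import sys
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large result files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_and_record(cmd: List[str], outputs: List[Path], manifest: Path) -> Dict:
    """Run an evaluation command and write a JSON manifest describing the run.

    Hypothetical helper: the manifest fields below are an assumed format,
    not one taken from the paper.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    record = {
        "cmd": cmd,
        "returncode": proc.returncode,
        "stdout_sha256": hashlib.sha256(proc.stdout.encode()).hexdigest(),
        # Expected outputs that were not produced are omitted from the manifest.
        "outputs": {str(p): sha256_of(p) for p in outputs if p.exists()},
    }
    manifest.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


if __name__ == "__main__":
    # Stand-in for an official evaluation script; a real run would call the
    # repository's own entry point instead.
    result = run_and_record(
        [sys.executable, "-c", "print('ok')"],
        outputs=[],
        manifest=Path("manifest.json"),
    )
    print(result["returncode"])
```

A second party can rerun the same command and compare digests against the manifest, which is one simple way to make a reproduced baseline checkable.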

Abstract

Systematic ablations are essential to attribute performance gains in AI Virtual Cells, yet they are rarely performed because biological repositories are under-standardized and tightly coupled to domain-specific data and formats. While recent coding agents can translate ideas into implementations, they typically stop at producing code and lack a verifier that can reproduce strong baselines and rigorously test which components truly matter. We introduce AblateCell, a reproduce-then-ablate agent for virtual cell repositories that closes this verification gap. AblateCell first reproduces reported baselines end-to-end by auto-configuring environments, resolving dependency and data issues, and rerunning official evaluations while emitting verifiable artifacts. It then conducts closed-loop ablation by generating a graph of isolated repository mutations and adaptively selecting experiments under a reward that trades off performance impact and execution cost. Evaluated on three single-cell perturbation prediction repositories (CPA, GEARS, BioLORD), AblateCell achieves 88.9% end-to-end workflow success (+29.9% over a human expert) and 93.3% accuracy in recovering ground-truth critical components (+53.3% over a heuristic baseline). These results enable scalable, repository-grounded verification and attribution directly on biological codebases.
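
To make the closed-loop selection idea concrete, here is a minimal sketch of choosing ablations under a reward that trades estimated performance impact against execution cost. The abstract does not give the reward or the update rule, so everything here is an assumption: the linear reward `est_impact - lam * cost`, the `group` field, the averaging feedback step, and the names `Candidate`, `reward`, and `ablation_loop` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class Candidate:
    name: str          # hypothetical label for one isolated repository mutation
    group: str         # hypothetical grouping of related mutations, e.g. "loss"
    est_impact: float  # current estimate of the metric drop if this is ablated
    cost: float        # estimated execution cost, e.g. GPU-hours


def reward(c: Candidate, lam: float) -> float:
    """Assumed reward: estimated impact minus a linear cost penalty."""
    return c.est_impact - lam * c.cost


def ablation_loop(
    candidates: List[Candidate],
    run_ablation: Callable[[Candidate], float],
    budget: float,
    lam: float = 0.1,
) -> Dict[str, float]:
    """Greedily run the highest-reward ablation until the budget is exhausted.

    After each run, the observed drop is fed back into the estimates of
    remaining candidates in the same group (an assumed, simplistic update).
    """
    pool = [replace(c) for c in candidates]  # copies, so inputs are not mutated
    observed: Dict[str, float] = {}
    spent = 0.0
    while pool:
        nxt = max(pool, key=lambda c: reward(c, lam))
        if spent + nxt.cost > budget:
            break
        pool.remove(nxt)
        drop = run_ablation(nxt)
        observed[nxt.name] = drop
        spent += nxt.cost
        for c in pool:
            if c.group == nxt.group:
                c.est_impact = 0.5 * (c.est_impact + drop)
    return observed
```

In this sketch the feedback step is what closes the loop: a run that shows little effect lowers the priority of related mutations, so the remaining budget shifts toward candidates that still look informative. A real agent would likely use a richer estimator and the mutation graph's structure rather than a flat group label.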