BioBO: Biology-informed Bayesian Optimization for Perturbation Design

arXiv stat.ML / 3/24/2026

Key Points

  • The paper introduces BioBO, a biology-informed Bayesian optimization framework for designing genomic perturbation experiments that addresses the intractable size of the search space.
  • BioBO improves Bayesian surrogate modeling and acquisition by integrating multimodal gene embeddings with biological prior knowledge and enrichment analysis (a common gene prioritization approach).
  • Experiments on public benchmarks show BioBO increases labeling efficiency by 25–40% and outperforms conventional Bayesian optimization at finding top-performing perturbations.
  • By leveraging enrichment analysis, BioBO provides pathway-level explanations for selected perturbations, improving mechanistic interpretability and linking designs to coherent regulatory circuits.
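To make the enrichment-analysis idea in the points above concrete, here is a minimal sketch of an over-representation test, one common form of enrichment analysis. It asks how surprising the overlap between a set of selected genes and a pathway is under a hypergeometric null. This is an illustration only, not the paper's implementation: the function name `enrichment_p_value`, the gene labels, and the toy pathway are all hypothetical.

```python
from math import comb


def enrichment_p_value(selected, pathway, universe_size):
    """Upper-tail hypergeometric p-value for pathway over-representation.

    Returns P(overlap >= observed) when len(selected) genes are drawn
    without replacement from a universe of `universe_size` genes, of
    which len(pathway) belong to the pathway.
    Assumes both gene sets are subsets of the same universe.
    """
    selected, pathway = set(selected), set(pathway)
    k = len(selected & pathway)  # observed overlap
    n = len(selected)            # number of selected genes
    K = len(pathway)             # pathway size
    N = universe_size            # size of the gene universe

    # Sum the probability mass for overlaps of size k, k+1, ..., min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical toy data: a 20-gene universe, a 5-gene pathway,
# and 4 selected perturbations, 3 of which fall in the pathway.
pathway = {"G1", "G2", "G3", "G4", "G5"}
selected = {"G1", "G2", "G3", "G17"}
p = enrichment_p_value(selected, pathway, universe_size=20)
print(f"enrichment p-value: {p:.4f}")  # prints 0.0320
```

A small p-value suggests the selected perturbations concentrate in the pathway more than chance would predict, which is the kind of signal a pathway-level explanation can be built on. Real analyses typically also correct for testing many pathways at once.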

Abstract

Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25–40% and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.
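The abstract does not spell out how the biological prior enters the acquisition function, so the following is only a sketch of one generic way to do it, not BioBO's actual formulation. It multiplies an upper-confidence-bound (UCB) score by a per-gene prior weight raised to a power that shrinks with the round number, so the prior steers early choices and the surrogate's uncertainty dominates later. All names (`prior_weighted_scores`, `select_next`), the decay schedule `beta / step`, and the toy numbers are assumptions made for illustration.

```python
def ucb(mean, std, kappa=2.0):
    """Upper confidence bound: predicted effect plus an exploration bonus."""
    return mean + kappa * std


def prior_weighted_scores(means, stds, priors, step, beta=1.0, kappa=2.0):
    """Score each gene as UCB * prior ** (beta / step).

    Assumes UCB values are non-negative and priors lie in (0, 1].
    As `step` grows the exponent approaches 0, so the prior's
    influence fades and the score approaches plain UCB.
    """
    exponent = beta / step
    return {
        gene: ucb(means[gene], stds[gene], kappa) * priors[gene] ** exponent
        for gene in means
    }


def select_next(means, stds, priors, step, beta=1.0, kappa=2.0):
    """Return the gene with the highest prior-weighted score."""
    scores = prior_weighted_scores(means, stds, priors, step, beta, kappa)
    return max(scores, key=scores.get)


# Hypothetical surrogate predictions and prior weights for three genes.
means = {"A": 0.5, "B": 0.6, "C": 0.2}   # posterior mean effect
stds = {"A": 0.1, "B": 0.1, "C": 0.5}    # posterior standard deviation
priors = {"A": 0.9, "B": 0.1, "C": 0.3}  # e.g. derived from enrichment scores

print(select_next(means, stds, priors, step=1))    # prints A
print(select_next(means, stds, priors, step=100))  # prints C
```

In round 1 the strongly prior-favored gene `A` wins; by round 100 the prior has nearly washed out and the highly uncertain gene `C` is chosen instead. This mirrors the trade-off the abstract describes between biasing the search toward promising genes and still exploring uncertain regions.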