Closing the Domain Gap in Biomedical Imaging by In-Context Control Samples

arXiv cs.LG / 4/23/2026


Key Points

  • The paper addresses batch effects in biomedical imaging that cause deep learning models to fail when evaluated on new experimental batches, harming reproducibility and real-world deployment.
  • It introduces CS-ARM-BN, a meta-learning adaptation method that uses negative control samples as stable in-context references to perform batch normalization–based risk minimization.
  • Experiments on Mechanism-of-Action (MoA) classification in the large-scale JUMP-CP dataset show that standard ResNets experience a large accuracy drop across new batches, while the proposed meta-learning method largely closes the domain gap.
  • The authors report that even foundation models with Typical Variation Normalization do not close the gap, whereas CS-ARM-BN achieves strong generalization (near training-domain performance) and remains robust when batches come from different labs.
  • The work suggests that biomedical imaging batch effects can be neutralized effectively and practically through principled in-context adaptation using control samples that are typically available in experiments.
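The core idea behind the points above can be sketched in a few lines: instead of normalizing a new experimental batch with training-time running statistics, estimate the normalization statistics from the batch's negative control samples, which are unperturbed by design and therefore reflect only the technical (batch-effect) variation. The snippet below is a minimal, hedged illustration of this in-context normalization step, not the authors' CS-ARM-BN implementation; function and variable names are hypothetical.

```python
import numpy as np

def control_stabilized_norm(features, control_features, gamma, beta, eps=1e-5):
    """Normalize embeddings from a new experimental batch using statistics
    estimated from that batch's negative control samples (the in-context
    reference), rather than training-time running statistics.

    features:         (N, D) embeddings of the samples to classify
    control_features: (M, D) embeddings of the batch's control samples
    gamma, beta:      (D,) learned scale and shift, as in batch normalization
    """
    mu = control_features.mean(axis=0)    # per-feature mean from controls only
    var = control_features.var(axis=0)    # per-feature variance from controls only
    normalized = (features - mu) / np.sqrt(var + eps)
    return gamma * normalized + beta

# Toy illustration: an additive "batch effect" shifts both the controls and
# the samples of interest; normalizing with control statistics removes it.
rng = np.random.default_rng(0)
batch_effect = np.array([5.0, -3.0, 2.0, 0.5])          # hypothetical technical shift
controls = rng.normal(size=(16, 4)) + batch_effect       # unperturbed references
samples = rng.normal(size=(32, 4)) + batch_effect        # biological signal + shift
out = control_stabilized_norm(samples, controls,
                              gamma=np.ones(4), beta=np.zeros(4))
# After normalization, the per-feature means are close to zero again,
# i.e. the additive batch effect has been absorbed by the control statistics.
```

Because controls exist in every batch by design, this adaptation needs no labels from the new domain, which is what makes the approach practical for deployment.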

Abstract

The central problem in biomedical imaging is batch effects: systematic technical variations unrelated to the biological signal of interest. These batch effects critically undermine experimental reproducibility and are the primary cause of failure of deep learning systems on new experimental batches, preventing their practical use in the real world. Despite years of research, no method has succeeded in closing this performance gap for deep learning models. We propose Control-Stabilized Adaptive Risk Minimization via Batch Normalization (CS-ARM-BN), a meta-learning adaptation method that exploits negative control samples. Such unperturbed reference images are present in every experimental batch by design and serve as stable context for adaptation. We validate our method on Mechanism-of-Action (MoA) classification, a crucial task for drug discovery, on the large-scale JUMP-CP dataset. The accuracy of standard ResNets drops from 0.939 ± 0.005 on the training domain to 0.862 ± 0.060 on data from new experimental batches. Foundation models fail to close this gap even after Typical Variation Normalization. We are the first to show that meta-learning approaches close the domain gap, achieving 0.935 ± 0.018. When new experimental batches exhibit strong domain shifts, such as data generated in a different lab, meta-learning approaches can be stabilized with control samples, which are always available in biomedical experiments. Our work shows that batch effects in bioimaging data can be effectively neutralized through principled in-context adaptation, making deep learning models practically usable and efficient in this setting.