Intervention-Aware Multiscale Representation Learning from Imaging Phenomics and Perturbation Transcriptomics

arXiv cs.CV / 4/28/2026


Key Points

  • The paper proposes an intervention-aware multiscale representation learning method that uses perturbational transcriptomics to guide microscopy-based phenotypic profiling for drug discovery.
  • It introduces a transcriptome-conditioned teacher that integrates gene expression with intervention metadata, producing soft distributions over a chemistry-aware codebook organized by drug similarity.
  • A fine-tuned single-cell foundation model is used to encode cell-type context and disentangle dose effects, addressing weaknesses of prior multimodal approaches that rely on simple identity matching.
  • An image-only student model distills the teacher’s mechanistic knowledge so it can run independently at test time, improving generalization to unseen interventions.
  • Experiments on Cell Painting and RxRx with L1000 show significant gains in one-shot transfer to unseen interventions and drug-target gene discovery over self-supervised and alignment baselines, alongside theoretical risk-bound guarantees.

Abstract

Microscopy-based phenotypic profiling is scalable for drug discovery but lacks the mechanistic depth of transcriptomics, which remains costly and scarce. Existing multimodal approaches either use images to support other modalities or naively align representations by sample identity, ignoring cell-type and dose variations in weakly paired data, limiting generalization to unseen interventions. In this paper, we introduce an intervention-aware distillation framework that leverages perturbational transcriptomics to guide image representation learning. A transcriptome-conditioned teacher integrates gene expression and intervention metadata to produce soft distributions over a chemistry-aware codebook organized by drug similarity. The teacher employs a fine-tuned single-cell foundation model to encode cell-type context and disentangle dose effects. An image-only student learns to predict these distributions from microscopy alone, distilling mechanistic knowledge while operating independently at test time. This design emphasizes intervention semantics rather than identity alignment and explicitly handles dose and cell-type mismatches. We provide theoretical guarantees showing that transcriptomic guidance tightens the risk bound for image-based prediction. On Cell Painting and RxRx datasets paired with L1000, our method significantly improves one-shot transfer to unseen interventions and drug-target gene discovery compared to self-supervised and alignment baselines.
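The teacher–student scheme described in the abstract boils down to a standard soft-target distillation objective: the teacher emits a probability distribution over codebook entries, and the image-only student is trained to match it. The sketch below illustrates that KL-divergence loss with NumPy; the shapes, temperature, and function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_kl(teacher_logits, student_logits, tau=2.0):
    """Mean KL(teacher || student) over codebook entries.

    teacher_logits: from transcriptome + intervention metadata (hypothetical shapes)
    student_logits: from microscopy features alone
    tau: softening temperature for the codebook distributions
    """
    p = softmax(teacher_logits / tau)  # teacher soft targets
    q = softmax(student_logits / tau)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

# Toy batch: 4 samples, 8 codebook entries.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))
student = rng.normal(size=(4, 8))
loss = distillation_kl(teacher, student)
```

At test time only the student branch is evaluated, so microscopy images can be profiled without any transcriptomic input — which is what lets the method generalize to interventions lacking paired L1000 data.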