Sample-Efficient Adaptation of Drug-Response Models to Patient Tumors under Strong Biological Domain Shift

arXiv cs.LG / 3/18/2026

Key Points

  • The paper introduces a staged transfer-learning framework that explicitly separates representation learning from task supervision to improve sample efficiency when adapting drug-response models to patient tumors under strong biological domain shift.
  • It first trains cellular and drug representations independently from large unlabeled pharmacogenomic data using autoencoder-based learning, then aligns these representations with drug-response labels on cell lines before adapting to patient tumors with few-shot supervision.
  • In systematic experiments across in-domain, cross-dataset, and patient-level settings, unsupervised pretraining provides limited benefit when source and target domains overlap but yields clear gains for adapting to patient tumors with very limited labeled data.
  • The framework achieves faster performance improvements during few-shot patient-level adaptation while maintaining accuracy comparable to single-phase baselines on standard cell-line benchmarks, illustrating data-efficient preclinical-to-clinical translation.
  • Overall, the work demonstrates that learning structured, transferable representations from unlabeled molecular profiles can substantially reduce clinical supervision needs for drug-response prediction.
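The staged pipeline described in these points can be sketched numerically. Everything below is hypothetical (shapes, data, and the choice of a truncated SVD in place of the paper's autoencoder, and a ridge-regression head in place of its supervised alignment); it only illustrates the separation of representation learning from task supervision:

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase 1: unsupervised representation learning from unlabeled profiles.
# (A linear autoencoder's optimum spans the PCA subspace, so a truncated
# SVD stands in for the paper's autoencoder here.)
X_unlabeled = rng.normal(size=(500, 50))      # stand-in for omics profiles
mu = X_unlabeled.mean(axis=0)
_, _, Vt = np.linalg.svd(X_unlabeled - mu, full_matrices=False)
encoder = Vt[:8].T                            # 50 input features -> 8 latent dims

# Phase 2: align the frozen representation with drug-response labels on
# cell lines, here via a simple ridge-regression head.
X_cells = rng.normal(size=(200, 50))          # stand-in cell-line profiles
y_cells = X_cells @ rng.normal(size=50)       # stand-in response labels
Z_cells = (X_cells - mu) @ encoder
lam = 1.0
head = np.linalg.solve(Z_cells.T @ Z_cells + lam * np.eye(8),
                       Z_cells.T @ y_cells)

# Phase 3 would fine-tune `head` on a handful of labeled patient tumors
# rather than training a new model from scratch (omitted here).
pred_cells = Z_cells @ head
print(pred_cells.shape)                       # (200,)
```

The point of the staging is visible in the code: the encoder is fit without any labels, so only the small `head` needs supervision downstream.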

Abstract

Predicting drug response in patients from preclinical data remains a major challenge in precision oncology due to the substantial biological gap between in vitro cell lines and patient tumors. Rather than aiming to improve absolute in vitro prediction accuracy, this work examines whether explicitly separating representation learning from task supervision enables more sample-efficient adaptation of drug-response models to patient data under strong biological domain shift. We propose a staged transfer-learning framework in which cellular and drug representations are first learned independently from large collections of unlabeled pharmacogenomic data using autoencoder-based representation learning. These representations are then aligned with drug-response labels on cell-line data and subsequently adapted to patient tumors using few-shot supervision. Through a systematic evaluation spanning in-domain, cross-dataset, and patient-level settings, we show that unsupervised pretraining provides limited benefit when source and target domains overlap substantially, but yields clear gains when adapting to patient tumors with very limited labeled data. In particular, the proposed framework achieves faster performance improvements during few-shot patient-level adaptation while maintaining accuracy comparable to single-phase baselines on standard cell-line benchmarks. Overall, these results demonstrate that learning structured and transferable representations from unlabeled molecular profiles can substantially reduce the amount of clinical supervision required for effective drug-response prediction, offering a practical pathway toward data-efficient preclinical-to-clinical translation.
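The few-shot adaptation dynamic the abstract highlights can be simulated in a toy setting. All quantities here are invented for illustration (latent dimension, sample count, the size of the simulated domain shift); the sketch only shows why starting from a head aligned on cell-line data can out-pace training from scratch when only a handful of labeled tumors is available:

```python
import numpy as np

rng = np.random.default_rng(1)

d, K = 8, 5                                       # latent dim, labeled patient samples
true_w = rng.normal(size=d)                       # "patient" response direction
pretrained = true_w + 0.3 * rng.normal(size=d)    # cell-line head, shifted by domain gap

Z = rng.normal(size=(K, d))                       # encoded profiles of K labeled tumors
y = Z @ true_w + 0.05 * rng.normal(size=K)        # noisy few-shot response labels

def finetune(w0, steps, lr=0.05):
    """Gradient descent on mean squared error, starting from w0."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * 2.0 / K * Z.T @ (Z @ w - y)
    return float(np.mean((Z @ w - y) ** 2))

print(finetune(pretrained, steps=0))              # loss before adaptation
print(finetune(pretrained, steps=50))             # loss after a few gradient steps
print(finetune(np.zeros(d), steps=50))            # same step budget, uninformed start
```

With K smaller than d the few-shot problem is underdetermined, so the initialization carries most of the information; that is the regime in which transferable pretrained representations pay off.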