Bi-Level Optimization for Single Domain Generalization

arXiv cs.LG / 4/9/2026


Key Points

  • The paper tackles Single Domain Generalization (SDG), aiming to generalize from one labeled source domain to unseen target domains without using any target data during training.
  • It introduces BiSDG, a bi-level optimization framework that decouples task learning from domain modeling using a domain prompt encoder to generate feature modulation signals.
  • BiSDG simulates distribution shifts by creating surrogate domains through label-preserving transformations of the source data, enabling training pressure toward invariance.
  • The method formulates learning as a bi-level problem where an inner loop optimizes task performance under fixed prompts and an outer loop updates the domain prompt encoder to improve generalization.
  • Experiments on multiple SDG benchmarks report consistent improvements over prior approaches and claim new state-of-the-art results in the SDG setting.
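The inner/outer scheme described in the bullets above can be sketched as a first-order alternating update, in line with the paper's stated avoidance of second-order derivatives. The quadratic toy losses, learning rates, and the ±1 "surrogate shifts" below are illustrative stand-ins, not BiSDG's actual objectives:

```python
import numpy as np

# Toy setup: theta = task parameters, phi = domain-prompt parameters.
# The inner (task) loss is optimized over theta with phi held fixed;
# the outer (generalization) loss averages error over two hypothetical
# surrogate domains whose targets are shifted by +/- phi.

def task_loss_grad(theta, phi):
    # L_task = 0.5 * (theta - phi)^2  ->  d/dtheta = theta - phi
    return theta - phi

def gen_loss(theta, phi, shifts):
    # Mean squared error across surrogate domains (targets 1 + s*phi)
    return np.mean([(theta - (1.0 + s * phi)) ** 2 for s in shifts])

def gen_loss_grad_phi(theta, phi, shifts):
    # First-order gradient w.r.t. phi, treating theta as a constant
    # (this is the gradient approximation that skips second-order terms)
    return np.mean([2.0 * (theta - (1.0 + s * phi)) * (-s) for s in shifts])

theta, phi = 0.0, 0.5
lr_inner, lr_outer = 0.5, 0.1
shifts = [-1.0, 1.0]  # illustrative surrogate-domain shifts

for _ in range(100):
    # Inner step: optimize task performance under the fixed prompt phi
    theta = theta - lr_inner * task_loss_grad(theta, phi)
    # Outer step: update the prompt parameters to improve generalization
    phi = phi - lr_outer * gen_loss_grad_phi(theta, phi, shifts)

print(theta, phi)
```

The outer update drives phi toward the value that equalizes loss across the surrogate domains, while the inner loop tracks the task optimum under the current prompt; in the full method, automatic differentiation would replace these hand-written gradients.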

Abstract

Generalizing from a single labeled source domain to unseen target domains, without access to any target data during training, remains a fundamental challenge in robust machine learning. We address this underexplored setting, known as Single Domain Generalization (SDG), by proposing BiSDG, a bi-level optimization framework that explicitly decouples task learning from domain modeling. BiSDG simulates distribution shifts through surrogate domains constructed via label-preserving transformations of the source data. To capture domain-specific context, we propose a domain prompt encoder that generates lightweight modulation signals to produce augmented features via feature-wise linear modulation. The learning process is formulated as a bi-level optimization problem: the inner objective optimizes task performance under fixed prompts, while the outer objective maximizes generalization across the surrogate domains by updating the domain prompt encoder. We further develop a practical gradient approximation scheme that enables efficient bi-level training without second-order derivatives. Extensive experiments on various SDG benchmarks demonstrate that BiSDG consistently outperforms prior methods, setting new state-of-the-art performance in the SDG setting.
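Feature-wise linear modulation (FiLM), which the abstract uses to inject domain context into features, amounts to a per-channel scale and shift. A minimal NumPy sketch; the linear "prompt encoder" here is a hypothetical stand-in for the paper's learned domain prompt encoder:

```python
import numpy as np

def film_modulate(features, gamma, beta):
    # Feature-wise linear modulation: per-channel scale (gamma) and shift (beta).
    # features: (batch, channels); gamma, beta: (channels,)
    return gamma * features + beta

rng = np.random.default_rng(0)

# Hypothetical domain prompt: a small vector summarizing a surrogate domain.
prompt = rng.normal(size=4)

# Stand-in prompt encoder: a fixed linear map from the prompt to (gamma, beta).
# The real encoder in BiSDG is learned; this is purely for illustration.
W = 0.1 * rng.normal(size=(8, 4))
out = W @ prompt
gamma = 1.0 + out[:4]  # initialize scales near the identity
beta = out[4:]

features = rng.normal(size=(2, 4))   # a toy batch of feature vectors
modulated = film_modulate(features, gamma, beta)
print(modulated.shape)
```

With gamma fixed at 1 and beta at 0, the modulation is the identity, so the prompt encoder can smoothly interpolate between unmodified and domain-conditioned features.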