Multi-Domain Learning with Global Expert Mapping

arXiv cs.CV / April 22, 2026


Key Points

  • The paper addresses a common limitation of vision models: they often do not generalize well to new domains beyond their training distributions, motivating multi-dataset learning for robustness to domain shift.
  • It argues that existing Mixture-of-Experts (MoE) approaches can under-specialize because load-balancing encourages uniform routing, which conflicts with domain-aware specialization and hurts performance on rare or out-of-distribution domains.
  • The authors propose GEM (Global Expert Mapping), a planner–compiler framework that replaces the learned router with a global scheduler that computes dataset-to-expert assignments via linear programming relaxation.
  • A hierarchical rounding compiler then converts the fractional plan into a deterministic, capacity-aware routing map, aiming to avoid balancing loss and produce more interpretable routing behavior.
  • Experiments report that GEM-DINO reaches state-of-the-art results on the UODB benchmark, including gains on underrepresented datasets and improved behavior in few-shot adaptation by reducing task interference.

Abstract

Human perception generalizes well across different domains, but most vision models struggle beyond their training data. This gap motivates multi-dataset learning, where a single model is trained on diverse datasets to improve robustness under domain shift. However, unified training remains challenging due to inconsistencies in data distributions and label semantics. Mixture-of-Experts (MoE) models provide a scalable solution by routing inputs to specialized subnetworks (experts). Yet existing MoEs often fail to specialize effectively, because their load-balancing mechanisms enforce a uniform input distribution across experts. This fairness conflicts with domain-aware routing, causing experts to learn redundant representations and reducing performance, especially on rare or out-of-distribution domains. We propose GEM (Global Expert Mapping), a planner-compiler framework that replaces the learned router with a global scheduler. Our planner, based on linear programming relaxation, computes a fractional assignment of datasets to experts, while the compiler applies hierarchical rounding to convert this soft plan into a deterministic, capacity-aware mapping. Unlike prior MoEs, GEM needs no balancing loss, resolves the conflict between fairness and specialization, and produces interpretable routing. Experiments show that GEM-DINO achieves state-of-the-art performance on the UODB benchmark, with notable gains on underrepresented datasets, and mitigates task interference in few-shot adaptation scenarios.
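To make the planner-compiler idea concrete, here is a minimal sketch of the compiler step. All names and values are hypothetical, and the greedy rounding below is only a stand-in for the paper's hierarchical rounding: a fractional dataset-to-expert plan (assumed to come from an LP-relaxation planner) is converted into a deterministic, capacity-aware mapping, with no learned router or balancing loss involved.

```python
# Hypothetical sketch of GEM's "compiler" stage (not the paper's exact
# algorithm). Input: a fractional plan mapping each dataset to expert
# weights, plus per-expert capacities. Output: a hard assignment.

def round_plan(fractional_plan, capacities):
    """Greedily round a fractional plan into a deterministic mapping.

    Datasets whose fractional row is most peaked (most confident) are
    committed first; each dataset then takes its highest-weight expert
    that still has spare capacity, so the result respects capacities.
    """
    remaining = dict(capacities)  # expert -> remaining slots
    # Commit the most decisive rows first so spillover hits later rows.
    order = sorted(fractional_plan,
                   key=lambda d: max(fractional_plan[d].values()),
                   reverse=True)
    mapping = {}
    for dataset in order:
        # Experts ranked by this dataset's fractional preference.
        ranked = sorted(fractional_plan[dataset].items(),
                        key=lambda kv: kv[1], reverse=True)
        for expert, _weight in ranked:
            if remaining[expert] > 0:
                mapping[dataset] = expert
                remaining[expert] -= 1
                break
    return mapping

# Toy fractional plan over 3 datasets and 2 experts (illustrative values,
# as if produced by an LP relaxation of the assignment problem).
plan = {
    "COCO":       {"e0": 0.9, "e1": 0.1},
    "Clipart":    {"e0": 0.6, "e1": 0.4},
    "Watercolor": {"e0": 0.2, "e1": 0.8},
}
caps = {"e0": 1, "e1": 2}
print(round_plan(plan, caps))
# Clipart prefers e0 but e0 is full after COCO, so it falls back to e1.
```

The resulting map is fixed ahead of training, which is what makes the routing deterministic and inspectable: one can read off exactly which expert serves which dataset, unlike a learned per-input router.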