Efficient Universal Perception Encoder

arXiv cs.CV · March 25, 2026


Key Points

  • The paper proposes an Efficient Universal Perception Encoder (EUPE) designed to run versatile AI vision models on resource-constrained edge devices while maintaining strong representations across many downstream tasks.
  • EUPE is trained via distillation from multiple domain-expert foundation vision encoders, aiming to produce a single small encoder with both inference efficiency and broadly useful perceptual features.
  • The authors argue against prior agglomerative distillation approaches that scale down directly from multiple teachers, and instead show that scaling up to a large proxy teacher first and then scaling down from that single teacher improves results.
  • Experiments indicate EUPE matches or exceeds the performance of individual domain-expert encoders of similar size across diverse task domains, and also outperforms earlier agglomerative encoder methods.
  • The authors state they will release the full EUPE model family and accompanying code to support further research.
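The agglomerative distillation setup described above, training one student encoder against several domain-expert teachers, can be sketched generically. The snippet below is an illustrative feature-matching objective, not EUPE's actual loss (the paper excerpt does not specify it); all dimensions and the per-teacher projection heads are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): a small student encoder and
# two domain-expert teachers with different embedding widths.
BATCH, D_STUDENT = 4, 64
TEACHER_DIMS = [128, 256]

# Per-teacher projection heads map student features into each teacher's
# embedding space so a matching loss can be computed there.
projections = [rng.normal(0.0, 0.02, size=(D_STUDENT, d)) for d in TEACHER_DIMS]

def multi_teacher_distill_loss(student_feats, teacher_feats_list):
    """Sum of mean-squared feature-matching terms, one per teacher.

    A generic agglomerative-distillation sketch; EUPE's real objective
    (and its scale-up-then-scale-down proxy-teacher stage) is not
    spelled out in the excerpt above.
    """
    loss = 0.0
    for proj, t_feats in zip(projections, teacher_feats_list):
        projected = student_feats @ proj  # (BATCH, d_teacher)
        loss += float(np.mean((projected - t_feats) ** 2))
    return loss

student = rng.normal(size=(BATCH, D_STUDENT))
teachers = [rng.normal(size=(BATCH, d)) for d in TEACHER_DIMS]
loss = multi_teacher_distill_loss(student, teachers)
```

In the scale-up-then-scale-down scheme the authors advocate, this multi-teacher loss would first train a large proxy teacher, after which the efficient student distills from that single proxy instead of from all experts directly.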

Abstract

Running AI models on smart edge devices can unlock versatile user experiences, but presents challenges due to limited compute and the need to handle multiple tasks simultaneously. This requires a vision encoder with small size but powerful and versatile representations. We present our method, Efficient Universal Perception Encoder (EUPE), which offers both inference efficiency and universally good representations for diverse downstream tasks. We achieve this by distilling from multiple domain-expert foundation vision encoders. Unlike previous agglomerative methods that directly scale down from multiple teachers to an efficient encoder, we demonstrate the importance of first scaling up to a large proxy teacher and then scaling down from this single teacher. Experiments show that EUPE achieves on-par or better performance than individual domain experts of the same size on diverse task domains and also outperforms previous agglomerative encoders. We will release the full family of EUPE models and the code to foster future research.