Representation Selection via Cross-Model Agreement using Canonical Correlation Analysis

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces a training-free post-hoc method that applies canonical correlation analysis (CCA) to find linear projections for selecting and reducing redundant visual representation dimensions across two pretrained image encoders.
  • By exploiting cross-model agreement, the approach aims to retain shared semantic content while discarding overcomplete or model-specific dimensions more effectively than single-model dimensionality reduction like PCA.
  • Experiments on datasets such as ImageNet-1k, CIFAR-100, and MNIST show that representation dimensionality can be reduced by more than 75% while improving downstream performance.
  • The method can also be applied at fixed dimensionality to transfer or refine representations from larger or fine-tuned models, with reported accuracy improvements of up to 12.6% over PCA and baseline projections.

Abstract

Modern vision pipelines increasingly rely on pretrained image encoders whose representations are reused across tasks and models, yet these representations are often overcomplete and model-specific. We propose a simple, training-free method to improve the efficiency of image representations via a post-hoc canonical correlation analysis (CCA) operator. By leveraging the shared structure between representations produced by two pre-trained image encoders, our method finds linear projections that serve as a principled form of representation selection and dimensionality reduction, retaining shared semantic content while discarding redundant dimensions. Unlike standard dimensionality reduction techniques such as PCA, which operate on a single embedding space, our approach leverages cross-model agreement to guide representation distillation and refinement. The technique allows representations to be reduced by more than 75% in dimensionality with improved downstream performance, or enhanced at fixed dimensionality via post-hoc representation transfer from larger or fine-tuned models. Empirical results on ImageNet-1k, CIFAR-100, MNIST, and additional benchmarks show consistent improvements over both baseline and PCA-projected representations, with accuracy gains of up to 12.6%.
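The core operation described in the abstract, fitting a linear CCA between the embeddings two pretrained encoders produce for the same images, then projecting one encoder's features into the top-k shared canonical subspace, can be sketched as follows. This is a minimal, hypothetical illustration using a standard SVD-based CCA solver in NumPy, not the authors' implementation; the function name `cca_projection` and the example dimensions are assumptions for illustration.

```python
import numpy as np

def cca_projection(X, Y, k):
    """Fit linear CCA between paired feature matrices X (n, dx) and Y (n, dy)
    and return W_x, the projection mapping centered X into the top-k
    shared canonical subspace. Illustrative sketch, not the paper's code."""
    # Center both views
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Whiten each view via its SVD (assumes full-rank features;
    # a rank cutoff on the singular values would be needed otherwise)
    Ux, Sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, Sy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    # SVD of the cross-correlation between the whitened views gives
    # the canonical directions; singular values are the correlations
    U, corr, Vt = np.linalg.svd(Ux.T @ Uy)
    # De-whiten, then rotate into the top-k canonical directions
    Wx = Vxt.T @ np.diag(1.0 / Sx) @ U[:, :k]
    return Wx

# Usage: reduce 768-dim features from encoder A to a 128-dim shared space,
# guided by agreement with 512-dim features from encoder B on the same images
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 768))   # stand-in for encoder A features
B = rng.standard_normal((1000, 512))   # stand-in for encoder B features
Wx = cca_projection(A, B, k=128)
Z = (A - A.mean(axis=0)) @ Wx          # reduced representation, (1000, 128)
```

Unlike PCA, which would pick directions of maximal variance within A alone, the retained directions here are those along which the two encoders agree, which is the cross-model selection principle the paper builds on.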