UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

arXiv cs.LG / 4/21/2026


Key Points

  • The paper introduces UniCon, a unified framework for faster contrastive alignment in multimodal models by leveraging kernel-based formulations.
  • UniCon defines a contrastive similarity weight matrix S(γ) that enables closed-form global solutions, replacing minibatch backpropagation with exact updates.
  • The approach is presented through reproducing kernel Hilbert spaces (RKHS), showing how contrastive alignment relates to spectral methods.
  • Experiments across synthetic, unimodal, multimodal, and zero-shot settings indicate UniCon delivers significant efficiency improvements while maintaining generality and strong empirical performance.
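The paper's exact S(γ) construction and closed-form updates are not reproduced here, but the core idea the key points describe, replacing stochastic gradient steps with an exact spectral solution for a linear alignment problem, can be illustrated with a toy CCA-style example (all data and dimensions below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired samples from two "modalities" sharing latent factors.
n, d_x, d_y, k = 200, 8, 6, 3
Z = rng.normal(size=(n, k))                                   # shared latents
X = Z @ rng.normal(size=(k, d_x)) + 0.1 * rng.normal(size=(n, d_x))
Y = Z @ rng.normal(size=(k, d_y)) + 0.1 * rng.normal(size=(n, d_y))

# Center each view, then form the cross-covariance between them.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
C = Xc.T @ Yc / n

# Exact update: the top-k singular vector pairs of C are globally
# optimal linear projections that maximize cross-view covariance --
# no minibatch backpropagation is needed for this linear case.
U, s, Vt = np.linalg.svd(C)
Wx, Wy = U[:, :k], Vt[:k, :].T

# Aligned embeddings; paired columns should now be strongly correlated.
Ex, Ey = Xc @ Wx, Yc @ Wy
corr = np.mean([np.corrcoef(Ex[:, j], Ey[:, j])[0, 1] for j in range(k)])
print(corr)
```

This is only the linear, one-to-one special case; UniCon's contribution, per the abstract, is extending this kind of exact solution to nonlinear encoders and many-to-many alignments through an RKHS formulation.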

Abstract

Contrastive objectives power state-of-the-art multimodal models, but their training remains slow, relying on long stochastic optimization. We propose a Unified Framework for Efficient Contrastive Alignment via Kernels (UniCon), which spans linear and nonlinear encoders as well as one-to-one and many-to-many alignments. At its core, UniCon introduces the contrastive similarity weight matrix S(γ), which enables closed-form global solutions that provably replace minibatch backpropagation with exact updates. Through the lens of reproducing kernel Hilbert spaces (RKHS), UniCon provides a kernelized perspective that unifies contrastive alignment and reveals its connection to spectral methods. To validate the theory, we conduct experiments on synthetic, unimodal, multimodal, and zero-shot tasks, demonstrating that UniCon achieves substantial efficiency gains while preserving generality and strong empirical performance.