UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels

arXiv cs.LG / 4/21/2026


Key Points

  • The paper introduces UniCon, a unified framework for faster contrastive alignment in multimodal models by leveraging kernel-based formulations.
  • UniCon defines a contrastive similarity weight matrix S(γ) that enables closed-form global solutions, replacing minibatch backpropagation with exact updates.
  • The approach is presented through reproducing kernel Hilbert spaces (RKHS), showing how contrastive alignment relates to spectral methods.
  • Experiments across synthetic, unimodal, multimodal, and zero-shot settings indicate UniCon delivers significant efficiency improvements while maintaining generality and strong empirical performance.
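The paper's exact S(γ) construction and closed-form updates are not reproduced here, but the core idea the key points describe, replacing stochastic gradient steps with an exact spectral solution for a linear alignment problem, can be illustrated with a toy CCA-style example (all data and dimensions below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired samples from two "modalities" sharing latent factors.
n, d_x, d_y, k = 200, 8, 6, 3
Z = rng.normal(size=(n, k))                                   # shared latents
X = Z @ rng.normal(size=(k, d_x)) + 0.1 * rng.normal(size=(n, d_x))
Y = Z @ rng.normal(size=(k, d_y)) + 0.1 * rng.normal(size=(n, d_y))

# Center each view, then form the cross-covariance between them.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
C = Xc.T @ Yc / n

# Exact update: the top-k singular vector pairs of C are globally
# optimal linear projections that maximize cross-view covariance --
# no minibatch backpropagation is needed for this linear case.
U, s, Vt = np.linalg.svd(C)
Wx, Wy = U[:, :k], Vt[:k, :].T

# Aligned embeddings; paired columns should now be strongly correlated.
Ex, Ey = Xc @ Wx, Yc @ Wy
corr = np.mean([np.corrcoef(Ex[:, j], Ey[:, j])[0, 1] for j in range(k)])
print(corr)
```

This is only the linear, one-to-one special case; UniCon's contribution, per the abstract, is extending this kind of exact solution to nonlinear encoders and many-to-many alignments through an RKHS formulation.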

Abstract

Contrastive objectives power state-of-the-art multimodal models, but their training remains slow, relying on long stochastic optimization. We propose a Unified Framework for Efficient Contrastive Alignment via Kernels (UniCon), which spans linear and nonlinear encoders as well as one-to-one and many-to-many alignments. At its core, UniCon introduces the contrastive similarity weight matrix S(γ), which enables closed-form global solutions that provably replace minibatch backpropagation with exact updates. Through the lens of reproducing kernel Hilbert spaces (RKHS), UniCon provides a kernelized perspective that unifies contrastive alignment and reveals its connection to spectral methods. To validate the theory, we conduct experiments on synthetic, unimodal, multimodal, and zero-shot tasks, demonstrating that UniCon achieves substantial efficiency gains while preserving generality and strong empirical performance.