[R] Joint Embedding Variational Bayes (TMLR ’26)

Reddit r/MachineLearning / 5/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Disclosure: first author.

The paper was just published in TMLR, and I figured it might be of interest to some people here. It is fairly dense mathematically, but straightforward conceptually: to add operational variational semantics to joint-embedding architectures for non-contrastive representation learning, we make three coupled choices:

  • Factorize embedding likelihood: the likelihood is split into directional and radial terms, so angular alignment and representation norm are modelled separately. The radial/norm term does not drive accuracy on its own, but the factorization avoids the norm-direction coupling that otherwise produces pathological solutions.
  • Anchor posterior/likelihood uncertainty: the posterior variance is tied to the likelihood scale, so uncertainty directly governs both inference and the embedding likelihood.
  • Use heavy-tailed likelihood: the likelihood uses a Student-t form rather than Gaussian. This matters empirically, since as the likelihood approaches the Gaussian limit, training becomes unstable and the model fails catastrophically.
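The three choices can be sketched together in a few lines. This is a hypothetical illustration of the general recipe (factorized directional/radial terms, a Student-t form, and a learned feature-wise scale entering the likelihood), not the paper's exact equations; the function and argument names are my own.

```python
import torch
import torch.nn.functional as F
from torch.distributions import StudentT

def joint_log_likelihood(z_pred, z_target, log_scale, df=4.0):
    """Hypothetical sketch of the three coupled choices, not the paper's exact form.
    z_pred, z_target: (batch, dim) embeddings from the two branches.
    log_scale: (dim,) learned feature-wise log-scale (anisotropic uncertainty).
    df: Student-t degrees of freedom; df -> infinity recovers the Gaussian limit."""
    scale = log_scale.exp()

    # Directional term: residual between unit-normalized embeddings, scored
    # under a heavy-tailed Student-t with a learned per-feature scale.
    u_pred = F.normalize(z_pred, dim=-1)
    u_tgt = F.normalize(z_target, dim=-1)
    dir_ll = StudentT(df, loc=0.0, scale=scale).log_prob(u_pred - u_tgt).sum(-1)

    # Radial term: match representation norms separately, so magnitude and
    # direction are modelled independently (no norm-direction coupling).
    r_pred = z_pred.norm(dim=-1)
    r_tgt = z_target.norm(dim=-1)
    rad_ll = StudentT(df, loc=0.0, scale=scale.mean()).log_prob(r_pred - r_tgt)

    return dir_ll + rad_ll
```

Because the same `log_scale` parameterizes the likelihood, maximizing it ties the posterior uncertainty to the likelihood scale, and the Student-t tails keep gradients bounded where a Gaussian residual would blow up.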

These choices allow the model to learn anisotropic, feature-wise uncertainty, which we evaluate in downstream OOD detection experiments, including comparisons against VI-SimSiam.
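As a rough illustration of how feature-wise uncertainty could feed OOD detection: once per-feature scales are learned, a natural score is a scale-normalized (Mahalanobis-style) distance from the training-embedding mean. This is an assumed example, not the paper's scoring rule; `ood_score` and its arguments are hypothetical.

```python
import torch

def ood_score(z, train_mean, log_scale):
    """Hypothetical OOD score: higher means more out-of-distribution.
    z: (batch, dim) test embeddings; train_mean, log_scale: (dim,).
    Features with small learned scale (low uncertainty) are weighted more."""
    scale = log_scale.exp()
    return (((z - train_mean) / scale) ** 2).sum(-1)
```

Anisotropy matters here: an isotropic scale would weight every feature equally, while a learned per-feature scale down-weights dimensions the model is uncertain about.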

arXiv | OpenReview | Code

submitted by /u/ISwallow5Gum