The Illusion of Latent Generalization: Bi-directionality and the Reversal Curse

arXiv cs.AI / 4/8/2026


Key Points

  • The paper analyzes the “reversal curse,” where autoregressive LMs fail to recover facts when the order is reversed (e.g., trained on “A > B” but failing on “B < A”).
  • It reports that bidirectional supervision objectives—such as bidirectional attention or masking-based reconstruction for decoder-only models—can improve reversal accuracy, and it extends evaluation to include a standard MLM baseline.
  • Across four reversal benchmarks, the authors compare how MLM and decoder-only masking-based training mitigate the reversal curse and show that success depends on having training signals that explicitly make the source entity a prediction target.
  • Their mechanistic study suggests the gains do not necessarily come from a single, direction-agnostic latent representation; instead, probing indicates forward and reverse directions may be stored as distinct entries with different indexing geometry for MLM vs decoder-only masking.
  • The work cautions that objective-level changes can improve reversal behavior without guaranteeing the kind of “latent generalization” that would imply one unified concept of a fact.
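The asymmetry the key points describe can be made concrete with a toy sketch (our illustration, not the paper's code): under next-token prediction the source entity is the first token and so is never itself a loss target, whereas masking-based reconstruction can mask any position, including the source entity, and predict it from bidirectional context.

```python
# Toy illustration of which positions become prediction targets for a fact
# stored as the token sequence ["A", "is", "B"]. Hypothetical helpers, not
# the paper's implementation.

def causal_lm_targets(tokens):
    # Next-token prediction: every token except the first is a target,
    # and each is predicted only from its left context. The source
    # entity "A" is never a prediction target.
    return [(i, tokens[i]) for i in range(1, len(tokens))]

def masked_lm_targets(tokens, masked_positions):
    # Masking-based reconstruction: any masked position is a target,
    # predicted from both sides -- so "A" itself can be a target.
    return [(i, tokens[i]) for i in masked_positions]

fact = ["A", "is", "B"]
print(causal_lm_targets(fact))       # "A" (position 0) is absent
print(masked_lm_targets(fact, [0]))  # "A" is an explicit target
```

This is the sense in which, per the key points, reversal accuracy depends on a training signal that explicitly makes the source entity a prediction target.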

Abstract

The reversal curse describes a failure of autoregressive language models to retrieve a fact in reverse order (e.g., training on “A > B” but failing on “B < A”). Recent work shows that objectives with bidirectional supervision (e.g., bidirectional attention or masking-based reconstruction for decoder-only models) can mitigate the reversal curse. We extend this evaluation to include a vanilla masked language modeling (MLM) objective, compare it to decoder-only masking-based training across four reversal benchmarks, and then provide a minimal mechanistic study of *how* these objectives succeed. We show that reversal accuracy requires a training signal that explicitly makes the source entity a prediction target, and we find little evidence that success corresponds to a single direction-agnostic representation of a fact. Instead, representation distances and linear probes are consistent with storing the forward and reverse directions as distinct entries, with different indexing geometry for MLM versus decoder-only masking-based training. Our results caution that objective-level “fixes” can improve reversal behavior without necessarily inducing the kind of latent generalization one might expect from a unified concept.
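The representation-distance argument in the abstract can be illustrated with synthetic vectors (our toy construction, not the paper's probes): if a model stored one direction-agnostic entry per fact, the representations of the forward and reverse statements would lie close together, whereas independently stored entries need not be near each other at all.

```python
import numpy as np

def cosine_distance(u, v):
    # 1 - cosine similarity; near 0 for aligned vectors, near 1 for
    # roughly orthogonal random high-dimensional vectors.
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
unified = rng.normal(size=64)

# Unified-entry hypothesis: the reverse direction reuses the same latent
# entry, up to small noise, so its distance to the forward entry is tiny.
reverse_shared = unified + 0.01 * rng.normal(size=64)

# Distinct-entry hypothesis: the reverse direction is stored independently,
# so it sits roughly orthogonal to the forward entry.
reverse_distinct = rng.normal(size=64)

print(cosine_distance(reverse_shared, unified))    # small
print(cosine_distance(reverse_distinct, unified))  # large
```

On this toy picture, the paper's finding that measured distances look like the second case (with different indexing geometry for MLM versus decoder-only masking) is evidence against a single unified concept of the fact.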
