Reasoning as Energy Minimization over Structured Latent Trajectories

arXiv cs.AI / 3/31/2026



Key Points

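The proposed method, EBRM (Energy-Based Reasoning via Structured Latent Planning), formulates reasoning as gradient-based optimization of a latent trajectory z_{1:T} under a learned energy function, and decomposes the energy into per-step compatibility, transition consistency, and trajectory smoothness terms.
Training combines supervised encoder-decoder learning with contrastive energy shaping using hard negatives; inference lowers the energy by gradient descent or Langevin dynamics over z and then decodes z_T.
On CNF logic satisfaction, a critical failure mode is reported in which accuracy drops sharply from about 95% to about 56%. The cause is a distribution mismatch: the decoder is trained on the distribution of encoder outputs, while the planner's z_T drifts into unseen latent regions.
As countermeasures, the authors propose dual-path decoder training and latent anchoring, and evaluate them with a six-part ablation (component contributions, trajectory length, planner dynamics, initialization, training distribution, and anchor weight).
On three synthetic tasks, energy decreases monotonically and yields structured latent trajectories on the graph and logic tasks, whereas on the arithmetic task the energy stays nearly flat (r=0.073), which the authors report as a negative result.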
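
As a rough illustration of the planning loop described in the key points, the sketch below runs gradient descent (optionally with Langevin noise) over a latent trajectory under a hand-written quadratic energy. Everything in it is an assumption for illustration: the function names `energy_terms`, `energy_grad`, and `plan`, the weights, and the quadratic forms are toy stand-ins, not EBRM's learned energy or the authors' implementation.

```python
import numpy as np


def energy_terms(h_x, z, w_trans=1.0, w_smooth=0.5):
    """Toy quadratic stand-ins for the three energy terms (assumed forms)."""
    compat = 0.5 * np.sum((z - h_x) ** 2)        # per-step compatibility with h_x
    d1 = z[1:] - z[:-1]                          # first differences z_{t+1} - z_t
    trans = 0.5 * w_trans * np.sum(d1 ** 2)      # transition consistency
    d2 = z[2:] - 2.0 * z[1:-1] + z[:-2]          # second differences
    smooth = 0.5 * w_smooth * np.sum(d2 ** 2)    # trajectory smoothness
    return compat, trans, smooth


def energy(h_x, z, **kw):
    """Total energy E(h_x, z) as the sum of the three terms."""
    return float(sum(energy_terms(h_x, z, **kw)))


def energy_grad(h_x, z, w_trans=1.0, w_smooth=0.5):
    """Analytic gradient of the toy energy with respect to every z_t."""
    g = z - h_x                                  # new array, safe to update in place
    d1 = z[1:] - z[:-1]
    g[1:] += w_trans * d1
    g[:-1] -= w_trans * d1
    d2 = z[2:] - 2.0 * z[1:-1] + z[:-2]
    g[2:] += w_smooth * d2
    g[1:-1] -= 2.0 * w_smooth * d2
    g[:-2] += w_smooth * d2
    return g


def plan(h_x, T=6, steps=200, lr=0.05, noise=0.0, seed=0):
    """Lower the energy over z_{1:T}; noise > 0 adds a Langevin-style perturbation."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(T, h_x.shape[0]))       # random initialization (one possible choice)
    trace = [energy(h_x, z)]
    for _ in range(steps):
        z = z - lr * energy_grad(h_x, z)
        if noise > 0.0:
            z = z + np.sqrt(2.0 * lr) * noise * rng.normal(size=z.shape)
        trace.append(energy(h_x, z))
    return z, trace


if __name__ == "__main__":
    h_x = np.array([0.5, -1.0, 2.0])             # stand-in for an encoder output
    z, trace = plan(h_x)
    print("initial energy:", trace[0])
    print("final energy:  ", trace[-1])
    print("z_T (what a decoder would read):", z[-1])
```

In this toy setting the minimizer keeps every z_t at h_x, so z_T cannot drift away from the encoder output. A learned energy gives no such guarantee, which is the situation the reported distribution mismatch describes. Recording the energy at each step, as `trace` does here, is one simple way to check whether the energy actually falls during planning.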

Abstract

Single-shot neural decoders commit to answers without iterative refinement, while chain-of-thought methods introduce discrete intermediate steps but lack a scalar measure of reasoning progress. We propose Energy-Based Reasoning via Structured Latent Planning (EBRM), which models reasoning as gradient-based optimization of a multi-step latent trajectory $z_{1:T}$ under a learned energy function $E(h_x, z)$. The energy decomposes into per-step compatibility, transition consistency, and trajectory smoothness terms. Training combines supervised encoder-decoder learning with contrastive energy shaping using hard negatives, while inference performs gradient descent or Langevin dynamics over $z$ and decodes from $z_T$. We identify a critical failure mode: on CNF logic satisfaction, latent planning reduces accuracy from $\approx 95\%$ to $\approx 56\%$. This degradation arises from a distribution mismatch, where the decoder is trained on encoder outputs $h_x$ but evaluated on planner outputs $z_T$ that drift into unseen latent regions. We analyze this behavior through per-step decoding, latent drift tracking, and gradient decomposition. To address it, we propose dual-path decoder training and latent anchoring. We further introduce a six-part ablation protocol covering component contributions, trajectory length, planner dynamics, initialization, decoder training distribution, and anchor weight. Experiments on three synthetic tasks show that energy decreases monotonically and induces structured latent trajectories on graph and logic tasks, while remaining flat on arithmetic (