Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models

arXiv cs.LG / 4/21/2026


Key Points

  • The paper argues that correct reasoning trajectories in diffusion large language models (dLLMs) correspond to stable attractors on the high-density manifold of the learned distribution, while incorrect paths drift off-manifold.
  • It introduces Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that estimates trajectory stability via a forward-masking and backward-reconstruction cycle.
  • Experiments show BMC works throughout the reasoning lifecycle: as a ground-truth-free discriminator for solution validity (Diagnosis), as a rejection-resampling signal to focus compute on harder tasks (Inference), and as a dense geometric reward for improving alignment beyond sparse supervision (Alignment).
  • Overall, the authors claim intrinsic geometric stability measured by BMC is a robust indicator of correctness for dLLMs.
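The forward-masking/backward-reconstruction cycle behind BMC can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `reconstruct` callable (standing in for the dLLM's denoiser), the `<mask>` token, the exact-match scoring, and all default parameters are assumptions made here for clarity.

```python
import random

def bmc_score(tokens, reconstruct, mask_ratio=0.3, num_cycles=8, seed=0):
    """Sketch of a Bidirectional Manifold Consistency-style score.

    Repeatedly masks a random subset of positions (forward masking),
    asks `reconstruct` to fill them back in (backward reconstruction),
    and returns the mean fraction of masked tokens recovered exactly.
    A higher score suggests the sequence sits in a stable, high-density
    region of the model's learned distribution.
    """
    rng = random.Random(seed)
    n = len(tokens)
    hits, total = 0, 0
    for _ in range(num_cycles):
        k = max(1, int(mask_ratio * n))          # how many tokens to mask
        masked_pos = rng.sample(range(n), k)     # forward-masking step
        corrupted = list(tokens)
        for i in masked_pos:
            corrupted[i] = "<mask>"
        restored = reconstruct(corrupted)        # backward reconstruction
        hits += sum(restored[i] == tokens[i] for i in masked_pos)
        total += k
    return hits / total
```

A reconstructor that reliably restores the original sequence yields a score near 1.0 (an on-manifold attractor in the paper's framing), while one that cannot recover the masked tokens scores near 0.0.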

Abstract

While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geometric perspective: Reasoning on the Manifold. We hypothesize that valid generation trajectories reside as stable attractors on the high-density manifold of the learned distribution, whereas invalid paths exhibit off-manifold drift. To operationalize this, we introduce Bidirectional Manifold Consistency (BMC), a training-free, unsupervised metric that quantifies the stability of the generated sequence through a forward-masking and backward-reconstruction cycle. Empirically, we demonstrate BMC's versatility across the full reasoning lifecycle: (1) in Diagnosis, it serves as a robust discriminator of solution validity without ground-truth answers; (2) in Inference, it enables rejection resampling to effectively concentrate computational resources on complex reasoning tasks; and (3) in Alignment, it functions as a dense geometric reward that transforms sparse outcome supervision into fine-grained guidance, empowering models to self-evolve beyond standard baselines. Our results establish intrinsic geometric stability as a robust indicator of correctness for dLLMs.
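The Inference use case, rejection resampling guided by a stability score, can be sketched as follows. The `generate` and `score` callables, the threshold, and the retry budget are all hypothetical placeholders, not the paper's algorithm: the point is only that low-scoring (harder) prompts naturally consume more draws, concentrating compute where the model is least stable.

```python
def bmc_rejection_sample(generate, score, threshold=0.9, max_tries=4):
    """Sketch of BMC-guided rejection resampling.

    Draw candidate solutions until one's stability score clears
    `threshold`; if none does within the retry budget, return the
    best-scoring candidate seen. Returns a (candidate, score) pair.
    """
    best, best_score = None, float("-inf")
    for _ in range(max_tries):
        cand = generate()          # one sample from the dLLM
        s = score(cand)            # e.g. a BMC-style stability score
        if s >= threshold:
            return cand, s         # accept: stable enough to stop early
        if s > best_score:
            best, best_score = cand, s
    return best, best_score        # budget exhausted: keep the best draw
```

Easy prompts exit on the first accepted draw, while hard prompts use the full budget, which is the compute-concentration effect the abstract describes.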