Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models
arXiv cs.CL / 4/14/2026
Key Points
- The paper studies non-autoregressive decoding in diffusion-based language models, analyzing inference dynamics along the temporal axis (diffusion time) to understand why decoding can fail on reasoning and planning tasks.
- It identifies a failure mode driven by “proximity bias,” where denoising tends to focus on spatially adjacent tokens, causing spatial error propagation and making the generation trajectory overly dependent on the initial unmasking position.
- To mitigate this, the authors propose a minimal-intervention method that improves early token selection using a lightweight planner and end-of-sequence temperature annealing.
- Experiments on multiple reasoning and planning benchmarks show substantial improvements over existing heuristic baselines while adding little to no computational overhead.
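
To make the proximity-bias point concrete, here is a toy simulation, not the paper's model or method. It assumes a hypothetical confidence function (`toy_confidence`) that decays with distance to the nearest already-revealed token, and a greedy decoder (`unmask_order`) that always unmasks the most confident position. Both names, the decay value, and the tie-breaking rule are illustrative assumptions.

```python
def toy_confidence(position, revealed, decay=0.5):
    """Hypothetical confidence score (an assumption, not the paper's model):
    it shrinks geometrically with distance to the nearest revealed token."""
    nearest = min(abs(position - r) for r in revealed)
    return decay ** nearest


def unmask_order(seq_len, first_position):
    """Greedy confidence-based unmasking starting from one revealed position.
    Returns the order in which positions are unmasked."""
    revealed = [first_position]
    while len(revealed) < seq_len:
        masked = [i for i in range(seq_len) if i not in revealed]
        # Pick the most confident masked position; ties go to the lower index.
        best = max(masked, key=lambda i: (toy_confidence(i, revealed), -i))
        revealed.append(best)
    return revealed


if __name__ == "__main__":
    # Two different starting positions give two different trajectories.
    print(unmask_order(6, 0))
    print(unmask_order(6, 3))
```

Under this assumed scoring, every newly unmasked position is adjacent to one that is already revealed, so the whole order is fixed by where decoding starts. That is the kind of dependence on the initial unmasking position the summary describes, and it suggests why choosing the first few tokens more carefully could matter.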




