T$^\star$: Progressive Block Scaling for Masked Diffusion Language Models Through Trajectory Aware Reinforcement Learning

arXiv cs.CL / 3/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper introduces T$^\star$, a TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs).
  • It starts from an AR-initialized small-block MDM and transitions smoothly to larger blocks to increase parallelism during decoding while keeping math reasoning performance largely intact.
  • The authors report that larger block sizes can be used with minimal degradation on math reasoning benchmarks, suggesting a practical path to faster inference.
  • The study also indicates T$^\star$ may converge toward an alternative decoding schedule that can deliver comparable performance.

Abstract

We present T$^\star$, a simple TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small-block MDM, T$^\star$ transitions smoothly to larger blocks, enabling higher-parallelism decoding with minimal performance degradation on math reasoning benchmarks. Moreover, further analysis suggests that T$^\star$ may actually converge to an alternative decoding schedule that achieves comparable performance.
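The core idea of the curriculum — start training with a small decoding block and step up to larger blocks over time — can be sketched as a simple schedule. This is a hypothetical illustration only: the milestone steps, block sizes, and function name are assumptions for the sketch, not details from the paper.

```python
# Hypothetical sketch of a progressive block-size curriculum in the spirit
# of T*: training begins with a small block and transitions to larger
# blocks at fixed step milestones. All numbers here are illustrative.

def block_size_at(step: int,
                  milestones=(0, 1000, 3000, 6000),
                  sizes=(4, 8, 16, 32)) -> int:
    """Return the decoding block size to use at a given training step.

    The schedule is piecewise constant: the block size is the one
    associated with the latest milestone that `step` has reached.
    """
    size = sizes[0]
    for milestone, s in zip(milestones, sizes):
        if step >= milestone:
            size = s
    return size

# Example: block size grows as training progresses.
schedule = [block_size_at(s) for s in (0, 500, 1000, 3000, 6000)]
print(schedule)  # [4, 4, 8, 16, 32]
```

Larger blocks allow more tokens to be unmasked in parallel at decode time, which is the source of the inference speedup; the curriculum's role is to reach those larger blocks without the performance cliff that training at a large block size from the start might cause.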