T$^\star$: Progressive Block Scaling for Masked Diffusion Language Models Through Trajectory Aware Reinforcement Learning

arXiv cs.CL / 3/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper introduces T$^\star$, a TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs).
  • It starts from an AR-initialized small-block MDM and transitions smoothly to larger blocks to increase parallelism during decoding while keeping math reasoning performance largely intact.
  • The authors report that larger block sizes can be used with minimal degradation on math reasoning benchmarks, suggesting a practical path to faster inference.
  • The study also indicates T$^\star$ may converge toward an alternative decoding schedule that can deliver comparable performance.

Abstract

We present T$^\star$, a simple TraceRL-based training curriculum for progressive block-size scaling in masked diffusion language models (MDMs). Starting from an AR-initialized small-block MDM, T$^\star$ transitions smoothly to larger blocks, enabling higher-parallelism decoding with minimal performance degradation on math reasoning benchmarks. Moreover, further analysis suggests that T$^\star$ may actually converge to an alternative decoding schedule that achieves comparable performance.
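The core idea of the curriculum — start training with a small decoding block and step up to larger blocks over time — can be sketched as a simple schedule. This is a hypothetical illustration only: the milestone steps, block sizes, and function name are assumptions for the sketch, not details from the paper.

```python
# Hypothetical sketch of a progressive block-size curriculum in the spirit
# of T*: training begins with a small block and transitions to larger
# blocks at fixed step milestones. All numbers here are illustrative.

def block_size_at(step: int,
                  milestones=(0, 1000, 3000, 6000),
                  sizes=(4, 8, 16, 32)) -> int:
    """Return the decoding block size to use at a given training step.

    The schedule is piecewise constant: the block size is the one
    associated with the latest milestone that `step` has reached.
    """
    size = sizes[0]
    for milestone, s in zip(milestones, sizes):
        if step >= milestone:
            size = s
    return size

# Example: block size grows as training progresses.
schedule = [block_size_at(s) for s in (0, 500, 1000, 3000, 6000)]
print(schedule)  # [4, 4, 8, 16, 32]
```

Larger blocks allow more tokens to be unmasked in parallel at decode time, which is the source of the inference speedup; the curriculum's role is to reach those larger blocks without the performance cliff that training at a large block size from the start might cause.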