Discrete Tilt Matching
arXiv cs.LG · April 22, 2026
Key Points
- The paper introduces Discrete Tilt Matching (DTM), a likelihood-free fine-tuning approach for masked diffusion large language models (dLLMs) that avoids the intractable sequence-level marginal likelihood objectives used by prior RL adaptations.
- DTM reformulates dLLM fine-tuning as state-level matching of reward-tilted local unmasking posteriors, which yields a weighted cross-entropy objective with an explicit minimizer.
- The method also provides control variates designed to improve training stability and mitigate problems such as mode collapse.
- Experiments on a synthetic maze-planning task show that DTM’s annealing schedule and control variates significantly affect stability, and large-scale fine-tuning of LLaDA-8B-Instruct improves performance on Sudoku and Countdown while staying competitive on MATH500 and GSM8K.
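The core recipe described above, reweighting a cross-entropy loss by exponentially reward-tilted sample weights, with a baseline subtraction as a simple variance-reduction device, can be sketched as follows. This is an illustrative sketch, not the paper's exact objective: the function names, the mean-reward baseline, and the temperature parameter are assumptions for exposition, and DTM's actual control variates and state-level posterior matching are defined in the paper.

```python
import math

def tilted_weights(rewards, temperature=1.0, use_baseline=True):
    """Self-normalized exponential tilting weights proportional to exp(r / tau).

    Illustrative sketch: subtracting a mean-reward baseline is a simple
    control variate; the paper's control variates are more specific.
    """
    if use_baseline:
        b = sum(rewards) / len(rewards)          # baseline: mean reward
        rewards = [r - b for r in rewards]
    # Log-sum-exp stabilization to avoid overflow in exp().
    m = max(r / temperature for r in rewards)
    unnorm = [math.exp(r / temperature - m) for r in rewards]
    z = sum(unnorm)
    return [w / z for w in unnorm]               # weights sum to 1

def weighted_cross_entropy(log_probs, weights):
    """Weighted CE objective: -sum_i w_i * log p_theta(x_i).

    log_probs are model log-likelihoods of the sampled completions;
    weights are the tilted weights above.
    """
    return -sum(w * lp for w, lp in zip(weights, log_probs))

# Toy usage: three sampled completions with scalar rewards.
rewards = [1.0, 0.0, 2.0]
weights = tilted_weights(rewards, temperature=1.0)
loss = weighted_cross_entropy([-1.0, -2.0, -0.5], weights)
```

Higher-reward samples receive larger weights, so gradient descent on this loss pushes the model toward the reward-tilted distribution; the baseline shifts all rewards equally and cancels in the normalization, leaving the weights unchanged while (in stochastic variants) reducing gradient variance.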


