Discrete Tilt Matching

arXiv cs.LG / April 22, 2026


Key Points

  • The paper introduces Discrete Tilt Matching (DTM), a likelihood-free fine-tuning approach for masked diffusion large language models that avoids intractable sequence-level marginal likelihood objectives used by prior RL adaptations.
  • DTM reformulates dLLM fine-tuning as state-level matching of local unmasking posteriors under reward “tilting,” resulting in a weighted cross-entropy objective with an explicit minimizer.
  • The method also provides control variates designed to improve training stability and mitigate problems such as mode collapse.
  • Experiments on a synthetic maze-planning task show that DTM’s annealing schedule and control variates are key to training stability and to preventing mode collapse, and large-scale fine-tuning of LLaDA-8B-Instruct improves performance on Sudoku and Countdown while staying competitive on MATH500 and GSM8K.

Abstract

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
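To make the "weighted cross-entropy objective under reward tilting" concrete, here is a minimal illustrative sketch. It assumes the common form of exponential tilting, where target weights are proportional to exp(beta * reward), self-normalized over a batch, with a mean-reward baseline as a simple variance-reduction device. All function names (`tilt_weights`, `dtm_style_loss`) and the exact baseline choice are hypothetical, not taken from the paper.

```python
import numpy as np

def tilt_weights(rewards, beta=1.0, use_baseline=True):
    """Self-normalized exponential-tilting weights: w_i proportional to exp(beta * r_i).

    Subtracting the batch-mean reward is a simple baseline for variance
    reduction (illustrative; not necessarily the paper's control variate).
    """
    r = np.asarray(rewards, dtype=float)
    if use_baseline:
        r = r - r.mean()
    logits = beta * r
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def dtm_style_loss(log_probs, rewards, beta=1.0):
    """Weighted cross-entropy: -sum_i w_i * log p_theta(y_i | x_i),
    where w_i comes from reward tilting. Returns a scalar loss."""
    w = tilt_weights(rewards, beta)
    return -float(np.sum(w * np.asarray(log_probs, dtype=float)))

# With equal rewards the weights are uniform, so the loss reduces to
# the plain average negative log-likelihood of the batch.
loss = dtm_style_loss(log_probs=[-1.0, -2.0], rewards=[0.0, 0.0], beta=1.0)
```

As `beta` grows, the weights concentrate on the highest-reward samples; annealing `beta` over training (as the paper's annealing schedule suggests) trades off between imitating the base model and chasing the reward.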