Discrete Tilt Matching

arXiv cs.LG / April 22, 2026


Key Points

  • The paper introduces Discrete Tilt Matching (DTM), a likelihood-free fine-tuning approach for masked diffusion large language models that avoids intractable sequence-level marginal likelihood objectives used by prior RL adaptations.
  • DTM reformulates dLLM fine-tuning as state-level matching of local unmasking posteriors under reward “tilting,” resulting in a weighted cross-entropy objective with an explicit minimizer.
  • The method also provides control variates designed to improve training stability and mitigate problems such as mode collapse.
  • Experiments on a synthetic maze-planning task show that DTM’s annealing schedule and control variates are key to training stability and to preventing mode collapse, and large-scale fine-tuning of LLaDA-8B-Instruct improves performance on Sudoku and Countdown while staying competitive on MATH500 and GSM8K.

Abstract

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
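To make the "weighted cross-entropy objective under reward tilting" concrete, here is a minimal illustrative sketch. It assumes the common form of exponential tilting, where target weights are proportional to exp(beta * reward), self-normalized over a batch, with a mean-reward baseline as a simple variance-reduction device. All function names (`tilt_weights`, `dtm_style_loss`) and the exact baseline choice are hypothetical, not taken from the paper.

```python
import numpy as np

def tilt_weights(rewards, beta=1.0, use_baseline=True):
    """Self-normalized exponential-tilting weights: w_i proportional to exp(beta * r_i).

    Subtracting the batch-mean reward is a simple baseline for variance
    reduction (illustrative; not necessarily the paper's control variate).
    """
    r = np.asarray(rewards, dtype=float)
    if use_baseline:
        r = r - r.mean()
    logits = beta * r
    logits -= logits.max()          # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def dtm_style_loss(log_probs, rewards, beta=1.0):
    """Weighted cross-entropy: -sum_i w_i * log p_theta(y_i | x_i),
    where w_i comes from reward tilting. Returns a scalar loss."""
    w = tilt_weights(rewards, beta)
    return -float(np.sum(w * np.asarray(log_probs, dtype=float)))

# With equal rewards the weights are uniform, so the loss reduces to
# the plain average negative log-likelihood of the batch.
loss = dtm_style_loss(log_probs=[-1.0, -2.0], rewards=[0.0, 0.0], beta=1.0)
```

As `beta` grows, the weights concentrate on the highest-reward samples; annealing `beta` over training (as the paper's annealing schedule suggests) trades off between imitating the base model and chasing the reward.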