Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction

arXiv cs.LG / 4/20/2026


Key Points

  • The paper argues that many discrete diffusion approaches based on continuous-time Markov chains (CTMCs) parameterize the reverse dynamics as one monolithic object, instead of matching CTMC’s core decomposition into jump timing and jump direction.
  • It proposes “Neural CTMC,” which learns the reverse process with two separate network heads: an exit rate for when to jump and a jump distribution for where to jump.
  • The authors show the training objective can be expressed so that, up to a θ-independent constant, it is determined by the exit-rate and jump-distribution terms, and that the KL divergence cleanly factorizes into a Poisson KL (timing) and a categorical KL (direction).
  • They prove the use of a tractable conditional surrogate preserves the gradients and minimizers of the marginal reverse-process objective under standard assumptions, and extend the theory to masked and GIDD-style noise schedules.
  • The authors report that their “pure-uniform” method is the first to beat mask-based methods on OpenWebText, and they release pretrained weights on Hugging Face for reproducibility.
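The decoupling in the second bullet can be illustrated with a toy reconstruction of a CTMC generator from the two heads' outputs. The function and variable names below are illustrative, not from the paper; the identity used is the standard one that off-diagonal rates equal the exit rate times the jump probability.

```python
def generator_from_heads(exit_rate, jump_dist):
    """Rebuild a CTMC generator (rate matrix) from the two decoupled
    quantities: exit_rate[x] is the total rate of leaving state x
    (when to jump), and jump_dist[x][y] is the probability that a jump
    from x lands in y, y != x (where to jump).
    Off-diagonals are lambda(x) * p(y|x); diagonals make rows sum to 0."""
    S = len(exit_rate)
    R = [[exit_rate[x] * jump_dist[x][y] if y != x else 0.0
          for y in range(S)] for x in range(S)]
    for x in range(S):
        R[x][x] = -sum(R[x])  # diagonal = -(total exit rate)
    return R

# Toy 3-state chain: uniform jumps to the other two states.
lam = [1.0, 2.0, 0.5]
p = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
R = generator_from_heads(lam, p)
assert all(abs(sum(row)) < 1e-12 for row in R)  # valid generator rows
assert all(abs(-R[x][x] - lam[x]) < 1e-12 for x in range(3))
```

Any pair (exit rate, jump distribution) that passes these checks determines a valid CTMC, which is why the paper can train the two heads separately and still recover well-defined reverse dynamics.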

Abstract

Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix as a single object -- via concrete scores, clean-data predictions (x_0-parameterization), or denoising distributions -- rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. Since a CTMC is fundamentally a Poisson process fully determined by these two quantities, decomposing along this structure is closer to first principles and naturally leads to our formulation. We propose **Neural CTMC**, which separately parameterizes the reverse process through an *exit rate* (when to jump) and a *jump distribution* (where to jump) using two dedicated network heads. We show that the evidence lower bound (ELBO) differs from a path-space KL divergence between the true and learned reverse processes by a θ-independent constant, so that the training objective is fully governed by the exit rate and jump distribution we parameterize. Moreover, this KL factorizes into a Poisson KL for timing and a categorical KL for direction. We further show that the tractable conditional surrogate preserves the gradients and minimizers of the corresponding marginal reverse-process objective under standard regularity assumptions. Our theoretical framework also covers masked and GIDD-style noise schedules. Empirically, while the uniform forward process has been explored in prior work, our model is, to the best of our knowledge, the first pure-uniform method to outperform mask-based methods on the OpenWebText dataset. To facilitate reproducibility, we release our pretrained weights at https://huggingface.co/Jiangxy1117/Neural-CTMC.
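The factorization claimed in the abstract (a Poisson KL for timing plus a rate-weighted categorical KL for direction) can be checked numerically at a single state. The instantaneous form below is the standard path-space KL rate between two CTMCs; the function names are illustrative, not the paper's.

```python
import math

def poisson_kl(lam, lam_hat):
    """KL rate between Poisson clocks with intensities lam and lam_hat."""
    return lam * math.log(lam / lam_hat) - lam + lam_hat

def categorical_kl(p, p_hat):
    """KL between categorical jump distributions p and p_hat."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, p_hat) if pi > 0)

def ctmc_kl_rate(lam, p, lam_hat, p_hat):
    """Instantaneous path-KL rate at one state, computed two ways:
    (a) directly from the per-transition rates r(y) = lam * p(y),
    (b) factorized as Poisson KL (timing) + lam * categorical KL (direction)."""
    direct = sum(lam * py * math.log((lam * py) / (lam_hat * qy))
                 for py, qy in zip(p, p_hat) if py > 0) - lam + lam_hat
    factorized = poisson_kl(lam, lam_hat) + lam * categorical_kl(p, p_hat)
    return direct, factorized

direct, factorized = ctmc_kl_rate(1.5, [0.2, 0.8], 1.0, [0.5, 0.5])
assert math.isclose(direct, factorized)  # the two forms agree
```

The agreement follows from expanding log(lam*p / (lam_hat*p_hat)) into log(lam/lam_hat) + log(p/p_hat), which is exactly why the objective splits cleanly across the two network heads.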