Symmetric Equilibrium Propagation for Thermodynamic Diffusion Training

arXiv cs.LG / April 28, 2026


Key Points

  • The paper shows that the reverse process of score-based diffusion models can be seen as overdamped Langevin dynamics in a time-dependent energy landscape, linking diffusion sampling to physical thermodynamic dynamics.
  • Building on prior hardware work, it demonstrates that a bilinearly-coupled analog substrate can also close the training loop without routing gradients through an external digital accelerator.
  • It proves that Equilibrium Propagation (EqProp) applied to the bilinear energy provides an unbiased score-matching gradient estimator in the zero-nudge limit, and derives a bias bound for finite nudging governed by substrate stiffness, local curvature, and loss-gradient signal norm.
  • The authors introduce symmetric nudging, improving the leading bias from O(β) to O(β^2) with little extra cost, and argue it is crucial under realistic finite-relaxation budgets to avoid anti-correlated gradients.
  • End-to-end physical-unit accounting projects a 10^3–10^4× energy advantage per training step over a matched GPU baseline, and positions symmetric bilinear EqProp as the first local, readout-only training rule that preserves the low-rank coupling needed for scalable thermodynamic diffusion models.
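The equivalence in the first bullet is easy to see in code: annealed Langevin dynamics in a sequence of noise-smoothed energy landscapes recovers the target distribution. Below is a minimal sketch with a Gaussian target whose smoothed score is analytic; the target, annealing schedule, and step sizes are our own illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: a single Gaussian N(mu, s2). For a Gaussian target, the score of
# the noise-smoothed distribution is analytic; it stands in for a learned score.
mu, s2 = 2.0, 0.25

def smoothed_score(x, sigma2):
    """Score of N(mu, s2) convolved with N(0, sigma2)."""
    return (mu - x) / (s2 + sigma2)

n = 10000
sigmas = np.geomspace(3.0, 0.01, 40)  # noise annealing schedule (our choice)
x = rng.normal(0.0, sigmas[0], n)     # start from the broad noised prior

for sigma in sigmas:
    eta = 0.3 * sigma**2              # step size scaled to the noise level
    for _ in range(50):
        # Overdamped Langevin (Euler-Maruyama): drift along the score, inject noise.
        x += eta * smoothed_score(x, sigma**2) + np.sqrt(2 * eta) * rng.normal(size=n)

print(f"sample mean {x.mean():.3f} (target {mu}), variance {x.var():.3f} (target {s2})")
```

Running the chain to the smallest noise level leaves the samples distributed close to the target; the paper's point is that this same annealed Langevin relaxation can be executed by a physical thermodynamic substrate rather than simulated digitally.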

Abstract

The reverse process in score-based diffusion models is formally equivalent to overdamped Langevin dynamics in a time-dependent energy landscape. In our prior work we showed that a bilinearly-coupled analog substrate can physically realize this dynamics at a projected three-to-four orders of magnitude energy advantage over digital inference by replacing dense skip connections with low-rank inter-module couplings. Whether the \emph{training} loop can be closed on the same substrate -- without routing gradients through an external digital accelerator -- has remained open. We resolve this affirmatively: Equilibrium Propagation applied directly to the bilinear energy yields an unbiased estimator of the denoising score-matching gradient in the zero-nudge limit. For finite nudging we derive a sharp bias bound controlled solely by substrate stiffness, local curvature, and the norm of the loss-gradient signal, with a bilinear-specific corollary showing that one dominant bias term vanishes identically for coupling-parameter updates. Symmetric nudging further upgrades the leading bias from \mathcal{O}(\beta) to \mathcal{O}(\beta^2) at negligible extra cost. Under realistic finite-relaxation budgets this upgrade is essential, as one-sided EqProp produces anti-correlated gradients while symmetric EqProp yields well-aligned updates. Bias-variance analysis determines the optimal operating point, and end-to-end physical-unit accounting projects a 10^3-10^4\times energy advantage per training step over a matched GPU baseline. Symmetric bilinear EqProp is the first local, readout-only training rule that preserves the low-rank coupling enabling scalable thermodynamic diffusion models.
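The claimed upgrade from O(β) to O(β²) bias under symmetric nudging can be checked on a toy scalar model. The sketch below uses a deliberately non-quadratic energy so that finite-nudge bias appears; the energy, readout loss, and parameter values are our own illustrative choices, not the paper's bilinear substrate:

```python
def equilibrium(theta, beta, target, iters=100):
    """Equilibrium of the nudged energy E(s, theta) + beta * L(s), found by Newton.

    Toy choices (illustrative only):
      E(s, theta) = 0.5*(s - theta)**2 + 0.25*s**4   # non-quadratic energy
      L(s)        = 0.5*(s - target)**2              # readout loss
    Stationarity: (s - theta) + s**3 + beta*(s - target) = 0
    """
    s = 0.0
    for _ in range(iters):
        g = (s - theta) + s**3 + beta * (s - target)
        h = 1.0 + 3.0 * s**2 + beta   # g is monotone in s here, so Newton converges
        s -= g / h
    return s

def dE_dtheta(s, theta):
    """Partial derivative of E w.r.t. theta at a fixed state s."""
    return -(s - theta)

theta, target = 0.7, -0.3
s0 = equilibrium(theta, 0.0, target)

# Exact gradient dL/dtheta via implicit differentiation: ds/dtheta = 1/(1 + 3 s^2).
exact = (s0 - target) / (1.0 + 3.0 * s0**2)

for beta in (0.1, 0.01):
    s_plus = equilibrium(theta, beta, target)
    s_minus = equilibrium(theta, -beta, target)
    one_sided = (dE_dtheta(s_plus, theta) - dE_dtheta(s0, theta)) / beta
    symmetric = (dE_dtheta(s_plus, theta) - dE_dtheta(s_minus, theta)) / (2 * beta)
    print(f"beta={beta}: one-sided err {abs(one_sided - exact):.2e}, "
          f"symmetric err {abs(symmetric - exact):.2e}")
```

Shrinking β by 10× cuts the one-sided estimator's error by roughly 10× but the symmetric estimator's by roughly 100×, matching the O(β) vs. O(β²) scaling; the training rule itself stays local and readout-only, since each estimate uses only ∂E/∂θ measured at equilibria.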