Thermodynamic Diffusion Inference with Minimal Digital Conditioning

arXiv cs.LG / 4/17/2026


Key Points

  • The paper establishes an equivalence between diffusion-model inference and overdamped Langevin dynamics, suggesting that a physical system encoding the score function could generate outputs via thermodynamics without digital arithmetic during inference.
  • It identifies two key blockers for production-scale realization: representing U-Net-style non-local skip connections on locally coupled analog substrates, and input conditioning, where the coupling constants carry roughly 2,600× too little signal to anchor the system to a specific input.
  • The authors propose “hierarchical bilinear coupling” to represent U-Net skip connections using low-rank, rank-k inter-module interactions, reducing required physical connections from O(D^2) to O(Dk).
  • They introduce a “minimal digital interface” with a compact 4D bottleneck encoder plus a small 16-unit transfer network (2,560 parameters total) to overcome the conditioning barrier.
  • Experiments using activations from a trained denoising U-Net show high fidelity (decoder cosine similarity 0.9906 vs. 1.0000 oracle) and retain large projected energy savings (~10^7× over GPU inference), which the authors describe as the first production-scale demonstration of trained-weight thermodynamic diffusion inference.
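
The equivalence in the first key point can be illustrated numerically. The following is a minimal sketch, not the paper's method: it discretizes overdamped Langevin dynamics, dx = score(x) dt + sqrt(2) dW, with an Euler-Maruyama step, and uses the analytic score of a toy Gaussian in place of a learned score network. The function names, step size, and target distribution are all assumptions chosen for illustration.

```python
import numpy as np

def langevin_sample(score, x0, step=1e-2, n_steps=2000, rng=None):
    """Euler-Maruyama discretization of dx = score(x) dt + sqrt(2) dW."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Drift along the score plus thermal noise; a physical substrate
        # would perform this relaxation without digital arithmetic.
        x = x + step * score(x) + np.sqrt(2.0 * step) * noise
    return x

# Toy target (assumption): isotropic Gaussian N(mu, sigma^2 I),
# whose score is -(x - mu) / sigma^2.
mu = np.array([1.0, -2.0])
sigma = 0.5

def gaussian_score(x):
    return -(x - mu) / sigma**2

rng = np.random.default_rng(0)
# Run 5000 independent chains in parallel, all started at the origin.
samples = langevin_sample(gaussian_score, np.zeros((5000, 2)), rng=rng)

print("empirical mean:", samples.mean(axis=0))
print("empirical std: ", samples.std(axis=0))
```

With a small enough step, the chains settle into a distribution whose mean and spread are close to mu and sigma, which is the sense in which "equilibrating" under the score yields samples from the target.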

Abstract

Diffusion-model inference and overdamped Langevin dynamics are formally identical. A physical substrate that encodes the score function therefore equilibrates to the correct output by thermodynamics alone, requiring no digital arithmetic during inference and potentially achieving a 10,000× reduction in energy relative to a GPU. Two fundamental barriers have until now prevented this equivalence from being realized at production scale: non-local skip connections, which locally coupled analog substrates cannot represent, and input conditioning, in which the coupling constants carry roughly 2,600× too little signal to anchor the system to a specific input. We resolve both obstacles. “Hierarchical bilinear coupling” encodes U-Net skip connections as rank-k inter-module interactions derived directly from the singular structure of the encoder and decoder Gram matrices, requiring only O(Dk) physical connections instead of O(D^2). A “minimal digital interface” (a 4-dimensional bottleneck encoder together with a 16-unit transfer network, totalling 2,560 parameters) overcomes the conditioning barrier. When evaluated on activations drawn from a trained denoising U-Net, the complete system attains a decoder cosine similarity of 0.9906 against an oracle upper bound of 1.0000, while preserving theoretical net energy savings of approximately 10^7× over GPU inference. These results constitute the first demonstration of trained-weight, production-scale thermodynamic diffusion inference.
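
The O(Dk) versus O(D^2) connection count can be made concrete with a small sketch. The assumptions are deliberate: the coupling matrix J below is synthetic (approximately rank k by construction), and the factors come from a plain truncated SVD of J rather than from the encoder and decoder Gram matrices the abstract refers to. The names D, k, J, A, and B are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, k = 64, 4  # module width and coupling rank (illustrative values)

# Synthetic stand-in for a dense inter-module coupling: rank k plus small noise.
J = rng.standard_normal((D, k)) @ rng.standard_normal((k, D))
J += 1e-3 * rng.standard_normal((D, D))

# Truncated SVD gives the best rank-k approximation J_k = A @ B.T.
U, s, Vt = np.linalg.svd(J, full_matrices=False)
A = U[:, :k] * s[:k]   # shape (D, k): couples module x to k shared channels
B = Vt[:k, :].T        # shape (D, k): couples module y to the same channels
J_k = A @ B.T

rel_err = np.linalg.norm(J - J_k) / np.linalg.norm(J)

# The bilinear term x^T J_k y only needs the k-dimensional projections,
# so each module connects to k channels instead of to all D remote units.
x, y = rng.standard_normal(D), rng.standard_normal(D)
dense_term = x @ J_k @ y
factored_term = (A.T @ x) @ (B.T @ y)

dense_connections = D * D         # every unit wired to every remote unit
low_rank_connections = 2 * D * k  # each unit wired to k shared channels

print(f"relative error of rank-{k} coupling: {rel_err:.2e}")
print(f"connections: dense={dense_connections}, low-rank={low_rank_connections}")
```

For D = 64 and k = 4 this is 4,096 dense connections against 512 in factored form; the gap widens linearly in D for fixed k.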