Surrogate Functionals for Machine-Learned Orbital-Free Density Functional Theory

arXiv cs.LG / 4/23/2026


Key Points

  • The paper proposes “surrogate functionals,” machine-learned energy functionals for orbital-free DFT (OF-DFT) that are trained to reproduce the correct ground-state density after applying a fixed density-optimization procedure.
  • It argues that training is simplified because it requires only ground-state densities, avoiding the need for energy labels or for gradients away from the ground state.
  • The authors introduce a gradient-descent-improvement loss designed to guarantee exponential convergence of the optimized density toward the ground state.
  • They further add adaptive sampling to focus learning on the optimization trajectories encountered during inference, improving learning efficiency.
  • Experiments on QM9 and QMugs show competitive or better density accuracy than prior fully supervised OF-DFT approaches, while removing the expensive O(N^3) orthonormalization step and improving runtime scaling for larger systems.
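The core idea in the first key point can be made concrete with a toy sketch: a "surrogate functional" need not match physical energies anywhere, as long as running a fixed gradient-descent procedure on it converges to the target ground-state density. The quadratic form, the function names, and all parameters below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy surrogate functional E(rho) = 0.5 (rho - rho_star)^T A (rho - rho_star),
# discretized on a small grid. Its only job is that the fixed optimization
# procedure below lands on the ground-state density.

def surrogate_grad(rho, A, rho_star):
    """Gradient of the toy quadratic surrogate functional."""
    return A @ (rho - rho_star)

def optimize_density(rho0, A, rho_star, lr=0.1, steps=200):
    """Fixed density-optimization procedure: plain gradient descent."""
    rho = rho0.copy()
    for _ in range(steps):
        rho = rho - lr * surrogate_grad(rho, A, rho_star)
    return rho

rng = np.random.default_rng(0)
n = 8
A = np.eye(n)                  # positive definite, so descent contracts
rho_gs = rng.random(n)         # stand-in "ground-state density"
rho0 = rng.random(n)           # arbitrary initial density

rho_opt = optimize_density(rho0, A, rho_gs)
print(np.max(np.abs(rho_opt - rho_gs)))  # small: optimizer recovers rho_gs
```

In the learned setting, training would adjust the functional's parameters (here, `A` and `rho_star`) so the same fixed procedure reproduces reference densities, which is why only ground-state densities are needed as labels.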

Abstract

We introduce surrogate functionals: machine-learned energy functionals for orbital-free density functional theory (OF-DFT) which are defined not by universal fidelity to a physical reference, but merely by the requirement that density optimization with a fixed procedure yields the true ground-state density. Helpfully, training surrogate functionals requires only ground-state densities: no energies or gradients away from the ground state are needed. We propose a gradient-descent-improvement loss that guarantees exponential convergence of the density to the ground state, and combine it with an adaptive sampling scheme that concentrates learning around the optimization trajectories actually visited during inference. On the QM9 and QMugs benchmarks, surrogate functionals achieve density errors competitive with or improving upon the state of the art for fully supervised machine-learned OF-DFT, while eliminating the need for the O(N^3) orthonormalization step required by prior work, yielding improved runtime scaling for larger systems.
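One way to read the "gradient-descent-improvement" loss described in the abstract: penalize any sampled density for which a single fixed gradient step fails to contract the distance to the ground-state density by a chosen factor. If the loss is zero everywhere along a trajectory, the distance shrinks geometrically, i.e. exponentially fast. The sketch below is a hedged interpretation with illustrative names (`gd_improvement_loss`, `contraction`, the two hand-built gradient fields), not the paper's exact formulation.

```python
import numpy as np

def gd_improvement_loss(rho, rho_gs, grad_fn, lr=0.1, contraction=0.95):
    """Hinge penalty on failing to contract toward the ground state.

    Zero loss at every visited density enforces
        ||rho_next - rho_gs|| <= contraction * ||rho - rho_gs||,
    which compounds to exponential convergence over many steps.
    """
    rho_next = rho - lr * grad_fn(rho)
    d_before = np.linalg.norm(rho - rho_gs)
    d_after = np.linalg.norm(rho_next - rho_gs)
    return max(0.0, d_after - contraction * d_before)

rng = np.random.default_rng(1)
rho_gs = rng.random(8)
rho = rng.random(8)

# A gradient field pointing toward the ground state contracts (zero loss);
# one pointing away expands (positive loss drives the functional to improve).
good_grad = lambda r: (r - rho_gs)
bad_grad = lambda r: -(r - rho_gs)

loss_good = gd_improvement_loss(rho, rho_gs, good_grad)
loss_bad = gd_improvement_loss(rho, rho_gs, bad_grad)
print(loss_good, loss_bad)  # 0.0, then a positive value
```

The adaptive-sampling scheme would then evaluate this loss not on random densities but on the intermediate densities actually produced while running the fixed optimizer, concentrating training signal where inference spends its time.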