Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling

arXiv stat.ML / 4/1/2026


Key Points

  • Implicit models generate outputs by iterating a shared operator to a fixed point, effectively behaving like an infinite-depth, weight-tied network trained with constant memory.
  • The paper addresses why implicit models can match or exceed explicit models by increasing test-time compute, providing a strict mathematical characterization of their expressive power.
  • It proves that for a broad class of implicit operators, expressive power increases systematically with the number of test-time iterations, allowing the model to approach a richer function class.
  • Experiments across image reconstruction, scientific computing, operations research, and LLM reasoning show that additional iterations increase the complexity of the learned mapping and improve solution quality while stabilizing performance.

Abstract

Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, significantly reducing memory needs for the same level of performance compared to explicit models. While it is empirically known that these compact models can often match or even exceed the accuracy of larger explicit networks by allocating more test-time compute, the underlying mechanism remains poorly understood. We study this gap through a nonparametric analysis of expressive power. We provide a strict mathematical characterization, showing that a simple and regular implicit operator can, through iteration, progressively express more complex mappings. We prove that for a broad class of implicit models, this process lets the model's expressive power scale with test-time compute, ultimately matching a much richer function class. The theory is validated across four domains: image reconstruction, scientific computing, operations research, and LLM reasoning, demonstrating that as test-time iterations increase, the complexity of the learned mapping rises, while the solution quality simultaneously improves and stabilizes.
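The mechanism described above can be sketched in a few lines: a single weight-tied operator is applied repeatedly, and more test-time iterations drive the output closer to the equilibrium. The snippet below is a minimal illustration, not the paper's implementation; the contractive affine operator `f`, the names `W`, `U`, `b`, and the helper `solve_fixed_point` are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an implicit (deep-equilibrium-style) model.
# Assumption: a simple contractive operator f(z, x) = tanh(W z + U x + b);
# none of these names come from the paper.

rng = np.random.default_rng(0)
d = 8

# Scale W so the map is a contraction (spectral norm < 1), which
# guarantees the fixed-point iteration converges.
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((d, d))
b = rng.standard_normal(d)

def f(z, x):
    """One application of the shared (weight-tied) operator."""
    return np.tanh(W @ z + U @ x + b)

def solve_fixed_point(x, n_iters, tol=1e-8):
    """Iterate the shared operator n_iters times; more test-time
    iterations give a better approximation of z* = f(z*, x)."""
    z = np.zeros(d)
    for _ in range(n_iters):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

x = rng.standard_normal(d)
z_few = solve_fixed_point(x, n_iters=3)
z_many = solve_fixed_point(x, n_iters=200)

# The fixed-point residual ||f(z, x) - z|| shrinks as compute grows,
# mirroring the paper's test-time scaling behavior.
residual_few = np.linalg.norm(f(z_few, x) - z_few)
residual_many = np.linalg.norm(f(z_many, x) - z_many)
print(residual_many < residual_few)
```

In this toy setting the contraction property plays the role of the paper's regularity assumption: each extra iteration provably tightens the approximation of the equilibrium, so extra test-time compute translates directly into a more accurate output.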