Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

arXiv stat.ML / 3/26/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a “convex paradigm” for deep learning: the set of functions that admit sufficiently accurate approximation by small binary circuits acquires a convex structure in the “Harder than Monte Carlo” (HTMC) regime, where the exponent γ > 2.
  • It defines HTMC norms for functions based on the circuit-size scaling needed for ε-approximation, connecting computational circuit complexity to functional geometry.
  • In parallel, the authors introduce a ResNet norm derived from a weighted ℓ₁ norm over ResNet parameters, yielding a corresponding norm on functions.
  • They establish an almost matching “sandwich bound” relating HTMC norms and ResNet norms, implying that minimizing the ResNet norm aligns with finding near-minimal-size circuits (within a power-of-2 factor).
  • Overall, the work frames ResNets as a computational model for real functions that is especially well-suited to the HTMC regime where convexity emerges.
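
The structure behind these points can be sketched in the paper's notation; the exact normalization of the set definition and the constants in the bound are assumptions here, inferred from the “within a power of 2” claim rather than taken from the paper:

```latex
% f belongs to F_{\gamma,c} if, for every accuracy \epsilon, some binary
% circuit C_\epsilon of size at most c\,\epsilon^{-\gamma} approximates f:
F_{\gamma,c} = \bigl\{ f : \forall \epsilon \in (0,1),\
  \exists C_\epsilon,\ |C_\epsilon| \le c\,\epsilon^{-\gamma},\
  \|C_\epsilon - f\|_\infty \le \epsilon \bigr\}

% Schematic form of the sandwich bound (constants assumed, up to a
% power-of-2 factor in the implied circuit size):
\|f\|_{\mathrm{HTMC}} \;\lesssim\; \|f\|_{\mathrm{ResNet}}
  \;\lesssim\; \|f\|_{\mathrm{HTMC}}
```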

Abstract

This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued functions f that can be \epsilon-approximated with a binary circuit of size at most c\epsilon^{-\gamma} becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when \gamma>2, allowing for the definition of a HTMC norm on functions. In parallel one can define a complexity measure on the parameters of a ResNet (a weighted \ell_1 norm of the parameters), which induces a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.
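
The weighted \ell_1 complexity measure on ResNet parameters can be illustrated with a small sketch. The toy architecture (plain d×d residual blocks) and the uniform per-layer weights w_l are illustrative assumptions, not the paper's actual construction:

```python
# Sketch of a weighted l1 complexity measure over ResNet parameters,
# in the spirit of the "ResNet norm" described above.
import numpy as np

def resnet_l1_complexity(layer_params, layer_weights):
    """Weighted l1 norm: sum over residual blocks of w_l * ||theta_l||_1."""
    return sum(w * np.abs(theta).sum()
               for w, theta in zip(layer_weights, layer_params))

# Toy ResNet: L residual blocks, each parametrized by a d x d matrix theta_l.
rng = np.random.default_rng(0)
L, d = 4, 8
params = [rng.standard_normal((d, d)) for _ in range(L)]
weights = [1.0] * L  # uniform weighting, chosen here as a placeholder

print(resnet_l1_complexity(params, weights))  # a nonnegative scalar
```

Minimizing such a quantity during training would, per the paper's sandwich bound, drive the learned function toward one computable by a near-minimal-size circuit.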