Collective Kernel EFT for Pre-activation ResNets

arXiv cs.LG / 4/20/2026


Key Points

  • The paper develops a collective-kernel effective field theory (EFT) for pre-activation ResNets that relies only on a kernel quantity G, using a G-only closure hierarchy to model how the kernel evolves across layers.
  • Using the exact conditional Gaussianity of residual increments, the authors derive an exact stochastic recursion for G, then apply systematic Gaussian approximations to obtain a continuous-depth ODE system for the mean kernel K_0, the kernel covariance V_4, and the 1/n mean correction K_{1,\mathrm{EFT}}.
  • Numerically, the ODE for the mean kernel K_0 stays accurate at all depths, but the residual of the V_4 equation accumulates to an O(1) error at finite time, driven primarily by approximation errors in the G-only transport term.
  • The 1/n mean correction K_{1,\mathrm{EFT}}, which arises diagrammatically as a one-loop tadpole, fails because its source closure breaks down: the closure shows a systematic mismatch even at initialization.
  • These results expose the limits of a G-only state-space reduction; the authors suggest extending the state space to include the sigma-kernel.
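
To make the conditional-Gaussianity argument concrete, here is a minimal worked sketch. The summary does not give the paper's parameterization, so everything here is an assumption: the residual block, the step size \epsilon, the activation \phi, the increment symbol \eta, and the definition of the sigma-kernel \Sigma as the Gram matrix of the activations. Under those assumptions, the one-step conditional mean of G depends on \Sigma rather than on G alone, which is plausibly where a G-only closure has to approximate.

```latex
% Assumed block (illustrative, not quoted from the paper):
%   h^{\ell+1}_a = h^\ell_a + \eta^\ell_a,
%   \eta^\ell_a = \sqrt{\epsilon/n}\, W^\ell \phi(h^\ell_a),
%   W^\ell_{ij} \sim \mathcal{N}(0,1) i.i.d., drawn fresh at each layer.
% Indices a, b label inputs; i, j label neurons; n is the width.
\begin{align}
G^\ell_{ab} &= \frac{1}{n}\, h^\ell_a \cdot h^\ell_b, \qquad
\Sigma^\ell_{ab} = \frac{1}{n}\, \phi(h^\ell_a) \cdot \phi(h^\ell_b), \\
\operatorname{Cov}\!\left(\eta^\ell_{a,i},\, \eta^\ell_{b,j} \,\middle|\, h^\ell\right)
  &= \epsilon\, \delta_{ij}\, \Sigma^\ell_{ab}, \\
G^{\ell+1}_{ab} &= G^\ell_{ab}
  + \frac{1}{n}\left(h^\ell_a \cdot \eta^\ell_b + \eta^\ell_a \cdot h^\ell_b\right)
  + \frac{1}{n}\, \eta^\ell_a \cdot \eta^\ell_b, \\
\mathbb{E}\!\left[G^{\ell+1}_{ab} \,\middle|\, h^\ell\right]
  &= G^\ell_{ab} + \epsilon\, \Sigma^\ell_{ab}.
\end{align}
```

The cross terms have zero conditional mean because \eta is centered given h^\ell, and the quadratic term averages to \epsilon \Sigma^\ell_{ab} by the covariance identity in the second line.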

Abstract

In finite-width deep neural networks, the empirical kernel G evolves stochastically across layers. We develop a collective kernel effective field theory (EFT) for pre-activation ResNets based on a G-only closure hierarchy and diagnose its finite validity window. Exploiting the exact conditional Gaussianity of residual increments, we derive an exact stochastic recursion for G. Applying Gaussian approximations systematically yields a continuous-depth ODE system for the mean kernel K_0, the kernel covariance V_4, and the 1/n mean correction K_{1,\mathrm{EFT}}, which emerges diagrammatically as a one-loop tadpole correction. Numerically, K_0 remains accurate at all depths. However, the V_4 equation residual accumulates to an O(1) error at finite time, primarily driven by approximation errors in the G-only transport term. Furthermore, K_{1,\mathrm{EFT}} fails due to the breakdown of the source closure, which exhibits a systematic mismatch even at initialization. These findings highlight the limitations of G-only state-space reduction and suggest extending the state space to incorporate the sigma-kernel.
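
The stochastic evolution of G described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' code, and every modeling choice is an assumption made for illustration: a single input, a ReLU activation, the block `h <- h + sqrt(eps/n) * W @ relu(h)` with fresh standard-normal weights at each layer, and standard-normal initial pre-activations. The function names `simulate_kernel` and `final_kernel_stats` are hypothetical. The sketch uses conditional Gaussianity directly: given `h`, each component of the increment is an independent N(0, eps * Sigma) draw, so the weight matrix never has to be sampled.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def simulate_kernel(n, depth, eps, rng):
    """Return [G^0, ..., G^depth] for one input in a toy pre-activation ResNet.

    Assumed block (not taken from the paper):
        h <- h + sqrt(eps / n) * W @ relu(h),  W_ij ~ N(0, 1), fresh per layer.
    """
    h = [rng.gauss(0.0, 1.0) for _ in range(n)]
    kernels = [sum(x * x for x in h) / n]
    for _ in range(depth):
        # Assumed sigma-kernel for a single input: Sigma = (1/n) * sum_j relu(h_j)^2
        sigma = sum(relu(x) ** 2 for x in h) / n
        # Conditional on h, the increment components are i.i.d. N(0, eps * Sigma),
        # so we sample them directly instead of drawing an n-by-n weight matrix.
        std = math.sqrt(eps * sigma)
        h = [x + std * rng.gauss(0.0, 1.0) for x in h]
        kernels.append(sum(x * x for x in h) / n)
    return kernels


def final_kernel_stats(n, depth=10, eps=0.1, trials=200, seed=0):
    """Sample mean and unbiased sample variance of G^depth over independent networks."""
    rng = random.Random(seed)
    finals = [simulate_kernel(n, depth, eps, rng)[-1] for _ in range(trials)]
    mean = sum(finals) / trials
    var = sum((g - mean) ** 2 for g in finals) / (trials - 1)
    return mean, var


if __name__ == "__main__":
    for width in (32, 128, 512):
        mean, var = final_kernel_stats(width)
        print(f"n={width:4d}  mean G = {mean:.4f}  n * Var G = {width * var:.4f}")
```

In this toy setup the infinite-width mean kernel grows by a factor of (1 + eps/2) per layer, because E[relu(h)^2] = K/2 for h ~ N(0, K). The sample mean at large n should sit near (1 + eps/2)^depth, while the variance of G should shrink roughly like 1/n. These are the toy-model analogues of K_0 and V_4; the sketch makes no attempt to reproduce the paper's ODEs or its closure diagnostics.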