AI Navigate

Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks

arXiv cs.LG / 3/16/2026


Key Points

  • The authors derive an upper-bound formula for the local learning coefficient at singular points in three-layer neural networks (see the background sketch after this list), advancing Bayesian asymptotics for singular learning models.
  • The formula functions as a counting rule under budget and demand-supply constraints and is applicable to a broad class of analytic activation functions, including swish and polynomial activations.
  • For one-dimensional input, the upper bound coincides with the known learning coefficient, partially resolving a discrepancy with prior results.
  • The result offers a systematic perspective on how the network's weight parameters shape the learning coefficient across activation functions and architectures.
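
For background, the block below records the standard definition of the local learning coefficient from singular learning theory. It is included only as context for the key points above and is not taken from the paper itself.

```latex
% Background (standard singular learning theory, not from this paper):
% K(w) is the Kullback-Leibler divergence between the true
% distribution and the model at parameter w, and \varphi(w) is a
% prior density on the parameter set W.
\zeta(z) = \int_{W} K(w)^{z}\,\varphi(w)\,\mathrm{d}w,
\qquad \operatorname{Re}(z) > 0.
% \zeta extends meromorphically to the complex plane, with poles on
% the negative real axis.  The largest pole sits at z = -\lambda,
% and this \lambda is the learning coefficient, i.e. the real log
% canonical threshold (RLCT) of K.  The local learning coefficient
% at a point w^* is defined in the same way, with the integral
% restricted to a small neighborhood of w^*.
```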

Abstract

Three-layer neural networks are known to form singular learning models, and their Bayesian asymptotic behavior is governed by the learning coefficient, or real log canonical threshold. Although this quantity has been clarified for regular models and for some special singular models, broadly applicable methods for evaluating it in neural networks remain limited. Recently, a formula for the local learning coefficient of semiregular models was proposed, yielding an upper bound on the learning coefficient. However, this formula applies only to nonsingular points in the set of realization parameters and cannot be used at singular points. In particular, for three-layer neural networks, the resulting upper bound has been shown to differ substantially, in some cases, from already known values of the learning coefficient. In this paper, we derive an upper-bound formula for the local learning coefficient at singular points in three-layer neural networks. This formula can be interpreted as a counting rule under budget constraints and demand-supply constraints, and it applies to general analytic activation functions. In particular, it covers the swish function and polynomial functions, extending previous results to a wider class of activations. We further show that, when the input dimension is one, the upper bound obtained here coincides with the already known learning coefficient, thereby partially resolving the discrepancy noted above. Our result also provides a systematic perspective on how the weight parameters of three-layer neural networks affect the learning coefficient.
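
To see why this quantity matters, the block below sketches the classical Bayesian asymptotics that the learning coefficient governs. The network parameterization shown is a typical one, assumed here for concreteness; the asymptotic expansions are standard results of singular learning theory rather than claims of this paper.

```latex
% A typical three-layer (one-hidden-layer) network, assumed here
% only for concreteness (the paper's exact parameterization may
% differ).  H hidden units, activation \sigma, input x:
f(x, w) = \sum_{k=1}^{H} b_k\,\sigma(a_k \cdot x + c_k),
\qquad w = (a_k, b_k, c_k)_{k=1}^{H}.
% Standard Bayesian asymptotics (Watanabe): with F_n the Bayes free
% energy, S_n the empirical entropy of the true distribution, and m
% the multiplicity of the pole at z = -\lambda,
F_n = n S_n + \lambda \log n - (m - 1)\log\log n + O_p(1),
\qquad
\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
% where G_n is the Bayes generalization error.  A regular model with
% d parameters has \lambda = d/2 (and m = 1); singular models satisfy
% \lambda \le d/2, so any upper bound on \lambda tightens the
% generalization rate below the classical d/2 baseline.
```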