Adaptive Norm-Based Regularization for Neural Networks

arXiv stat.ML / 5/4/2026

Key Points

  • The paper studies norm-based regularization for neural networks, comparing existing penalty methods and proposing two new strategies based on extensions of ridge- and lasso-type regularizers.
  • One approach adapts weight decay by incorporating the input feature covariance into an ℓ2 (ridge-type) penalty, so regularization reflects dependencies among features.
  • The second approach combines an ℓ1 sparsity term with covariance-aware ℓ2 regularization to encourage both sparse weights and structure-informed parameter learning.
  • Experiments using Monte Carlo simulations and two real-world applications (building cooling-load prediction and leukemia cell-type classification from gene expression) show that the proposed regularizers achieve better predictive performance on unseen data and more effective complexity control than standard norm-based penalties, especially when features are correlated or high-dimensional.
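The first strategy, a covariance-aware ℓ2 penalty, can be sketched as follows. The summary does not give the paper's exact formula, so this sketch assumes a hypothetical form applied to the first-layer weight matrix W (shape hidden × input): λ·tr(W Σ Wᵀ), where Σ is the sample covariance of the input features. Under this assumption the penalty equals λ times the summed sample variance of the first-layer pre-activations, so weight directions along which the inputs vary strongly are penalized more. With Σ = I it reduces to ordinary weight decay λ‖W‖²_F. The function names are illustrative and do not come from the paper.

```python
import numpy as np


def input_covariance(X):
    """Sample covariance of the input features (rows of X are samples)."""
    # rowvar=False: columns are variables; np.cov uses the unbiased (n - 1) divisor.
    return np.atleast_2d(np.cov(X, rowvar=False))


def covariance_ridge_penalty(W, Sigma, lam):
    """Hypothetical covariance-aware ridge penalty lam * tr(W Sigma W^T).

    W: first-layer weights, shape (hidden, input).
    Sigma: symmetric input covariance, shape (input, input).
    Returns the penalty value and its gradient with respect to W.
    """
    value = lam * np.trace(W @ Sigma @ W.T)
    grad = 2.0 * lam * W @ Sigma  # valid because Sigma is symmetric
    return value, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two strongly correlated features plus one independent feature.
    z = rng.normal(size=(500, 1))
    X = np.hstack([z, z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
    Sigma = input_covariance(X)

    w_sum = np.array([[1.0, 1.0, 0.0]])    # adds the two correlated features
    w_diff = np.array([[1.0, -1.0, 0.0]])  # contrasts them
    p_sum, _ = covariance_ridge_penalty(w_sum, Sigma, lam=1.0)
    p_diff, _ = covariance_ridge_penalty(w_diff, Sigma, lam=1.0)
    # Both rows have the same Frobenius norm, so plain weight decay
    # would penalize them equally; the covariance-aware form does not.
    print(f"sum direction: {p_sum:.3f}, contrast direction: {p_diff:.3f}")
```

The gradient 2λ·W Σ can be added to the data-loss gradient in any optimizer, in the same place a standard weight-decay term would go.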

Abstract

In this paper, we study norm-based regularization methods for neural networks. We compare existing penalization approaches and introduce two regularization strategies that extend classical ridge- and lasso-type penalties to neural network models. The first strategy modifies weight decay by incorporating the covariance structure of the input features into a ridge-type ℓ2 penalty, allowing regularization to account for feature dependence. The second combines an ℓ1 sparsity penalty with covariance-aware ℓ2 regularization, producing neural network weights that are both sparse and structurally informed. Monte Carlo simulations are used to evaluate these methods under different data-generating settings, followed by two real-data applications on building cooling-load prediction and leukemia cell-type classification from high-dimensional gene expression data. Across simulated and real-data examples, the proposed regularizers improve predictive performance on unseen data and provide more effective complexity control than standard norm-based penalties, particularly when features are correlated or high-dimensional.
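The second strategy, an ℓ1 sparsity term combined with covariance-aware ℓ2 regularization, can be sketched as a proximal gradient step. This assumes a hypothetical penalty λ₁‖W‖₁ + λ₂·tr(W Σ Wᵀ) on a weight matrix W, with Σ the input covariance; the summary does not state the exact form or how the authors optimize it. Proximal gradient with soft-thresholding (ISTA-style) is a standard technique for the non-smooth ℓ1 term and is swapped in here purely for illustration. Because the ℓ1 part is applied exactly rather than through a subgradient, individual weights can become exactly zero.

```python
import numpy as np


def soft_threshold(W, t):
    """Proximal operator of t * ||W||_1: shrinks each entry toward zero by t."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)


def sparse_cov_ridge_step(W, grad_loss, Sigma, lam1, lam2, lr):
    """One proximal gradient step on loss + lam2 * tr(W Sigma W^T) + lam1 * ||W||_1.

    The smooth part (data loss + covariance-aware ridge) takes a gradient step;
    the l1 part is then applied exactly via soft-thresholding.
    """
    smooth_grad = grad_loss + 2.0 * lam2 * W @ Sigma  # Sigma assumed symmetric
    return soft_threshold(W - lr * smooth_grad, lr * lam1)


if __name__ == "__main__":
    # Toy linear model y = X w_true + noise; only features 0 and 5 are relevant.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)  # make features 0 and 1 correlated
    w_true = np.zeros((1, d))
    w_true[0, 0], w_true[0, 5] = 2.0, -1.0
    y = X @ w_true.T + 0.1 * rng.normal(size=(n, 1))
    Sigma = np.cov(X, rowvar=False)

    W = np.zeros((1, d))
    for _ in range(500):
        residual = X @ W.T - y           # shape (n, 1)
        grad_loss = residual.T @ X / n   # gradient of 0.5 * mean squared error
        W = sparse_cov_ridge_step(W, grad_loss, Sigma, lam1=0.05, lam2=0.01, lr=0.1)
    print(np.round(W, 3))
```

The step size must stay below the inverse Lipschitz constant of the smooth part for the iteration to converge; the values above are illustrative choices for this toy problem, not settings from the paper.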