AI Navigate

SCORE: Replacing Layer Stacking with Contractive Recurrent Depth

arXiv cs.LG / 3/12/2026


Key Points

  • SCORE introduces discrete recurrent depth by repeatedly applying a single shared block with an ODE-inspired contractive update h_{t+1} = (1 - dt) * h_t + dt * F(h_t), offering depth-by-iteration refinement without multiple independent layers.
  • Unlike Neural ODEs, SCORE uses a fixed number of discrete steps and standard backpropagation, avoiding solvers and adjoint methods.
  • The method reduces parameter count through shared weights and shows improved convergence speed across graph neural networks, multilayer perceptrons, and Transformer-based language models like nanoGPT.
  • Empirically, simple Euler integration provides the best trade-off between compute and performance, while higher-order integrators yield marginal gains at extra cost.
  • The results suggest contractive residual updates as a lightweight, effective alternative to classical stacking across diverse architectures.
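The contractive update above can be sketched in a few lines. This is a minimal illustrative version, not the paper's implementation: here the shared block F is an arbitrary tanh map (the actual block in SCORE would be an MLP, GNN, or Transformer layer), and the names `score_forward`, `num_steps`, and the values of `dt`, `W`, and `b` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shared "layer": F(h) = tanh(W h + b).
# In SCORE the same parameters are reused at every iteration,
# which is where the parameter-count reduction comes from.
d = 8
W = rng.normal(scale=0.5, size=(d, d))
b = rng.normal(scale=0.1, size=d)

def F(h):
    return np.tanh(W @ h + b)

def score_forward(h, num_steps=6, dt=0.3):
    """Depth-by-iteration: apply the single shared block num_steps
    times with the contractive update h <- (1 - dt)*h + dt*F(h).
    The step size dt controls how far each refinement moves h."""
    for _ in range(num_steps):
        h = (1.0 - dt) * h + dt * F(h)
    return h

h0 = rng.normal(size=d)
out = score_forward(h0)
```

Note that `num_steps` plays the role that layer count plays in a classical stack, while the weights of F stay fixed across iterations.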

Abstract

Residual connections are central to modern deep neural networks, enabling stable optimization and efficient information flow across depth. In this work, we propose SCORE (Skip-Connection ODE Recurrent Embedding), a discrete recurrent alternative to classical layer stacking. Instead of composing multiple independent layers, SCORE iteratively applies a single shared neural block using an ODE (Ordinary Differential Equation)-inspired contractive update: h_{t+1} = (1 - dt) * h_t + dt * F(h_t). This formulation can be interpreted as a depth-by-iteration refinement process, where the step size dt explicitly controls stability and update magnitude. Unlike continuous Neural ODE approaches, SCORE uses a fixed number of discrete iterations and standard backpropagation, without requiring ODE solvers or adjoint methods. We evaluate SCORE across graph neural networks (ESOL molecular solubility), multilayer perceptrons, and Transformer-based language models (nanoGPT). Across architectures, SCORE generally improves convergence speed, often accelerates training, and reduces parameter count through shared weights. In practice, simple Euler integration provides the best trade-off between computational cost and performance, while higher-order integrators yield only marginal gains at increased compute. These results suggest that controlled recurrent depth with contractive residual updates offers a lightweight and effective alternative to classical stacking in deep neural networks.
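The abstract's Euler-versus-higher-order trade-off can be made concrete: the update h_{t+1} = (1 - dt)*h_t + dt*F(h_t) is one explicit Euler step of size dt on the ODE dh/dt = F(h) - h, and a second-order method like Heun's evaluates F twice per step. The sketch below (an assumption for illustration; the paper does not specify which higher-order integrator it tested, and the block F here is a toy tanh map) counts F evaluations to show the doubled per-step cost.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W = rng.normal(scale=0.3, size=(d, d))
calls = {"F": 0}

def F(h):
    # Toy shared block; we count evaluations to compare integrator cost.
    calls["F"] += 1
    return np.tanh(W @ h)

def vector_field(h):
    # SCORE's Euler form h <- (1 - dt)*h + dt*F(h) integrates
    # the ODE dh/dt = F(h) - h with step size dt.
    return F(h) - h

def euler_step(h, dt):
    # One F evaluation per step.
    return h + dt * vector_field(h)

def heun_step(h, dt):
    # Heun's method (2nd order): two F evaluations per step,
    # i.e. roughly double the compute per iteration.
    k1 = vector_field(h)
    k2 = vector_field(h + dt * k1)
    return h + 0.5 * dt * (k1 + k2)

h0 = rng.normal(size=d)

calls["F"] = 0
he = h0.copy()
for _ in range(5):
    he = euler_step(he, 0.3)
euler_calls = calls["F"]

calls["F"] = 0
hh = h0.copy()
for _ in range(5):
    hh = heun_step(hh, 0.3)
heun_calls = calls["F"]
```

Since both integrators target the same ODE, the accuracy gap per step shrinks as dt does, which is consistent with the reported finding that higher-order integrators buy only marginal gains for their extra compute.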