A ghost mechanism: An analytical model of abrupt learning in recurrent networks

arXiv stat.ML / 4/16/2026


Key Points

  • The paper proposes the “ghost mechanism” as an analytical dynamical-systems model for abrupt learning in recurrent neural networks (RNNs), attributing sudden performance gains to transient slowdowns near the remnant of a saddle-node bifurcation.
  • By reducing high-dimensional dynamics to a one-dimensional canonical form with a single scale parameter, the authors derive how abrupt learning behavior depends on learning rate and the timescale of the learned computation.
  • The study identifies a critical learning-rate threshold (scaling as an inverse power law with the computation timescale), beyond which learning breaks down via two interacting issues: vanishing gradients and oscillatory gradients near minima.
  • The authors show that these effects can trap training in “no-learning zones” where gradients vanish, leading the system to make high-confidence but incorrect predictions, and validate the theory in both low-rank and full-rank RNNs on working-memory tasks.
  • Two mitigation strategies are suggested: increasing the trainable rank, which stabilizes learning trajectories, and reducing output confidence, which prevents entrapment in no-learning zones.
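
The transient slowdown behind the ghost mechanism can be sketched with the saddle-node normal form dx/dt = μ + x². This is a generic textbook illustration, not the authors' exact reduction: for small μ > 0 the two fixed points have annihilated, yet trajectories still stall near x = 0 (the "ghost"), and the passage time through the bottleneck diverges like π/√μ as μ → 0.

```python
import math

def passage_time(mu, x0=-10.0, x_end=10.0, dt=1e-3):
    """Forward-Euler time for dx/dt = mu + x**2 to carry x from x0 to x_end.
    For small mu > 0 the trajectory stalls near x = 0, the 'ghost' of the
    fixed points that annihilated at mu = 0."""
    x, t = x0, 0.0
    while x < x_end:
        x += dt * (mu + x * x)
        t += dt
    return t

# The bottleneck dominates the total time, which grows like pi / sqrt(mu):
for mu in (1e-2, 1e-4):
    print(mu, passage_time(mu), math.pi / math.sqrt(mu))
```

Shrinking μ by a factor of 100 lengthens the transit by roughly a factor of 10, which is how a network can realize long effective timescales without any stable fixed point.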

Abstract

Abrupt learning is a common phenomenon in recurrent neural networks (RNNs) trained on working memory tasks. In such cases, the networks develop transient slow regions in state space that extend the effective timescales of computation. However, the mechanisms driving sudden performance improvements and their causal role remain unclear. To address this gap, we introduce the ghost mechanism, a process by which dynamical systems exhibit transient slowdown near the remnant of a saddle-node bifurcation. By reducing the high-dimensional dynamics near ghost points, we derive a one-dimensional canonical form that analytically captures learning as a process controlled by a single scale parameter. Using this model, we study a form of abrupt learning emerging from ghost points and identify a critical learning rate that scales as an inverse power law with the timescale of the learned computation. Beyond this rate, learning collapses through two interacting modes: (i) vanishing gradients and (ii) oscillatory gradients near minima. These features can lock the system into high-confidence but incorrect predictions when parameter updates trigger a no-learning zone, a region of parameter space where gradients vanish. We validate these predictions in low-rank RNNs, where ghost points precede abrupt transitions, and further demonstrate their generality in full-rank RNNs trained on canonical working memory tasks. Our theory offers two approaches to address these learning difficulties: increasing trainable ranks stabilizes learning trajectories, while reducing output confidence mitigates entrapment in no-learning zones. Overall, the ghost mechanism reveals how the computational demands of a task constrain the optimization landscape, demonstrating that well-known learning difficulties in RNNs partly arise from the dynamical systems they must learn to implement.
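
The two failure modes named in the abstract, vanishing gradients and oscillatory gradients near minima, can be reproduced on a toy loss. The construction below is my own illustration, not the paper's canonical form: a sharp Gaussian well 1 − exp(−θ²/2σ²), whose curvature 1/σ² at the minimum sets a gradient-descent stability threshold lr < 2σ², while the flat plateau outside the well plays the role of a no-learning zone.

```python
import math

def grad(theta, sigma):
    """Gradient of the toy loss 1 - exp(-theta**2 / (2 * sigma**2))."""
    return (theta / sigma**2) * math.exp(-theta**2 / (2 * sigma**2))

def train(theta0, sigma, lr, steps=2000):
    """Plain gradient descent on the toy loss."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * grad(theta, sigma)
    return theta

sigma = 0.1  # well width; curvature at the minimum is 1/sigma**2

# Below the stability threshold lr < 2*sigma**2, descent converges:
print(train(0.05, sigma, lr=0.5 * sigma**2))  # approaches 0

# Just above it, the iterates oscillate around the minimum indefinitely:
print(train(0.05, sigma, lr=0.03))            # bounded oscillation, |theta| ~ sigma

# Starting on the plateau, gradients are ~0 and nothing is learned:
print(train(1.0, sigma, lr=0.5 * sigma**2))   # stays ~1.0
```

As σ shrinks (a sharper minimum), the threshold 2σ² collapses, loosely mirroring the paper's finding that the critical learning rate falls as an inverse power law in the timescale of the learned computation.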