When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry

arXiv cs.LG / 5/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper studies how continual learning must balance plasticity (new learning) and stability (preserving old representations), focusing on when architectural structure actually helps or hurts.
  • By varying network architecture (modular task-partitioned recurrent network vs single-module baseline), task similarity (low/medium/high), and weight-initialization scale (which changes effective representational dimensionality), the authors identify distinct learning regimes.
  • The results show that architectural differences matter little in high-dimensional regimes, where representations can flexibly support multiple tasks with minimal interference.
  • In lower-dimensional regimes, however, structural separation becomes decisive, producing a graded representational geometry: aligned subspaces for similar tasks, partial orthogonalization for moderately dissimilar tasks, and stronger separation for dissimilar tasks.
  • The authors conclude that representational dimensionality is a key organizing factor that determines when modular structure becomes functionally relevant in continual learning design (see the dimensionality sketch after this list).
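The paper does not define its dimensionality measure here, but a common estimator for the effective dimensionality of learned representations is the participation ratio of the activity covariance eigenvalues. The sketch below is illustrative only (the `participation_ratio` helper and the random activity matrices are assumptions, not the authors' code); it shows how such a measure could be read off recorded hidden-state activity, with isotropic activity scoring near the number of units and low-rank activity scoring near its rank.

```python
import numpy as np

def participation_ratio(activity: np.ndarray) -> float:
    """Effective dimensionality of a (samples x units) activity matrix,
    estimated as the participation ratio of its covariance eigenvalues."""
    centered = activity - activity.mean(axis=0, keepdims=True)
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

# Hypothetical usage: isotropic activity over 128 units scores near 128,
# while activity confined to a 3-dimensional subspace scores near 3.
rng = np.random.default_rng(0)
high_dim = rng.standard_normal((500, 128))
low_dim = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 128))
print(participation_ratio(high_dim), participation_ratio(low_dim))
```

Under this kind of measure, the paper's "regimes" correspond to where the score falls: large values indicate high-dimensional (lazy-like) representations, small values indicate low-dimensional (rich) ones.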

Abstract

Continual learning systems must strike a balance between plasticity, the ability to acquire new knowledge, and stability, the preservation of previously learned representations. This stability-plasticity dilemma affects how representations can be reused across tasks: shared structure enables transfer when tasks are similar, but may also induce interference when new learning disrupts existing representations. However, it remains unclear when and why structural separation influences this trade-off. In this study, we examine how network architecture, task similarity, and representational dimensionality jointly shape learning in a sequential task paradigm inspired by transfer-interference studies. We compare a task-partitioned modular recurrent network with a single-module baseline by systematically varying task similarity (low, medium, high) and the scale of weight initialization, which induces different learning regimes that we empirically characterize through the effective dimensionality of the learned representations. We find that architecture has minimal impact in high-dimensional regimes, where representations are sufficiently unconstrained to accommodate multiple tasks without strong interference. In contrast, in lower-dimensional (rich) regimes, architectural separation is decisive: modular networks exhibit graded alignment of task-specific subspaces, with overlap for similar tasks, partial orthogonalization for moderately dissimilar tasks, and stronger separation for dissimilar tasks. This graded geometry is absent in the single-module baseline. Our findings suggest that representational dimensionality acts as a key organizing variable governing when structural separation becomes functionally relevant, and highlight adaptive geometry as a central principle for designing continual learning systems.
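To make the "graded alignment of task-specific subspaces" concrete, one standard way to quantify overlap versus orthogonalization is through the principal angles between each task's dominant activity subspace. The sketch below is a minimal illustration under that assumption (the helpers `task_subspace` and `subspace_alignment` and the synthetic data are hypothetical, not the paper's analysis): it computes a mean squared-cosine alignment index that reads near 1 for overlapping subspaces and near 0 for orthogonal ones.

```python
import numpy as np

def task_subspace(activity: np.ndarray, k: int = 10) -> np.ndarray:
    """Orthonormal basis (units x k) for the top-k principal subspace
    of a (samples x units) activity matrix."""
    centered = activity - activity.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def subspace_alignment(basis_a: np.ndarray, basis_b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two subspaces:
    ~1.0 for aligned (shared) subspaces, ~0.0 for orthogonal ones."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float((cosines ** 2).mean())

# Hypothetical usage: compare hidden-state subspaces recorded while the
# network performs task A versus task B after sequential training.
rng = np.random.default_rng(1)
act_a = rng.standard_normal((400, 64))
act_b = act_a @ np.diag(rng.uniform(size=64)) + 0.5 * rng.standard_normal((400, 64))
print(subspace_alignment(task_subspace(act_a), task_subspace(act_b)))
```

In terms of such an index, the graded geometry described in the abstract would appear as high alignment for similar tasks, intermediate values for moderately dissimilar tasks, and low values for dissimilar tasks in the modular network, with no such gradation in the single-module baseline.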