Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

arXiv stat.ML / 4/2/2026


Key Points

  • The paper investigates “grokking,” where models reach high training accuracy but only much later generalize to unseen test points, focusing on algebraic tasks beyond the original modular arithmetic setting.
  • Using the Recursive Feature Machine (RFM) and Average Gradient Outer Product (AGOP), the authors analyze feature learning kernels and show that generalization occurs only when a specific symmetry in the training set is broken.
  • They provide empirical evidence that RFM generalizes by recovering the invariance group action present in the data, connecting learned representations to underlying data symmetries.
  • The study concludes that learned feature matrices encode elements tied to the invariance group, offering an explanation for why generalization depends on whether this symmetry is present or broken.
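For reference, the AGOP mentioned in these points has a simple closed form: per Radhakrishnan et al. (2024), for an estimator $f$ and training inputs $x_1, \dots, x_n$ it is the average of the outer products of the estimator's input gradients,

```latex
\mathrm{AGOP}(f; x_1, \dots, x_n) \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla f(x_i)\, \nabla f(x_i)^{\top}.
```

Directions along which $f$ varies strongly accumulate large eigenvalues in this matrix, which is why it serves as a learned feature (re)weighting.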

Abstract

Grokking occurs when a model achieves high training accuracy but generalizes to unseen test points only long afterward. This phenomenon was initially observed on a class of algebraic problems, such as learning modular arithmetic (Power et al., 2022). We study grokking on algebraic tasks in a class of feature learning kernels via the Recursive Feature Machine (RFM) algorithm (Radhakrishnan et al., 2024), which iteratively updates feature matrices through the Average Gradient Outer Product (AGOP) of an estimator to learn task-relevant features. Our main experimental finding is that generalization occurs only when a certain symmetry in the training set is broken. Furthermore, we show empirically that RFM generalizes by recovering the underlying invariance group action inherent in the data. We find that the learned feature matrices encode specific elements of the invariance group, explaining the dependence of generalization on symmetry.
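The RFM loop described in the abstract can be sketched concretely. This is a minimal illustration, not the paper's implementation: the Laplace-type Mahalanobis kernel, the bandwidth, the ridge regularizer, and the finite-difference approximation of the estimator's gradient (the paper uses closed-form kernel gradients) are all simplifying assumptions. Each iteration fits a kernel ridge regressor with the current feature matrix `M`, then replaces `M` with the AGOP of the fitted estimator.

```python
import numpy as np

def rfm_sketch(X, y, iters=5, reg=1e-3, bandwidth=1.0):
    """Toy RFM loop: kernel ridge regression with a Mahalanobis kernel,
    alternated with an AGOP update of the feature matrix M (a sketch,
    not the reference implementation)."""
    n, d = X.shape
    M = np.eye(d)  # feature matrix, initialized to the identity

    def kernel(A, B, M):
        # Laplace-type kernel exp(-||a - b||_M / bandwidth),
        # where ||a - b||_M^2 = (a - b)^T M (a - b).
        D2 = np.maximum(
            np.einsum('id,de,ie->i', A, M, A)[:, None]
            + np.einsum('jd,de,je->j', B, M, B)[None, :]
            - 2 * A @ M @ B.T, 0.0)
        return np.exp(-np.sqrt(D2) / bandwidth)

    alpha = None
    for _ in range(iters):
        # Step 1: kernel ridge regression with the current M.
        K = kernel(X, X, M)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)

        # Step 2: AGOP of the fitted estimator f(x) = k(x, X) @ alpha,
        # with gradients approximated here by central finite differences
        # (assumption: stands in for the closed-form kernel gradient).
        G = np.zeros((d, d))
        eps = 1e-4
        for i in range(n):
            g = np.zeros(d)
            for j in range(d):
                xp = X[i].copy(); xp[j] += eps
                xm = X[i].copy(); xm[j] -= eps
                fp = (kernel(xp[None, :], X, M) @ alpha).item()
                fm = (kernel(xm[None, :], X, M) @ alpha).item()
                g[j] = (fp - fm) / (2 * eps)
            G += np.outer(g, g)
        M = G / n  # average gradient outer product
    return M, alpha
```

On a target that depends on only a few input coordinates, the returned `M` concentrates weight on the directions the estimator actually uses, which is the mechanism the paper connects to recovering the data's invariance group.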