Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity

arXiv stat.ML / 3/25/2026

Key Points

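The paper addresses what it calls the "curse of detail": theories of feature learning (FL) mechanisms and implicit bias in deep learning tend to rely on high-dimensional non-linear equations, which makes them computationally heavy to analyze.
The authors therefore propose a heuristic method, based on scale analysis rather than exact solutions, for predicting the data scale and network width at which particular patterns of FL emerge, and they state that it reproduces the scaling exponents of existing results.
They also present new predictions for complex toy architectures, such as three-layer non-linear networks and attention heads, with the aim of extending the scope of first-principles theories.
The main focus is on making sample complexity and the conditions under which FL emerges easier to assess without relying on computationally expensive numerical solutions.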

Abstract

Two pressing topics in the theory of deep learning are the interpretation of feature learning (FL) mechanisms and the determination of implicit bias of networks in the rich regime. Current theories of rich FL often appear in the form of high-dimensional non-linear equations, which require computationally intensive numerical solutions. Given the many details that go into defining a deep learning problem, this analytical complexity is a significant and often unavoidable challenge. Here, we propose a powerful heuristic route for predicting the data and width scales at which various patterns of FL emerge. This form of scale analysis is considerably simpler than such exact theories and reproduces the scaling exponents of various known results. In addition, we make novel predictions on complex toy architectures, such as three-layer non-linear networks and attention heads, thus extending the scope of first-principle theories of deep learning.
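
To make the notion of a "scaling exponent" concrete, the sketch below fits a power law to a small table of values by least squares in log-log space. It is only an illustration and not the authors' method: the function name `fit_power_law`, the variable names, and the synthetic data (a hypothetical threshold sample size growing as the square of an input dimension) are all assumptions chosen for the example, not results from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y ~ c * x**alpha by ordinary least squares on (log x, log y).

    Returns (alpha, c). All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the scaling exponent.
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = s_xy / s_xx
    # Intercept gives the log of the prefactor.
    c = math.exp(mean_y - alpha * mean_x)
    return alpha, c


# Synthetic, made-up data (NOT from the paper): a hypothetical threshold
# sample size that grows as 3 * d**2 with input dimension d.
dims = [16, 32, 64, 128, 256]
thresholds = [3.0 * d ** 2 for d in dims]

alpha, c = fit_power_law(dims, thresholds)
print(f"estimated exponent: {alpha:.3f}, prefactor: {c:.3f}")
```

In practice the measured thresholds would come from experiments or from a numerical solution of the exact theory, and the fitted exponent would be compared with the one predicted by the scale analysis.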