FlowAdam: Implicit Regularization via Geometry-Aware Soft Momentum Injection

arXiv cs.LG / 4/9/2026


Key Points

  • Adaptive moment methods such as Adam precondition diagonally (per coordinate), which makes optimization difficult in parameter spaces with rotated or densely coupled parameters.
  • When its EMA statistics detect a "difficult landscape," FlowAdam switches to integrating the gradient flow as an ODE (continuous-time integration), stabilizing behavior via clipped integration.
  • The core Soft Momentum Injection blends the ODE-derived velocity with Adam's momentum during mode transitions, aiming to prevent the training collapse that occurs with naive hybrids.
  • On coupled optimization benchmarks, it improves held-out error by 10-22% (low-rank matrix/tensor recovery) and by about 6% (collaborative filtering on Jester), surpassing Lion and AdaBelief while matching Adam on well-conditioned tasks.
  • Ablations show that soft injection is essential: hard replacement sharply reduces accuracy from 100% to 82.5%.
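The blending step at the heart of Soft Momentum Injection can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name and the blending coefficient `alpha` are assumptions.

```python
import numpy as np

def soft_momentum_injection(m_adam, v_ode, alpha):
    """Blend the ODE-flow velocity into Adam's momentum during a mode
    transition. `alpha` ramps from 0 (pure Adam momentum) toward 1
    (pure ODE velocity); a hard replacement corresponds to jumping
    straight to alpha = 1, which the ablation links to collapse."""
    return (1.0 - alpha) * m_adam + alpha * v_ode

m = np.array([0.2, -0.1])   # Adam's first-moment EMA
v = np.array([1.0, 0.5])    # velocity from clipped ODE integration
blended = soft_momentum_injection(m, v, 0.25)  # gradual handoff
```

A convex combination like this keeps the update direction continuous across the mode switch, which is the property a hard replacement destroys.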

Abstract

Adaptive moment methods such as Adam use a diagonal, coordinate-wise preconditioner based on exponential moving averages of squared gradients. This diagonal scaling is coordinate-system dependent and can struggle with dense or rotated parameter couplings, including those in matrix factorization, tensor decomposition, and graph neural networks, because it treats each parameter independently. We introduce FlowAdam, a hybrid optimizer that augments Adam with continuous gradient-flow integration via an ordinary differential equation (ODE). When EMA-based statistics detect landscape difficulty, FlowAdam switches to clipped ODE integration. Our central contribution is Soft Momentum Injection, which blends the ODE velocity with Adam's momentum during mode transitions, preventing the training collapse observed with naive hybrid approaches. Across coupled optimization benchmarks, the ODE integration provides implicit regularization, reducing held-out error by 10-22% on low-rank matrix/tensor recovery and by 6% on Jester (real-world collaborative filtering), while also surpassing tuned Lion and AdaBelief and matching Adam on well-conditioned workloads (CIFAR-10). MovieLens-100K confirms that the benefits arise specifically from coupled parameter interactions rather than bias estimation. Ablation studies show that soft injection is essential, as hard replacement reduces accuracy from 100% to 82.5%.
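The abstract's full loop (Adam's EMA statistics, an EMA-based difficulty switch, clipped gradient-flow integration, and soft injection) can be sketched as a single step function. Everything below is a hypothetical reconstruction under stated assumptions: the difficulty proxy, threshold, clip range, and `alpha` schedule are illustrative stand-ins, not the paper's definitions.

```python
import numpy as np

def flowadam_step(theta, grad_fn, state, lr=0.1, betas=(0.9, 0.999),
                  eps=1e-8, diff_threshold=5.0, clip=1.0, alpha=0.3):
    """One illustrative FlowAdam-style step (names/thresholds assumed).
    Maintains Adam's EMA moments, detects landscape difficulty from
    those EMAs, and on detection takes a clipped explicit-Euler step of
    the gradient flow d(theta)/dt = -grad, softly blending the resulting
    velocity into Adam's momentum rather than hard-replacing it."""
    g = grad_fn(theta)
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * g          # first moment EMA
    state["v"] = b2 * state["v"] + (1 - b2) * g * g      # second moment EMA
    # Assumed difficulty proxy: second moment large relative to squared mean
    difficulty = np.mean(state["v"]) / (np.mean(state["m"]) ** 2 + eps)
    if difficulty > diff_threshold:
        # Clipped Euler integration of the continuous gradient flow
        v_ode = -np.clip(g, -clip, clip)
        # Soft momentum injection: blend, never hard-replace
        state["m"] = (1 - alpha) * state["m"] + alpha * (-v_ode)
    return theta - lr * state["m"] / (np.sqrt(state["v"]) + eps), state

# Usage: minimize f(theta) = 0.5 * theta^2 for a few steps
theta = np.array([1.0])
state = {"m": np.zeros(1), "v": np.zeros(1)}
for _ in range(10):
    theta, state = flowadam_step(theta, lambda t: t, state)
```

Note the design point the ablation emphasizes: the ODE branch only perturbs `state["m"]`; the parameter update itself stays in Adam's preconditioned form, so the switch never discards the accumulated second-moment statistics.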