Beyond the Birkhoff Polytope: Spectral-Sphere-Constrained Hyper-Connections
arXiv cs.LG / 2026/3/24
Key points
- The paper builds on Hyper-Connections (HC), a generalization of residual connections that mixes features across multiple streams using residual matrices, and notes that unconstrained mixing can break the identity-mapping property and destabilize training.
- It reviews Manifold-Constrained Hyper-Connections (mHC) methods that restrict cross-stream mixing matrices to the Birkhoff polytope (doubly stochastic matrices) using Sinkhorn iterations or permutation-based parameterizations, and identifies three key drawbacks: identity degeneration, reduced expressivity from non-negativity, and parameterization inefficiencies.
- To address these limitations, the authors propose Spectral-Sphere-Constrained Hyper-Connections (sHC), which replaces the rigid polytope constraint with a spectral-norm sphere constraint, enabling negative entries for subtractive feature interactions.
- The proposed constraint is claimed to preserve training stability while avoiding both unstable Sinkhorn projections and factorial-scaling overhead from permutation-based parameterizations, yielding expressive non-degenerate residual matrices.
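
As a rough illustration of the two constraints contrasted above, the sketch below applies each to the same small mixing matrix in plain Python. It is not the paper's implementation: the summary does not describe the actual sHC parameterization, so the function names (`sinkhorn`, `spectral_norm`, `to_spectral_sphere`), the 2x2 example matrix, and the choice to land on the sphere by dividing by the largest singular value are all assumptions made for illustration. The "spectral-norm sphere" is taken here to mean matrices whose largest singular value equals 1.

```python
import math

def sinkhorn(logits, iters=50):
    """Approximate a doubly stochastic matrix from exp(logits) by
    alternating row and column normalization (Sinkhorn iterations)."""
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        m = [[x / sum(row) for x in row] for row in m]            # rows sum to 1
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return m

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def spectral_norm(m, iters=200):
    """Largest singular value via power iteration on m^T m.
    Assumes m is nonzero and the start vector is not orthogonal
    to the top right-singular vector."""
    mt = [list(c) for c in zip(*m)]
    v = [1.0 / math.sqrt(len(m[0]))] * len(m[0])
    for _ in range(iters):
        w = matvec(mt, matvec(m, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return math.sqrt(sum(x * x for x in matvec(m, v)))

def to_spectral_sphere(m):
    """Hypothetical helper: rescale m so its spectral norm is 1."""
    s = spectral_norm(m)
    return [[x / s for x in row] for row in m]

# Illustrative unconstrained mixing matrix with one negative entry.
raw = [[1.0, -0.5],
       [0.3,  0.8]]

ds = sinkhorn(raw)            # Birkhoff-style: every entry is non-negative
sp = to_spectral_sphere(raw)  # spectral-sphere-style: signs are preserved

print("doubly stochastic:", ds)
print("unit spectral norm:", sp)
```

In this sketch the Sinkhorn route exponentiates the parameters, so the result cannot contain negative entries, while the rescaled matrix keeps the sign of each entry and still has largest singular value 1. The identity matrix satisfies both constraints, and every doubly stochastic matrix has spectral norm exactly 1, so under this reading the Birkhoff polytope lies on the spectral sphere.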
