go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices

arXiv cs.LG / 4/3/2026

Key Points

  • The paper introduces an exact, efficient parameterization of the Birkhoff polytope (doubly stochastic matrices) using generalized orthostochastic matrices, avoiding factorial scaling while retaining full expressivity.
  • The proposed parameterization scales as $\mathcal{O}(d^3)$ and exposes a single hyperparameter $s$ that continuously interpolates between an efficient boundary solution and the fully expressive Birkhoff polytope.
  • It is integrated into Manifold-Constrained Hyper-Connections, yielding go-$m$HC, which composes with Kronecker-factorized methods to substantially recover lost expressivity at similar FLOP cost.
  • Spectral analysis and synthetic experiments show go-$m$HC better covers the Birkhoff polytope than Kronecker baselines, reaching the minimum theoretical loss and converging up to 10× faster.
  • The authors validate the approach in a 30M-parameter GPT-style language model, arguing it enables scaling residual-stream mixing capacity by treating the stream dimension d as an additional capacity axis.

Abstract

Doubly stochastic matrices enable learned mixing across residual streams, but parameterizing the set of doubly stochastic matrices (the Birkhoff polytope) exactly and efficiently remains an open challenge. Existing exact methods scale factorially with the number of streams $d$, while Kronecker-factorized approaches are efficient but expressivity-limited. We introduce a novel exact parameterization grounded in the theory of generalized orthostochastic matrices, which scales as $\mathcal{O}(d^3)$ and exposes a single hyperparameter $s$ that continuously interpolates between a computationally efficient boundary and the fully expressive Birkhoff polytope. Building on Manifold-Constrained Hyper-Connections ($m$HC), a framework for learned dynamic layer connectivity, we instantiate this parameterization in go-$m$HC. Our method composes naturally with Kronecker-factorized methods, substantially recovering expressivity at similar FLOP cost. Spectral analysis indicates that go-$m$HC fills the Birkhoff polytope far more completely than Kronecker-factorized baselines. On synthetic stream-mixing tasks, go-$m$HC achieves the minimum theoretical loss while converging up to 10× faster. We validate our approach in a 30M-parameter GPT-style language model. The expressivity, efficiency, and exactness of go-$m$HC offer a practical avenue for scaling $d$ as a new dimension of model capacity.
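To make the core object concrete: an orthostochastic matrix is the elementwise square of an orthogonal matrix, and it is automatically doubly stochastic because every row and column of an orthogonal matrix is a unit vector. The sketch below illustrates this standard construction only; the function name is illustrative, and the paper's actual parameterization (generalized orthostochastic matrices with the interpolation hyperparameter $s$) is more involved than this minimal example.

```python
import numpy as np

def orthostochastic(d: int, seed: int = 0) -> np.ndarray:
    """Illustrative sketch: build a d x d orthostochastic matrix.

    Not the paper's parameterization; this only shows why the
    elementwise square of an orthogonal matrix lies in the
    Birkhoff polytope (the set of doubly stochastic matrices).
    """
    rng = np.random.default_rng(seed)
    # QR decomposition of a random Gaussian matrix gives an orthogonal Q.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Elementwise square: rows/columns of Q are unit vectors, so the
    # squared entries of each row and column sum to 1.
    return q ** 2

B = orthostochastic(4)
assert np.allclose(B.sum(axis=0), 1.0)  # columns sum to 1
assert np.allclose(B.sum(axis=1), 1.0)  # rows sum to 1
assert (B >= 0).all()                   # all entries nonnegative
```

Note that plain orthostochastic matrices cover only part of the Birkhoff polytope; the paper's generalized construction is what recovers the full set.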
