Orthogonal Quadratic Complements for Vision Transformer Feed-Forward Networks

arXiv cs.CV / 4/14/2026

Key Points

The paper proposes Orthogonal Quadratic Complements (OQC), a new feed-forward design for Vision Transformers that adds a low-rank quadratic auxiliary branch while explicitly projecting it onto the orthogonal complement of the main branch to avoid redundant information.
It also studies an efficient low-rank realization (OQC-LR) and two gated extensions (OQC-static and OQC-dynamic), aiming to separate the benefit of stronger second-order interactions from the redundancy that bilinear replacements add relative to the main branch.
Under a parameter-matched Deep-ViT protocol on CIFAR-100, full OQC improves an AFBO baseline from 64.25±0.22 to 65.59±0.22 (a gain of 1.34 points), while OQC-LR reaches 65.52±0.25 with a substantially better speed–accuracy tradeoff.
On TinyImageNet, the gated OQC-dynamic variant reaches 51.88±0.32, improving on the baseline (50.45±0.21) by 1.43 points and outperforming all ungated variants.
Mechanistic analysis indicates near-zero overlap between post-projection auxiliary and main representations, alongside improved representation geometry and class separation, with consistent generalization across both datasets.

Abstract

Recent bilinear feed-forward replacements for vision transformers can substantially improve accuracy, but they often conflate two effects: stronger second-order interactions and increased redundancy relative to the main branch. We study a complementary design principle in which auxiliary quadratic features contribute only information not already captured by the dominant hidden representation. To this end, we propose Orthogonal Quadratic Complements (OQC), which construct a low-rank quadratic auxiliary branch and explicitly project it onto the orthogonal complement of the main branch before injection. We further study an efficient low-rank realization (OQC-LR) and gated extensions (OQC-static and OQC-dynamic). Under a parameter-matched Deep-ViT and CIFAR-100 protocol with a fixed penultimate residual readout, full OQC improves an AFBO baseline from 64.25 ± 0.22 to 65.59 ± 0.22, while OQC-LR reaches 65.52 ± 0.25 with a substantially better speed–accuracy tradeoff. On TinyImageNet, the gated extension OQC-dynamic achieves 51.88 ± 0.32, improving the baseline (50.45 ± 0.21) by 1.43 points and outperforming all ungated variants. Mechanism analyses show near-zero post-projection auxiliary-main overlap together with improved representation geometry and class separation. The full family, including both ungated and gated variants, generalizes consistently across both datasets.
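
The core idea in the abstract, a low-rank quadratic branch whose output is projected onto the orthogonal complement of the main branch before injection, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. That includes the per-token projection along the main hidden vector, the elementwise-product form of the quadratic branch, the GELU main branch, additive injection, and all names (`project_out`, `oqc_ffn_sketch`, `W1`, `U`, `V`, `P`, `W2`).

```python
import numpy as np


def gelu(x):
    # Tanh approximation of GELU (assumed activation for the main branch).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def project_out(aux, main, eps=1e-8):
    # For each token (row), subtract the component of `aux` that lies along
    # the same token's `main` vector, leaving only the part orthogonal to it.
    coeff = np.sum(aux * main, axis=-1, keepdims=True) / (
        np.sum(main * main, axis=-1, keepdims=True) + eps
    )
    return aux - coeff * main


def oqc_ffn_sketch(x, W1, U, V, P, W2):
    # Hypothetical feed-forward block; shapes and structure are assumptions.
    main = gelu(x @ W1)                # main branch, (tokens, hidden)
    quad = (x @ U) * (x @ V)           # low-rank second-order features, (tokens, rank)
    aux = quad @ P                     # lift back to the hidden width
    aux_perp = project_out(aux, main)  # keep only what the main branch lacks
    out = (main + aux_perp) @ W2       # inject, then map back to model width
    return out, main, aux_perp


rng = np.random.default_rng(0)
tokens, dim, hidden, rank = 4, 8, 16, 2
x = rng.normal(size=(tokens, dim))
W1 = rng.normal(size=(dim, hidden)) / np.sqrt(dim)
U = rng.normal(size=(dim, rank)) / np.sqrt(dim)
V = rng.normal(size=(dim, rank)) / np.sqrt(dim)
P = rng.normal(size=(rank, hidden)) / np.sqrt(rank)
W2 = rng.normal(size=(hidden, dim)) / np.sqrt(hidden)

out, main, aux_perp = oqc_ffn_sketch(x, W1, U, V, P, W2)

# Per-token inner product between the main and projected auxiliary features.
overlap = np.abs(np.sum(main * aux_perp, axis=-1))
print("max per-token overlap:", overlap.max())
```

Under these assumptions, the per-token overlap is zero up to floating-point error and the small `eps` stabilizer, which is one simple way to read the "near-zero post-projection auxiliary-main overlap" reported in the mechanism analysis. The paper may define the projection and the overlap measure differently.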
