Orthogonal Quadratic Complements for Vision Transformer Feed-Forward Networks

arXiv cs.CV / 4/14/2026

Key Points

  • The paper proposes Orthogonal Quadratic Complements (OQC), a new feed-forward design for Vision Transformers that adds a low-rank quadratic auxiliary branch while explicitly projecting it onto the orthogonal complement of the main branch to avoid redundant information.
  • It studies an efficient low-rank realization (OQC-LR) and gated extensions (OQC-static and OQC-dynamic), aiming to separate the benefit of stronger second-order interactions from the effect of increased redundancy relative to the main branch.
  • Under a parameter-matched Deep-ViT protocol on CIFAR-100, full OQC improves an AFBO baseline from 64.25±0.22 to 65.59±0.22, while OQC-LR attains 65.52±0.25 with a substantially better speed–accuracy tradeoff.
  • On TinyImageNet, the gated OQC-dynamic variant reaches 51.88±0.32, improving on the baseline (50.45±0.21) by 1.43 points and outperforming all ungated variants.
  • Mechanistic analysis indicates near-zero overlap between post-projection auxiliary and main representations, alongside improved representation geometry and class separation, with consistent generalization across both datasets.

Abstract

Recent bilinear feed-forward replacements for vision transformers can substantially improve accuracy, but they often conflate two effects: stronger second-order interactions and increased redundancy relative to the main branch. We study a complementary design principle in which auxiliary quadratic features contribute only information not already captured by the dominant hidden representation. To this end, we propose Orthogonal Quadratic Complements (OQC), which construct a low-rank quadratic auxiliary branch and explicitly project it onto the orthogonal complement of the main branch before injection. We further study an efficient low-rank realization (OQC-LR) and gated extensions (OQC-static and OQC-dynamic). Under a parameter-matched Deep-ViT and CIFAR-100 protocol with a fixed penultimate residual readout, full OQC improves an AFBO baseline from 64.25±0.22 to 65.59±0.22, while OQC-LR reaches 65.52±0.25 with a substantially better speed-accuracy tradeoff. On TinyImageNet, the gated extension OQC-dynamic achieves 51.88±0.32, improving the baseline (50.45±0.21) by 1.43 points and outperforming all ungated variants. Mechanism analyses show near-zero post-projection auxiliary-main overlap together with improved representation geometry and class separation. The full family, including both ungated and gated variants, generalizes consistently across both datasets.
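
The core idea in the abstract, projecting a low-rank quadratic auxiliary branch onto the orthogonal complement of the main branch before injection, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not give the exact formulation, so the sketch assumes a per-token projection against the single direction of the main hidden vector, a ReLU main branch, and an elementwise product of two rank-r projections as the quadratic term. All names (`orthogonal_complement`, `cosine_overlap`, `oqc_ffn_sketch`, `W_main`, `U`, `V`, `P`) and all sizes are hypothetical.

```python
import numpy as np


def orthogonal_complement(aux, main, eps=1e-8):
    """Remove from each row of `aux` its component along the matching row of `main`.

    Assumption: the projection is per token and against one direction only.
    """
    dot = np.sum(aux * main, axis=-1, keepdims=True)
    main_sq = np.sum(main * main, axis=-1, keepdims=True)
    return aux - (dot / (main_sq + eps)) * main


def cosine_overlap(a, b, eps=1e-8):
    """Per-token cosine similarity between two sets of hidden vectors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den


def oqc_ffn_sketch(x, W_main, U, V, P):
    """Hypothetical hidden-layer computation: main branch plus projected quadratic branch."""
    main = np.maximum(x @ W_main, 0.0)   # assumed ReLU main branch
    quad = (x @ U) * (x @ V)             # low-rank second-order features, shape (tokens, rank)
    aux = quad @ P                       # lift to the hidden width
    aux_perp = orthogonal_complement(aux, main)
    return main, aux, aux_perp


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
tokens, d_model, d_hidden, rank = 4, 8, 16, 3
x = rng.standard_normal((tokens, d_model))
W_main = rng.standard_normal((d_model, d_hidden))
U = rng.standard_normal((d_model, rank))
V = rng.standard_normal((d_model, rank))
P = rng.standard_normal((rank, d_hidden))

main, aux, aux_perp = oqc_ffn_sketch(x, W_main, U, V, P)
hidden = main + aux_perp                          # injection of the complement only
overlap_before = cosine_overlap(aux, main)        # generally nonzero
overlap_after = cosine_overlap(aux_perp, main)    # approximately zero by construction
```

Under these assumptions, `overlap_after` is zero up to floating-point error, which mirrors the "near-zero post-projection auxiliary-main overlap" the abstract reports. A gated variant in the spirit of OQC-static or OQC-dynamic would scale `aux_perp` before adding it, but the abstract does not specify how the gates are computed, so that part is left out.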