AI Navigate

Functorial Neural Architectures from Higher Inductive Types

arXiv cs.LG / 3/18/2026


Key Points

  • The authors reframe compositional generalization as a problem of functoriality in decoders, deriving guarantees and impossibility results from a categorical perspective on architecture.
  • They implement Higher Inductive Type specifications as neural architectures via a monoidal functor from the path groupoid of a space to a category of parametric maps, turning path constructors into generator networks and composition into structural concatenation.
  • They prove that decoders built by structural concatenation are strict monoidal functors (thus compositional by construction), while softmax self-attention is not functorial for any non-trivial task.
  • Experiments on three spaces show substantial gains for functorial decoders: 2-2.7x on the torus, 5.5-10x on the wedge of circles (S^1 ∨ S^1), and a 46% error-gap closure on Klein-bottle words exercising the group relation.
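The core construction in the second and third bullets can be illustrated with a minimal sketch (all names and shapes here are hypothetical stand-ins, not the paper's implementation): each path constructor gets its own generator network, and a word is decoded by structurally concatenating generator outputs, so decoding a composite word equals concatenating the decodings of its parts by construction.

```python
import numpy as np

# Hypothetical sketch: one "generator network" per path constructor (group
# generator). Fixed random linear maps stand in for trained networks.
rng = np.random.default_rng(0)
SEG_LEN, DIM = 8, 2

def make_generator():
    """Return a map from a start point to a path segment of SEG_LEN points."""
    W = rng.normal(size=(SEG_LEN, DIM, DIM))
    def net(x):
        return np.stack([Wi @ x for Wi in W])  # (SEG_LEN, DIM)
    return net

generators = {"a": make_generator(), "b": make_generator()}

def decode(word, x0):
    """Decode a word by structural concatenation of generator outputs.

    Composition in the word becomes concatenation of segments, so the
    decoder is functorial by construction rather than by training.
    """
    segments, x = [], x0
    for g in word:
        seg = generators[g](x)
        segments.append(seg)
        x = seg[-1]  # next segment starts where this one ends
    return np.concatenate(segments)

x0 = np.array([1.0, 0.0])
whole = decode("abab", x0)
# Functoriality check: decoding the composite word equals
# concatenating the decodings of its two halves.
left = decode("ab", x0)
right = decode("ab", left[-1])
assert np.allclose(whole, np.concatenate([left, right]))
```

The check at the end holds identically for any word split, which is exactly the "compositional by construction" property the paper attributes to strict monoidal functors; a softmax self-attention decoder mixes all positions and admits no such factorization.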

Abstract

Neural networks systematically fail at compositional generalization -- producing correct outputs for novel combinations of known parts. We show that this failure is architectural: compositional generalization is equivalent to functoriality of the decoder, and this perspective yields both guarantees and impossibility results. We compile Higher Inductive Type (HIT) specifications into neural architectures via a monoidal functor from the path groupoid of a target space to a category of parametric maps: path constructors become generator networks, composition becomes structural concatenation, and 2-cells witnessing group relations become learned natural transformations. We prove that decoders assembled by structural concatenation of independently generated segments are strict monoidal functors (compositional by construction), while softmax self-attention is not functorial for any non-trivial compositional task. Both results are formalized in Cubical Agda. Experiments on three spaces validate the full hierarchy: on the torus (ℤ²), functorial decoders outperform non-functorial ones by 2-2.7x; on S¹ ∨ S¹ (F₂), the type-A/B gap widens to 5.5-10x; on the Klein bottle (ℤ ⋊ ℤ), a learned 2-cell closes a 46% error gap on words exercising the group relation.
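The abstract's last claim, that a learned 2-cell closes the error gap on relation words, can be sketched loosely as follows (the setup is a hypothetical illustration, not the paper's method): a 2-cell witnessing a group relation is a learned map aligning the decodings of two words that denote the same group element, here fit in closed form as a linear least-squares problem instead of a trained network.

```python
import numpy as np

# Hypothetical sketch: two words denoting the same group element (e.g. "ab"
# and "ba" on the torus, whose fundamental group Z^2 is abelian) decode to
# different feature arrays; the 2-cell eta is learned to align them.
rng = np.random.default_rng(1)
N, DIM = 32, 2

D_ab = rng.normal(size=(N, DIM))      # stand-in for decode("ab")
A_true = rng.normal(size=(DIM, DIM))  # unknown alignment to be recovered
D_ba = D_ab @ A_true                  # stand-in for decode("ba")

# Fit eta minimizing ||D_ab @ eta - D_ba||^2. In the paper's setting the
# 2-cell would be a small trained network; least squares keeps the sketch
# closed-form and exact when an alignment exists.
eta, *_ = np.linalg.lstsq(D_ab, D_ba, rcond=None)
residual = np.linalg.norm(D_ab @ eta - D_ba)
```

When the two decodings really are related by such a map, the residual vanishes, which is the sketch-level analogue of the 2-cell closing the error gap on words that exercise the relation.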