Abstract
Neural networks systematically fail at compositional generalization, i.e., at producing correct outputs for novel combinations of known parts. We show that this failure is architectural: compositional generalization is equivalent to functoriality of the decoder, and this perspective yields both guarantees and impossibility results. We compile Higher Inductive Type (HIT) specifications into neural architectures via a monoidal functor from the path groupoid of a target space to a category of parametric maps: path constructors become generator networks, composition becomes structural concatenation, and 2-cells witnessing group relations become learned natural transformations. We prove that decoders assembled by structural concatenation of independently generated segments are strict monoidal functors (compositional by construction), while softmax self-attention is not functorial for any non-trivial compositional task. Both results are formalized in Cubical Agda. Experiments on three spaces validate the full hierarchy: on the torus ($\mathbb{Z}^2$), functorial decoders outperform non-functorial ones by 2--2.7x; on $S^1 \vee S^1$ ($F_2$), the type-A/B gap widens to 5.5--10x; on the Klein bottle ($\mathbb{Z} \rtimes \mathbb{Z}$), a learned 2-cell closes a 46\% error gap on words exercising the group relation.