AI Navigate

Beyond Attention: True Adaptive World Models via Spherical Kernel Operator

arXiv cs.LG / 3/17/2026


Key Points

  • The paper argues that conventional world-modeling relies on projecting observations into latent spaces, which distorts manifold learning when data distributions shift.
  • It introduces the Spherical Kernel Operator (SKO), a framework that replaces standard attention by projecting data onto a hypersphere and using Gegenbauer polynomials for direct function reconstruction.
  • SKO yields approximation error bounds that depend on the intrinsic manifold dimension q rather than the ambient dimension, addressing saturation issues common to positive operators like dot-product attention.
  • Empirically, SKO is reported to accelerate convergence and outperform standard attention baselines in autoregressive language modeling.
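The saturation point above rests on a sign property that is easy to see numerically: softmax attention weights are strictly positive, whereas a low-pass-filtered Gegenbauer sum can dip below zero. The sketch below illustrates this contrast; the choice of `alpha`, the polynomial degree, and the cosine cutoff filter are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.special import eval_gegenbauer

# Softmax attention weights form a positive operator: every weight is > 0.
logits = np.random.default_rng(1).standard_normal(8)
softmax_w = np.exp(logits) / np.exp(logits).sum()
print(softmax_w.min() > 0)   # True: strictly positive

# A low-pass-filtered Gegenbauer sum (the kind of localized spherical
# polynomial kernel the paper describes) is not positive: it takes
# negative values, which is the property credited with avoiding saturation.
t = np.linspace(-1.0, 1.0, 201)   # cosine of spherical distance
N, alpha = 8, 1.0
kernel = sum(
    np.cos(0.5 * np.pi * n / (N + 1)) ** 2   # smooth cutoff filter h(n/(N+1))
    * (n + alpha) / alpha                    # standard sphere normalization
    * eval_gegenbauer(n, alpha, t)
    for n in range(N + 1)
)
print(kernel.min() < 0)   # True: the kernel dips below zero
```

Because the filtered kernel is signed, it escapes the classical approximation-theoretic ceiling that applies to positive operators.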

Abstract

The pursuit of world-model-based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized latent spaces, wherein transition dynamics are subsequently learned. However, this conventional paradigm is mathematically flawed: it merely displaces the manifold learning problem into the latent space. When the underlying data distribution shifts, the latent manifold shifts accordingly, forcing the predictive operator to implicitly relearn the new topological structure. Furthermore, by classical approximation theory, positive operators like dot-product attention inevitably suffer from the saturation phenomenon, permanently bottlenecking their predictive capacity and leaving them vulnerable to the curse of dimensionality. In this paper, we formulate a mathematically rigorous paradigm for world model construction by redefining the core predictive mechanism. Inspired by Ryan O'Dowd's foundational work, we introduce the Spherical Kernel Operator (SKO), a framework that replaces standard attention. By projecting the unknown data manifold onto a unified ambient hypersphere and utilizing a localized sequence of ultraspherical (Gegenbauer) polynomials, SKO performs direct integral reconstruction of the target function. Because this localized spherical polynomial kernel is not strictly positive, it bypasses the saturation phenomenon, yielding approximation error bounds that depend strictly on the intrinsic manifold dimension q, rather than the ambient dimension. Furthermore, by formalizing its unnormalized output as an authentic measure support estimator, SKO mathematically decouples the true environmental transition dynamics from the biased observation frequency of the agent. Empirical evaluations confirm that SKO significantly accelerates convergence and outperforms standard attention baselines in autoregressive language modeling.
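The mechanism the abstract describes — project onto a hypersphere, weight pairs with a localized Gegenbauer kernel, and leave the output unnormalized — can be sketched as a drop-in replacement for a single attention head. This is a minimal illustration, not the paper's implementation: the filter, `alpha`, the degree, and the names `localized_spherical_kernel` and `sko_attention` are all assumptions for exposition.

```python
import numpy as np
from scipy.special import eval_gegenbauer

def localized_spherical_kernel(t, degree=8, alpha=1.0):
    """Filtered Gegenbauer sum K(t) = sum_n h(n/(N+1)) c_n C_n^alpha(t).

    The smooth cutoff h localizes the kernel near t = 1 (nearby points on
    the sphere); the sum is not strictly positive, the property the paper
    credits for avoiding saturation. alpha and h are illustrative choices.
    """
    t = np.clip(t, -1.0, 1.0)
    out = np.zeros_like(t, dtype=float)
    for n in range(degree + 1):
        h = np.cos(0.5 * np.pi * n / (degree + 1)) ** 2  # smooth cutoff filter
        c = (n + alpha) / alpha                          # sphere normalization
        out += h * c * eval_gegenbauer(n, alpha, t)
    return out

def sko_attention(Q, K, V, degree=8, alpha=1.0):
    """Attention-style update with softmax replaced by the spherical kernel."""
    Qs = Q / np.linalg.norm(Q, axis=-1, keepdims=True)  # project onto sphere
    Ks = K / np.linalg.norm(K, axis=-1, keepdims=True)
    W = localized_spherical_kernel(Qs @ Ks.T, degree, alpha)
    return W @ V  # deliberately unnormalized, per the measure-support framing

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 5, 16))  # 5 tokens, width 16
out = sko_attention(Q, K, V)
print(out.shape)  # (5, 16)
```

Note the absence of a softmax or row normalization: per the abstract, the unnormalized output is interpreted as a measure support estimator, decoupling transition dynamics from the agent's observation frequency.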