Training-free Spatially Grounded Geometric Shape Encoding (Technical Report)

arXiv cs.CV / 4/10/2026


Key Points

  • The paper proposes XShapeEnc, a training-free, general-purpose encoding method for spatially grounded 2D geometric shapes, aiming to extend positional encoding beyond its standard use on 1D sequences.
  • XShapeEnc decomposes each shape into its normalized geometry (within the unit disk) and a pose vector, which is transformed into a harmonic pose field; geometry and pose are then encoded, independently or jointly, with orthogonal Zernike bases.
  • A frequency-propagation step then introduces high-frequency content into the encoding, intended to make it more discriminable for downstream neural networks.
  • The authors claim five key properties for the resulting compact encoding, including invertibility, adaptivity, and frequency richness, and provide theoretical validation along with analyses of efficiency and discriminability.
  • Experiments across multiple shape-aware tasks, supported by a self-curated XShapeCorpus, are used to demonstrate applicability and to position XShapeEnc as a foundational tool for “2D spatial intelligence” research.
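The decomposition step described above can be illustrated with a minimal sketch. It assumes the shape is given as a simple polygon and, as a hypothetical choice, takes the pose vector to be (centroid x, centroid y, scale), where scale is the largest centroid-to-vertex distance. The paper's actual pose parameterization (for example, whether it includes rotation) and its harmonic pose field are not reproduced here, and `polygon_centroid`, `decompose`, and `recompose` are illustrative names, not the authors' API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Pose = Tuple[float, float, float]  # hypothetical pose vector: (centroid x, centroid y, scale)


def polygon_centroid(pts: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon via the shoelace formula."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if abs(area2) < 1e-12:
        raise ValueError("degenerate polygon (zero area)")
    # centroid = sum / (6 * signed area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)


def decompose(pts: List[Point]) -> Tuple[List[Point], Pose]:
    """Split a polygon into unit-disk-normalized geometry and a pose vector."""
    cx, cy = polygon_centroid(pts)
    centered = [(x - cx, y - cy) for x, y in pts]
    # Largest centroid-to-vertex distance; distance is convex along each edge,
    # so no boundary point lies farther out than the farthest vertex.
    scale = max(math.hypot(x, y) for x, y in centered)
    geometry = [(x / scale, y / scale) for x, y in centered]
    return geometry, (cx, cy, scale)


def recompose(geometry: List[Point], pose: Pose) -> List[Point]:
    """Invert decompose: rescale and translate the normalized geometry back."""
    cx, cy, scale = pose
    return [(x * scale + cx, y * scale + cy) for x, y in geometry]
```

For the square with corners (2, 2) and (4, 4), `decompose` yields the pose (3, 3, √2) and a normalized square whose vertices lie on the unit circle. Because every vertex lands within distance 1 of the origin and the disk is convex, the whole normalized polygon lies inside the unit disk, and `recompose` undoes the split exactly.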

Abstract

Positional encoding has become the de facto standard for grounding deep neural networks on discrete point-wise positions, and it has achieved remarkable success in tasks where the input can be represented as a one-dimensional sequence. However, extending this concept to 2D spatial geometric shapes demands carefully designed encoding strategies that account not only for shape geometry and pose, but also for compatibility with neural network learning. In this work, we address these challenges by introducing a training-free, general-purpose encoding strategy, dubbed XShapeEnc, that encodes an arbitrary spatially grounded 2D geometric shape into a compact representation exhibiting five favorable properties, including invertibility, adaptivity, and frequency richness. Specifically, a 2D spatially grounded geometric shape is decomposed into its normalized geometry within the unit disk and its pose vector, where the pose is further transformed into a harmonic pose field that also lies within the unit disk. A set of orthogonal Zernike bases is constructed to encode shape geometry and pose either independently or jointly, followed by a frequency-propagation operation to introduce high-frequency content into the encoding. We demonstrate the theoretical validity, efficiency, discriminability, and applicability of XShapeEnc via extensive analysis and experiments across a wide range of shape-aware tasks and our self-curated XShapeCorpus. We envision XShapeEnc as a foundational tool for research that goes beyond one-dimensional sequential data toward frontier 2D spatial intelligence.
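The Zernike encoding mentioned in the abstract can be illustrated with standard Zernike moments of a shape's indicator function on the unit disk. For integers n ≥ |m| with n − |m| even, the radial polynomial, the basis function, and the moment of a function f are:

  R_n^|m|(ρ) = Σ_{k=0}^{(n−|m|)/2} (−1)^k (n−k)! / [k! ((n+|m|)/2 − k)! ((n−|m|)/2 − k)!] ρ^(n−2k)
  V_nm(ρ, θ) = R_n^|m|(ρ) e^{imθ}
  A_nm = (n+1)/π ∬_{ρ≤1} f(ρ, θ) conj(V_nm(ρ, θ)) dA

The sketch below approximates these moments with a midpoint rule on a square grid. It is a textbook moment computation under that assumption; it does not implement the paper's joint geometry/pose encoding or its frequency-propagation operation, and `radial`, `zernike_moments`, `max_order`, and `grid` are hypothetical names.

```python
import cmath
import math
from typing import Callable, Dict, Tuple


def radial(n: int, m: int, rho: float) -> float:
    """Zernike radial polynomial R_n^{|m|}(rho); zero when n - |m| is odd or negative."""
    m = abs(m)
    if m > n or (n - m) % 2:
        return 0.0
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = (-1) ** k * math.factorial(n - k) / (
            math.factorial(k)
            * math.factorial((n + m) // 2 - k)
            * math.factorial((n - m) // 2 - k)
        )
        total += coeff * rho ** (n - 2 * k)
    return total


def zernike_moments(
    f: Callable[[float, float], float], max_order: int, grid: int = 128
) -> Dict[Tuple[int, int], complex]:
    """Approximate A_nm = (n+1)/pi * integral over the unit disk of f * conj(V_nm)
    using a midpoint rule on a grid x grid lattice covering [-1, 1]^2."""
    h = 2.0 / grid
    samples = []  # (rho, theta, f value) for lattice points inside the disk where f != 0
    for i in range(grid):
        x = -1.0 + (i + 0.5) * h
        for j in range(grid):
            y = -1.0 + (j + 0.5) * h
            rho = math.hypot(x, y)
            if rho <= 1.0:
                val = f(x, y)
                if val != 0.0:
                    samples.append((rho, math.atan2(y, x), val))
    moments: Dict[Tuple[int, int], complex] = {}
    for n in range(max_order + 1):
        for m in range(-n, n + 1, 2):  # only m with n - |m| even
            acc = 0j
            for rho, theta, val in samples:
                # conj(V_nm) = R_n^{|m|}(rho) * exp(-i m theta)
                acc += val * radial(n, m, rho) * cmath.exp(-1j * m * theta)
            moments[(n, m)] = (n + 1) / math.pi * acc * h * h
    return moments
```

For a centered disk of radius 0.5, A_00 should come out close to 0.25 (area divided by π) and A_20 close to −0.5625, while every moment with m ≠ 0 vanishes analytically by rotational symmetry. Because the V_nm are orthogonal on the unit disk, a finite set of such moments is a compact, approximately invertible summary of the shape, which is the property the orthogonal Zernike bases are chosen for.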