Rethinking positional encoding as a geometric constraint rather than a signal injection

Reddit r/LocalLLaMA / 2026/3/24


Key points

  • The post proposes reframing positional encoding from additive “signal injection” to a geometric constraint that restricts where token embeddings may lie on a manifold.
  • It argues that standard additive positional encodings can disrupt the semantic geometry of embeddings, potentially harming neighborhood structure and token meaning separation.
  • The approach aims to cleanly separate “what a token means” from “where it sits,” improving the conceptual decomposition of semantics vs. position.
  • Preliminary results suggest more stable attention patterns on longer sequences and reduced need for explicit length-generalization tricks.
  • The author notes early promise for out-of-distribution length handling and less “attention sink” behavior, but emphasizes ongoing stress-testing and uncertainty about whether it is fully principled vs. a regularization technique.

We've been exploring an alternative framing of positional encoding where instead of additively injecting position signals into token embeddings, you treat position as a geometric constraint on the manifold the embeddings are allowed to occupy.

The core idea:

  • Standard additive PE shifts embeddings in ways that can interfere with semantic geometry
  • Treating position as a manifold constraint instead preserves the semantic neighborhood structure
  • This gives a cleaner separation between "what this token means" and "where this token sits"
  • Preliminary results show more stable attention patterns on longer sequences without explicit length generalization tricks

The practical upshot seems to be better out-of-distribution length handling and less attention sink behavior, though we're still stress-testing the latter.
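The post shares no code, so here's a minimal numerical sketch of the contrast being described; it is our illustration, not the author's method. We use a RoPE-style blockwise rotation purely as one concrete example of a position-dependent map that is orthogonal, and therefore cannot distort embedding norms the way an additive shift can:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 8  # embedding dim, sequence length (toy sizes)
tokens = rng.standard_normal((n, d))

# Additive PE: shift each embedding by a position vector.
# This moves points off their original neighborhoods and changes norms.
pos = rng.standard_normal((n, d)) * 0.5
additive = tokens + pos

def apply_rotation(x, position, theta_base=10000.0):
    """Rotate coordinate pairs of x by position-dependent angles.

    Each 2D rotation is orthogonal, so the embedding's norm is
    preserved exactly -- position is encoded as a constraint on
    *where* the vector points, not as an additive offset.
    """
    half = x.shape[-1] // 2
    freqs = theta_base ** (-np.arange(half) / half)
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

constrained = np.stack([apply_rotation(tokens[i], i) for i in range(n)])

# Norms (and hence positions on the unit sphere, after normalization)
# survive the rotation exactly; the additive shift changes them.
print(np.allclose(np.linalg.norm(constrained, axis=1),
                  np.linalg.norm(tokens, axis=1)))  # True
print(np.allclose(np.linalg.norm(additive, axis=1),
                  np.linalg.norm(tokens, axis=1)))  # False
```

The point of the sketch is only the invariance property: any orthogonal position-dependent map keeps "what the token means" (its norm and, per position, its pairwise distances) separate from "where it sits," whereas an additive injection entangles the two.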

Whether this reads as a principled geometric reframing or just another way to regularize positional influence, we're genuinely not sure yet. Curious whether this decomposition feels natural to people working on interpretability or long-context architectures.

We'll share an arXiv link once we clean up the writeup.

submitted by /u/bobupuhocalusof