Frayed RoPE and Long Inputs: A Geometric Perspective
arXiv cs.LG / 3/20/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper provides a geometric analysis of Rotary Positional Embedding (RoPE), showing how attention behavior changes when input length exceeds training length, how key/query point clouds cluster, and how this clustering enables sink tokens to act as placeholders that prevent token mixing.
- It identifies that longer inputs disrupt the separation of key/query clusters, which undermines sink-token functionality and leads to pathological attention behavior.
- The authors propose RoPE-ID (In Distribution), a simple modification that applies high-frequency RoPE rotations to only a subset of channels, enabling generalization to longer inputs without retraining (see the sketch after this list).
- They validate RoPE-ID on 1B- and 3B-parameter Transformers using the LongBench and RULER benchmarks, demonstrating improved handling of extended inputs.
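
For context on the geometry involved: standard RoPE rotates each consecutive (even, odd) channel pair of a query or key by an angle proportional to the token's position, with per-pair frequency base^(-2i/d), so relative position enters attention through dot products of rotated vectors. The sketch below implements this, plus one hypothetical reading of RoPE-ID in which only the highest-frequency channel pairs are rotated and the remaining channels pass through unrotated. The function names, the `rotated_pairs` parameter, and the channel-selection rule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Standard RoPE: rotate each consecutive (even, odd) channel pair
    of x by angle positions * base**(-2i/d), for pair index i."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)  # pair i's frequency; i=0 is fastest
    angles = positions[:, None] * freqs[None, :]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]           # split into channel pairs
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin          # 2x2 rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_id_sketch(x, positions, rotated_pairs, base=10000.0):
    """Hypothetical RoPE-ID-style variant (an assumption, not the paper's
    exact method): rotate only the `rotated_pairs` highest-frequency
    channel pairs and leave the remaining channels unrotated."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-2.0 * np.arange(half) / d)
    freqs[rotated_pairs:] = 0.0                   # zero angle => identity rotation
    angles = positions[:, None] * freqs[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Usage: rotate an 8-token sequence of 64-dim query vectors.
q = np.random.randn(8, 64)
pos = np.arange(8, dtype=np.float64)
q_full = rope_rotate(q, pos)             # all 32 pairs rotated
q_partial = rope_id_sketch(q, pos, 16)   # only the 16 fastest pairs rotated
```

Zeroing a pair's frequency makes its rotation the identity, which keeps those channels position-independent; whether the paper masks pairs this way or rescales the frequency schedule is not specified in the summary above.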