Frayed RoPE and Long Inputs: A Geometric Perspective
arXiv cs.LG / 3/20/2026
Key Points
- The paper gives a geometric analysis of Rotary Positional Embedding (RoPE) that explains how attention behaves when input length exceeds training length. It shows that key and query point clouds form separate clusters, and that this separation lets sink tokens act as placeholders that prevent token mixing.
- It finds that longer inputs disrupt the separation of the key/query clusters, which undermines sink-token functionality and leads to pathological attention behavior.
- The authors propose RoPE-ID (In Distribution), a simple modification that applies RoPE at high frequency to a subset of channels to enable generalization to longer inputs without retraining.
- They validate RoPE-ID on 1B and 3B parameter Transformers using LongBench and RULER benchmarks, demonstrating improved handling of extended inputs.
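
To make the RoPE-ID idea above more concrete, here is a minimal NumPy sketch, not the authors' implementation. It shows the standard RoPE rotation of channel pairs, followed by a hypothetical variant that rotates only a subset of channels, all at high frequency, and leaves the rest unrotated. The function names, the choice of which channels to rotate, the `min_freq` parameter, and the linear frequency spacing are all assumptions made for illustration; the summary does not specify them.

```python
import numpy as np


def rope_angles(positions, num_pairs, base=10000.0):
    """Standard RoPE angles: position * base**(-i / num_pairs) for channel pair i."""
    inv_freq = base ** (-np.arange(num_pairs) / num_pairs)
    return np.outer(positions, inv_freq)  # shape: (seq_len, num_pairs)


def rotate_pairs(x, angles):
    """Rotate consecutive channel pairs (x[2i], x[2i+1]) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


def partial_high_freq_rope(x, positions, rotated_pairs, min_freq=0.5):
    """Hypothetical sketch of rotating only a subset of channels at high frequency.

    Assumption (not from the paper): the first `rotated_pairs` channel pairs are
    rotated with frequencies spaced linearly from 1.0 down to `min_freq` radians
    per token; all remaining channels carry no positional rotation.
    """
    out = x.copy()
    freqs = np.linspace(1.0, min_freq, rotated_pairs)
    angles = np.outer(positions, freqs)  # shape: (seq_len, rotated_pairs)
    width = 2 * rotated_pairs
    out[..., :width] = rotate_pairs(x[..., :width], angles)
    return out


# Example: 6 tokens, 8 channels, rotate only the first 2 channel pairs.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
positions = np.arange(6)
standard = rotate_pairs(x, rope_angles(positions, num_pairs=4))
partial = partial_high_freq_rope(x, positions, rotated_pairs=2)
```

In this sketch the high-frequency channels complete a full rotation within a handful of tokens, so their angles already cover the whole circle at short lengths, while the unrotated channels do not depend on position at all. Whether this matches the paper's exact channel selection and frequencies is not stated in the summary.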