The Cylindrical Representation Hypothesis for Language Model Steering
arXiv cs.CL / 5/5/2026
Key Points
- The paper argues that existing theory for language-model steering, based on the Linear Representation Hypothesis (LRH), is insufficient because real concept representations are not perfectly orthogonal, leading to unstable and unpredictable steering effects.
- It introduces the Cylindrical Representation Hypothesis (CRH), proposing that a concept's presence or absence is encoded along a central axis, while sensitivity to steering is governed by the surrounding normal plane.
- CRH suggests that only particular “sensitive sectors” within the normal plane strongly elicit the target concept, while other sectors can suppress or delay its activation.
- The authors formalize how uncertainty at the sector level arises intrinsically (even when steering directions are well aligned), offering an explanation for why steering outcomes can fluctuate.
- Experiments reportedly confirm the cylindrical structure and show that CRH can be used to interpret and analyze steering behavior in practical settings, with code provided at the linked GitHub repo.
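To make the geometry concrete, the cylindrical picture can be sketched as a coordinate change: project an activation or steering vector onto the concept axis (its “height”), and describe the residual in the normal plane by a radius and an angle, with sensitive sectors being angular ranges. This is an illustrative sketch only; the function names, the choice of a fixed orthonormal plane basis, and the sector test are assumptions, not the paper's actual implementation.

```python
import numpy as np

def cylindrical_decomposition(v, axis, plane_basis):
    """Decompose a vector into CRH-style cylindrical coordinates:
    height along the concept axis, plus radius and angle in the normal plane.
    `plane_basis` is assumed to be two orthonormal vectors spanning the
    plane orthogonal to `axis` (a hypothetical setup, not from the paper)."""
    axis = axis / np.linalg.norm(axis)
    height = float(v @ axis)            # position along the central axis
    planar = v - height * axis          # residual in the normal plane
    x = float(planar @ plane_basis[0])
    y = float(planar @ plane_basis[1])
    radius = float(np.hypot(x, y))
    angle = float(np.arctan2(y, x))     # sector angle in the normal plane
    return height, radius, angle

def in_sensitive_sector(angle, center, half_width):
    """Check whether a planar angle lies inside a hypothetical
    sensitive sector centered at `center` with the given half-width."""
    diff = (angle - center + np.pi) % (2 * np.pi) - np.pi
    return abs(diff) <= half_width
```

Under this sketch, two steering vectors with the same axial component but different planar angles can land in different sectors, which matches the paper's claim that well-aligned steering directions can still produce fluctuating outcomes.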