Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries
arXiv cs.CL · March 31, 2026
Key Points
- The paper finds that large language models exhibit a form of categorical perception in their hidden states when processing Arabic numerals, showing geometric “structural warping” at digit-count boundaries (specifically at transitions like 10 and 100).
- Across six models from five architecture families, a CP-additive representational-geometry model fits the hidden states better than a purely continuous model at every primary layer tested, indicating the effect is robust across LLM internal representations.
- The boundary-specific warping is absent at non-boundary control positions, and it is also absent in the temperature domain, where linguistic categories such as "hot/cold" involve no tokenization discontinuity.
- Two distinct signatures are reported: “classic CP” models both internalize the category distinction and show warping, while “structural CP” models show the warping at the boundary even though they cannot explicitly report the category distinction.
- The authors conclude that structural input-format discontinuities alone can induce categorical-perception-like geometry in LLMs, independent of explicit semantic category knowledge and stable across boundaries and architectural families.
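The model comparison described above can be illustrated with a toy sketch. This is not the paper's code; it fits synthetic pairwise "representational distances" between numerals with (a) a purely continuous predictor and (b) a CP-additive variant that adds an indicator for pairs straddling the one-digit/two-digit boundary at 10. The log-distance predictor, noise level, and boundary coefficient are all illustrative assumptions.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's method): compare a continuous
# vs a CP-additive regression model of pairwise distances between numerals.
nums = np.arange(1, 21)
rng = np.random.default_rng(0)

pairs = [(i, j) for i in nums for j in nums if i < j]
# Continuous predictor: log-scale separation between the two numerals.
cont = np.array([abs(np.log(j) - np.log(i)) for i, j in pairs])
# CP indicator: 1 if the pair straddles the digit-count boundary at 10.
cross = np.array([1.0 if (i < 10 <= j) else 0.0 for i, j in pairs])
# Synthetic "hidden-state" distances with an extra boundary-crossing offset.
dist = 1.0 * cont + 0.5 * cross + rng.normal(0, 0.05, len(pairs))

def r2(X, y):
    # Ordinary least squares fit; return the coefficient of determination.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

ones = np.ones(len(pairs))
r2_cont = r2(np.column_stack([ones, cont]), dist)         # continuous only
r2_cp = r2(np.column_stack([ones, cont, cross]), dist)    # CP-additive
print(f"continuous R^2 = {r2_cont:.3f}, CP-additive R^2 = {r2_cp:.3f}")
```

On data generated with a genuine boundary offset, the CP-additive model recovers more variance than the continuous model alone, mirroring the paper's reported fit comparison at primary layers.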