Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings
arXiv cs.CL / 5/1/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper analyzes mean pooling in text embedding models and argues it can cause “second-order collapse”: averaging the token embeddings discards the information carried in their second-order (spread/structural) statistics.
- It introduces a simple metric to quantify how much collapse mean pooling induces, and applies it to real models and datasets (a toy illustration of such a metric appears after this list).
- Empirical results show modern text encoders are generally robust to this second-order collapse, with contrastively fine-tuned encoders less prone to it than their pretrained backbones.
- The study attributes the robustness to how tightly token embeddings concentrate within each text, and finds that lower measured collapse correlates with better downstream task performance.
- Overall, the findings provide a new explanation for why effective text embeddings can still be produced using relatively coarse mean pooling.
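
To make the idea concrete, here is a minimal sketch of mean pooling plus a hypothetical collapse score. The paper's actual metric is not specified in this summary; the `collapse_score` function below is an illustrative assumption that measures how much of a text's second-moment energy sits in the within-text spread of token embeddings (lost by averaging) rather than in the mean vector itself (kept by averaging).

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Average a (T, d) matrix of token embeddings into a single (d,) text embedding."""
    return token_embeddings.mean(axis=0)

def collapse_score(token_embeddings: np.ndarray) -> float:
    """Hypothetical second-order collapse measure (NOT the paper's definition).

    Returns the fraction of total second-moment energy that lies in the
    within-text scatter of token embeddings rather than in their mean:
      ~0 -> tokens nearly identical, so mean pooling loses little
      ~1 -> tokens spread widely around a small mean, so averaging discards a lot
    """
    mu = token_embeddings.mean(axis=0)
    centered = token_embeddings - mu
    within_var = np.mean(np.sum(centered ** 2, axis=1))  # avg squared distance to the mean
    mean_energy = np.sum(mu ** 2)                        # energy retained by the pooled vector
    return float(within_var / (within_var + mean_energy + 1e-12))

# Usage with random stand-in embeddings (T=12 tokens, d=384 dims)
tokens = np.random.randn(12, 384) * 0.1 + np.random.randn(384)  # tokens clustered around a shared direction
print(mean_pool(tokens).shape)   # (384,)
print(collapse_score(tokens))    # small value: tight concentration, little collapse
```

Under this toy measure, the summary's finding reads naturally: when token embeddings within a text concentrate tightly around their mean, the score stays low and mean pooling throws away little structural information.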