Residual Stream Analysis of Overfitting And Structural Disruptions
arXiv cs.LG / 3/17/2026
Key Points
- The work shows that safety-focused fine-tuning with standard refusal templates raises the false-refusal rate on benign prompts from 63% to 84% as the proportion of safety data grows from 0% to 40%.
- It finds that safety data has substantially lower token entropy and 2-gram diversity (0.048) than general instruction data.
- It introduces FlowLens, a stable PCA-based tool for analyzing residual-stream geometry, and uses it to show that safety data concentrates variance along a few components, reducing representational smoothness.
- It proposes Variance Concentration Loss (VCL), a regularizer that penalizes excessive variance concentration in mid-layer residuals, reducing false refusals by over 35 percentage points while maintaining or improving performance on benchmarks like MMLU and GSM8K.
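
The quantities named in the key points can be illustrated with a minimal Python sketch. This is not the paper's code: the summary does not give the exact definitions of 2-gram diversity, FlowLens, or VCL, so the sketch assumes the common distinct-n ratio for n-gram diversity, Shannon entropy over unigram counts for token entropy, and the top-k explained-variance share from plain PCA as a stand-in for "variance concentration". The function names `token_entropy`, `distinct_n`, and `top_k_variance_share` are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def token_entropy(tokens):
    """Shannon entropy (in bits) of the empirical unigram distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def distinct_n(tokens, n=2):
    """Fraction of n-grams that are unique (assumed definition of diversity)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def top_k_variance_share(residuals, k=1):
    """Share of total variance captured by the top-k principal components.

    residuals: array of shape (num_samples, hidden_dim), e.g. mid-layer
    residual-stream activations collected over a dataset (assumed input).
    """
    centered = residuals - residuals.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to per-component variance.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[:k].sum() / variances.sum())
```

Under these assumptions, templated refusal text would score low on `token_entropy` and `distinct_n`, and activations dominated by a few directions would give a `top_k_variance_share` close to 1. A regularizer in the spirit of VCL could penalize a differentiable version of that share, though the paper's actual loss is not specified in this summary.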