Linear Probe Accuracy Scales with Model Size and Benefits from Multi-Layer Ensembling
arXiv cs.LG · April 16, 2026
Key Points
- Linear probes can serve as detectors for language-model outputs the model “knows” are wrong, but prior work shows single-layer probing is brittle and fails on certain deception types.
- The study introduces multi-layer ensembling of linear probes, which restores strong detection performance even when individual probes fail, yielding AUROC gains of +29% on Insider Trading and +78% on Harm-Pressure Knowledge.
- Experiments across 12 model sizes (0.5B–176B parameters) show that probe accuracy improves systematically with model scale, at roughly 5% AUROC per 10× increase in parameters (R = 0.81).
- The authors argue the key mechanism is geometric: “deception directions” rotate gradually across layers rather than being localized to a single layer, explaining both fragility of single-layer probes and robustness of ensembles.
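The ensembling idea above can be illustrated with a small sketch: fit one linear probe per layer on labeled activations, then average the probes' scores at test time. Everything below (the ridge closed form, the toy data where the signal direction rotates slightly layer to layer, and all names) is an illustrative assumption, not the paper's actual implementation.

```python
# Hypothetical sketch of multi-layer probe ensembling; all names and
# data here are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_probe(X, y):
    """Ridge-regularized least-squares probe: w = (X^T X + lam I)^-1 X^T t,
    with targets t = ±1."""
    lam = 1e-2
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (2 * y - 1))

def auroc(scores, y):
    """AUROC via the rank-sum (Mann-Whitney U) formulation."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy activations for 8 "layers": the class-separating direction rotates
# slightly from layer to layer, mimicking the paper's geometric picture.
n, d, n_layers = 400, 32, 8
y = rng.integers(0, 2, n)
base = rng.normal(size=d)
layers = []
for _ in range(n_layers):
    direction = base + 0.5 * rng.normal(size=d)  # slightly rotated signal
    X = rng.normal(size=(n, d)) + np.outer(2 * y - 1, direction)
    layers.append(X)

# Fit one probe per layer on a train split; average scores on the test split.
train, test = slice(0, 300), slice(300, None)
per_layer_scores = [X[test] @ fit_linear_probe(X[train], y[train]) for X in layers]
ensemble = np.mean(per_layer_scores, axis=0)

print("single-layer AUROC:", auroc(per_layer_scores[0], y[test]))
print("ensemble AUROC:   ", auroc(ensemble, y[test]))
```

Averaging per-layer scores is only one ensembling choice; voting or a learned combiner over layer scores would fit the same template.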