Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety
arXiv cs.AI / 3/12/2026
Key Points
- The study conducted a large controlled experiment (N=62,808) across six frontier models and four deployment configurations to examine how scaffolding affects safety.
- Map-reduce scaffolding degrades measured safety (number needed to harm, NNH = 14), while two of the three scaffold architectures preserve safety within practically meaningful margins.
- Switching from multiple-choice to open-ended format on identical items shifts safety scores by 5 to 20 percentage points, a larger shift than any scaffold effect.
- Within-format scaffold comparisons are consistent with practical equivalence under the pre-registered ±2 percentage-point margin of the two one-sided tests (TOST) procedure, isolating the evaluation format as the operative variable.
- A generalisability analysis yields G = 0.000: model safety rankings reverse across benchmarks, and no composite safety index achieves non-zero reliability.
- The authors release the ScaffoldSafety code, data, and prompts.
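
To make the NNH and TOST figures above more concrete, here is a minimal Python sketch of how such quantities are conventionally computed. It assumes the standard definitions (NNH as the reciprocal of the absolute increase in the unsafe-response rate, and a normal-approximation TOST on the difference of two proportions); the summary does not say exactly how the paper computes either. The function names and all counts are hypothetical and are not taken from the ScaffoldSafety release. Under the standard definition, NNH = 14 would correspond to an absolute increase of roughly 1/14, or about 7 percentage points, in unsafe responses.

```python
import math
from statistics import NormalDist


def number_needed_to_harm(p_unsafe_baseline, p_unsafe_scaffold):
    """Standard NNH: reciprocal of the absolute risk increase.

    Returns infinity when the scaffold does not raise the unsafe rate.
    """
    risk_increase = p_unsafe_scaffold - p_unsafe_baseline
    if risk_increase <= 0:
        return math.inf
    return 1.0 / risk_increase


def tost_two_proportions(safe_a, n_a, safe_b, n_b, margin=0.02, alpha=0.05):
    """Normal-approximation TOST for equivalence of two safety rates.

    margin=0.02 mirrors the +/-2 percentage-point margin in the summary;
    alpha=0.05 is an assumption. Returns (difference, p_value, equivalent).
    """
    p_a, p_b = safe_a / n_a, safe_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    normal = NormalDist()
    # One-sided test against H0: diff <= -margin
    p_lower = 1 - normal.cdf((diff + margin) / se)
    # One-sided test against H0: diff >= +margin
    p_upper = normal.cdf((diff - margin) / se)
    # Equivalence is claimed only if both one-sided tests reject
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical rates, chosen only to illustrate the arithmetic
    print(number_needed_to_harm(0.10, 0.15))
    # Hypothetical counts of safe responses under two scaffolds
    print(tost_two_proportions(9000, 10000, 8950, 10000))
```

With the same observed difference, a smaller sample can fail the equivalence test because the standard error is too large relative to the margin, which is why a tight ±2 point margin generally calls for a large number of evaluations.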