Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety
arXiv cs.AI / 3/12/2026
Key Points
- The study conducted a large controlled experiment (N=62,808) across six frontier models and four deployment configurations to examine how scaffolding affects safety.
- Map-reduce scaffolding degrades measured safety (number needed to harm, NNH = 14), while two of the three scaffold architectures preserve safety within practically meaningful margins.
- Switching from multiple-choice to open-ended format on identical items shifts safety scores by 5-20 percentage points, larger than any scaffold effect.
- Within-format scaffold comparisons are consistent with practical equivalence under the pre-registered ±2 percentage-point margin of the two one-sided tests (TOST) procedure, isolating the evaluation format as the operative variable.
- A generalisability analysis yields G = 0.000: model safety rankings reverse across benchmarks, and no composite safety index reaches non-zero reliability.
- The authors release the ScaffoldSafety code, data, and prompts.
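The two statistics quoted above, NNH and the TOST equivalence test, can be illustrated with a minimal sketch. It assumes NNH is computed in the usual way as the reciprocal of the absolute increase in unsafe-response rate (so NNH = 14 corresponds to an increase of about 7.1 percentage points), and it uses a simple normal-approximation TOST for two proportions. The summary does not say which TOST variant the authors used, so the function names, the Wald standard error, and all input rates here are hypothetical and purely illustrative.

```python
import math


def number_needed_to_harm(baseline_unsafe_rate, exposed_unsafe_rate):
    """NNH = 1 / absolute risk increase (both rates as fractions in [0, 1])."""
    risk_increase = exposed_unsafe_rate - baseline_unsafe_rate
    if risk_increase <= 0:
        raise ValueError("NNH is only defined when the exposed rate is higher")
    return 1.0 / risk_increase


def _normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tost_two_proportions(p1, n1, p2, n2, margin=0.02, alpha=0.05):
    """Two one-sided z-tests for equivalence of two proportions.

    Assumption: Wald (unpooled) standard error and a symmetric margin.
    Returns (p_value, equivalent), where p_value is the larger of the
    two one-sided p-values.
    """
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; proportions of 0 or 1 need another method")
    # H0: diff <= -margin, rejected when diff is sufficiently above -margin.
    p_lower = 1.0 - _normal_cdf((diff + margin) / se)
    # H0: diff >= +margin, rejected when diff is sufficiently below +margin.
    p_upper = _normal_cdf((diff - margin) / se)
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical rates, not taken from the paper.
    print(number_needed_to_harm(0.10, 0.20))
    print(tost_two_proportions(0.80, 5000, 0.80, 5000, margin=0.02))
```

Equivalence is declared only when both one-sided tests reject, which is why the larger of the two p-values is compared against alpha.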