Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations
arXiv cs.CL / 5/1/2026
Key Points
- The paper introduces CarryOnBench, an interactive multi-turn benchmark designed to test whether LLMs can recover helpful utility after initially misinterpreting benign user intent while still maintaining safety.
- Using 398 benign queries that appear harmful at first glance, the study simulates 5,970 conversations across 14 models, evaluating both intent-aligned utility and safety over 4–12-turn dialogues totaling 23,880 model responses.
- The proposed Ben-Util metric shows that at the first turn, models satisfy only 10.5%–37.6% of the user’s benign information need, but reach 25.1%–72.1% when the benign intent is provided upfront, indicating errors stem from intent misinterpretation rather than limited knowledge.
- In multi-turn settings with clarifications, 13 of 14 models generally approach the single-turn baseline, but recovery varies by model and exposes three failure modes not seen in single-turn tests: utility lock-in, unsafe recovery, and repetitive recovery.
- The authors find that multi-turn conversations converge to similar harmfulness levels regardless of how conservatively a model begins, highlighting a dimension missing from single-turn safety/robustness evaluations: responsiveness to clarified intent.
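The three failure modes above can be made concrete with a small classifier over a judged conversation trace. This is a hypothetical sketch, not the paper's implementation: the `TurnResult` type, the per-turn utility/safety scores, and the `tol` threshold are all illustrative assumptions; only the failure-mode labels come from the summary.

```python
from dataclasses import dataclass

@dataclass
class TurnResult:
    utility: float  # judged benign-utility score in [0, 1] (assumed scale)
    unsafe: bool    # whether a judge flagged harmful content in this turn

def classify_recovery(turns: list[TurnResult],
                      baseline_utility: float,
                      tol: float = 0.05) -> str:
    """Label a multi-turn trace against a single-turn intent-upfront baseline.

    Labels mirror the three failure modes named in the paper; the
    decision rules and threshold are illustrative, not the authors'.
    """
    # Unsafe recovery: the model regains utility but emits harmful content.
    if any(t.unsafe for t in turns):
        return "unsafe_recovery"
    final = turns[-1]
    if final.utility + tol < baseline_utility:
        # Utility lock-in: utility stays flat despite clarifications.
        if all(abs(t.utility - turns[0].utility) < tol for t in turns):
            return "utility_lock_in"
        # Repetitive recovery: utility moves but makes no net progress.
        return "repetitive_recovery"
    return "recovered"
```

A trace whose judged utility climbs from 0.1 to the baseline after clarification would be labeled `"recovered"`, while a trace stuck near its first-turn score would be `"utility_lock_in"`.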