When Choices Become Risks: Safety Failures of Large Language Models under Multiple-Choice Constraints
arXiv cs.CL / 4/21/2026
📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research
Key Points
- The paper argues that LLM safety evaluations that focus on open-ended refusal can miss a key risk in structured settings like multiple-choice questions (MCQs), where refusal is discouraged or impossible.
- It identifies a failure mode where harmful requests are reformulated into forced-choice MCQs with only unsafe options, systematically bypassing refusal behavior even when the same harm is rejected in open-ended prompts.
- Testing across 14 proprietary and open-source models shows that forced-choice constraints significantly increase policy-violating responses.
- For human-authored MCQs, violation rates follow an inverted U-shaped pattern as structural constraint strength increases, while model-generated MCQs reach near-saturation and show strong transferability across different models.
- The results suggest current safety benchmarks substantially underestimate dangers in constrained decision-making tasks and point to constrained-choice prompting as an underexplored alignment failure surface.
Related Articles

Competitive Map: 10 AI Agent Platforms vs AgentHansa
Dev.to

Every time a new model comes out, the old one is obsolete of course
Reddit r/LocalLLaMA

We built it during the NVIDIA DGX Spark Full-Stack AI Hackathon — and it ended up winning 1st place overall 🏆
Dev.to

Stop Losing Progress: Setting Up a Pro Jupyter Workflow in VS Code (No More Colab Timeouts!)
Dev.to

🚀 Major BrowserAct CLI Update
Dev.to