Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design
arXiv cs.CL / 3/16/2026
Key Points
- The paper investigates how option design in RLVR-based MCQs affects model reasoning and vulnerability to reward hacking.
- It shows that a mismatch between training and testing option counts degrades performance, while strong distractors enable effective RLVR even with two-option questions.
- A new framework, Iterative Distractor Curation (IDC), actively constructs high-quality distractors to block elimination shortcuts and promote deeper reasoning.
- Experiments across benchmarks show that IDC improves distractor quality and yields significant RLVR gains over training on the original data.
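
The paper does not spell out the IDC algorithm in this summary, but the core idea of iterative curation can be sketched as a loop that scores candidate distractors and repeatedly prunes the ones a model could eliminate without real reasoning. Everything below is an assumption for illustration: the function names, the `plausibility` scorer, and the pruning schedule are all hypothetical, not the authors' implementation.

```python
def curate_distractors(question, answer, candidates, plausibility, k=3, rounds=2):
    """Hypothetical sketch of an iterative distractor-curation loop.

    `plausibility(question, option)` is an assumed scorer (e.g. a model's
    probability of selecting the option); a higher score means the
    distractor is harder to eliminate via shortcuts.
    """
    # Never allow the correct answer into the distractor pool.
    pool = [c for c in candidates if c != answer]
    for _ in range(rounds):
        # Rank the current pool from most to least plausible.
        pool.sort(key=lambda opt: plausibility(question, opt), reverse=True)
        # Prune the weakest half each round, keeping at least k options.
        pool = pool[: max(k, len(pool) // 2)]
    return pool[:k]

# Toy stand-in scorer: treat longer options as "more plausible".
score = lambda q, opt: len(opt)

best = curate_distractors(
    "2+2=?", "4", ["22", "5", "four hundred", "3.9"], score, k=2
)
# → ["four hundred", "3.9"]
```

In a real pipeline the scorer would come from a trained model rather than a heuristic, and rejected distractors could be regenerated rather than simply dropped; this sketch only shows the select-and-prune skeleton.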