AI Achieves a Perfect LSAT Score
arXiv cs.AI / 2026/4/14
Key Points
- The paper (arXiv:2604.10034v1) claims the first documented case of a language model scoring a perfect 180 on an officially disclosed LSAT in controlled experiments.
- It finds that prompt changes, shuffling answer choices, and sampling multiple responses do not materially affect performance, suggesting the model’s results are robust to common evaluation-time perturbations.
- Removing the model’s generated “thinking” phase reduces frontier models’ accuracy by up to 8 percentage points, with most of the loss in logical reasoning.
- Distillation that reproduces full thinking traces still underperforms frontier systems, implying that trace format alone is insufficient for top performance.
- A pilot reward model, fine-tuned with QLoRA on official LSAT explanations and used for best-of-5 selection, narrows the gap, again with gains concentrated in logical reasoning.
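
The best-of-5 selection mentioned in the last point can be illustrated with a minimal sketch: sample several candidate responses, score each one, and keep the highest-scoring candidate. The function names `best_of_n` and `toy_score` are hypothetical, and the keyword-counting scorer is only a stand-in for a trained reward model; none of this is taken from the paper.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score.

    `score` stands in for a reward model (assumption: higher is better).
    On ties, the earliest candidate wins, because max() keeps the first maximum.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def toy_score(response: str) -> float:
    """Hypothetical placeholder scorer: counts occurrences of 'because'.

    A real setup would call a trained reward model here instead.
    """
    return float(response.lower().count("because"))


if __name__ == "__main__":
    # Five sampled responses to one question (made-up strings for illustration).
    samples = [
        "The answer is B.",
        "The answer is C because the premise excludes A.",
        "B, because the rule applies, and because D contradicts it.",
        "Probably D.",
        "A.",
    ]
    print(best_of_n(samples, toy_score))
```

With n = 5 this matches the best-of-5 setting in name only; the quality of the selection depends entirely on the scorer that is plugged in.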



