Beyond Case Law: Evaluating Structure-Aware Retrieval and Safety in Statute-Centric Legal QA
arXiv cs.AI / 4/10/2026
Key Points
- The paper argues that existing Legal QA benchmarks largely target case law, missing key difficulties of statute-centric regulatory reasoning where evidence is scattered across hierarchical documents.
- It introduces SearchFireSafety, a new benchmark designed to test both structure-aware retrieval (graph/hierarchy guided) and safety behaviors like citation-aware abstention when context is insufficient.
- The benchmark uses a dual evaluation approach, pairing real-world citation-requiring questions with synthetic partial-context cases, to measure hallucination and refusal behavior specifically.
- Experiments on multiple large language models indicate that graph-guided retrieval improves performance, but also exposes a safety trade-off: domain-adapted models may hallucinate more when crucial statutory evidence is missing.
- The work concludes that future benchmarks should jointly assess hierarchical retrieval quality and model safety for statute-centric legal QA scenarios.
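The two behaviors the benchmark targets, hierarchy-guided retrieval and citation-aware abstention, can be illustrated with a minimal sketch. Everything here (the `Section` class, the lexical `overlap_score`, the `min_score` threshold) is an illustrative assumption, not the paper's actual method:

```python
# Hypothetical sketch: hierarchy-guided statute retrieval with abstention.
# All names and scoring choices are illustrative, not from SearchFireSafety.
from dataclasses import dataclass, field

@dataclass
class Section:
    """One node in a hierarchical statute (chapter -> section -> ...)."""
    section_id: str
    text: str
    parent: "Section | None" = None
    children: list = field(default_factory=list)

def overlap_score(query: str, text: str) -> float:
    """Toy lexical-overlap relevance score; a real system would use embeddings."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / max(len(q), 1)

def retrieve_with_context(query: str, leaves: list, min_score: float = 0.3):
    """Return the best-matching section plus its hierarchical neighborhood,
    or None (abstain) when no section clears the evidence threshold."""
    best = max(leaves, key=lambda s: overlap_score(query, s.text))
    # Citation-aware abstention: refuse rather than answer without evidence.
    if overlap_score(query, best.text) < min_score:
        return None
    # Hierarchy-guided expansion: pull in the parent and sibling sections
    # so an answer can cite the surrounding statutory context.
    context = [best]
    if best.parent is not None:
        context.append(best.parent)
        context.extend(c for c in best.parent.children if c is not best)
    return context
```

A query that matches a section returns that section plus its parent and siblings; an off-topic query returns `None`, the abstention case the safety evaluation is probing for.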