CALRK-Bench: Evaluating Context-Aware Legal Reasoning in Korean Law

arXiv cs.AI / 3/30/2026

Key Points

  • The paper introduces CALRK-Bench, a new Korean-law benchmark designed to evaluate context-aware legal reasoning rather than simple rule memorization.
  • It tests models on three abilities: identifying the temporal validity of legal norms, determining whether sufficient legal information exists for a case, and explaining why legal judgments shift.
  • The dataset is built from Korean legal precedents and legal consultation records and is validated by legal experts.
  • Experiments indicate that even recent large language models perform poorly on these context-aware tasks, highlighting a gap in current LLM capabilities for legal reasoning.
  • The authors position CALRK-Bench as a “stress test” for context-aware legal reasoning and release their code publicly.

Abstract

Legal reasoning requires not only the application of legal rules but also an understanding of the context in which those rules operate. However, existing legal benchmarks primarily evaluate rule application under the assumption of fixed norms, and thus fail to capture situations where legal judgments shift or where multiple norms interact. In this work, we propose CALRK-Bench, a context-aware legal reasoning benchmark based on the Korean legal system. CALRK-Bench evaluates whether models can identify the temporal validity of legal norms, determine whether sufficient legal information is available for a given case, and understand the reasons behind shifts in legal judgments. The dataset is constructed from legal precedents and legal consultation records, and is validated by legal experts. Experimental results show that even recent large language models consistently exhibit low performance on these three tasks. CALRK-Bench provides a new stress test for evaluating context-aware legal reasoning rather than simple memorization of legal knowledge. Our code is available at https://github.com/jhCOR/CALRKBench.
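
To make the three-task setup concrete, here is a minimal sketch of how per-task accuracy could be tallied. It is not the authors' evaluation code: the task labels, the record layout (`task`, `gold`, `pred`), and the exact-match scoring are all assumptions made for illustration, and the actual repository may format items and score answers differently (the judgment-shift task in particular may not reduce to short labels).

```python
from collections import defaultdict

# Hypothetical task labels mirroring the three abilities named in the abstract;
# the actual CALRK-Bench repository may name and format these differently.
TASKS = ("temporal_validity", "information_sufficiency", "judgment_shift")


def per_task_accuracy(records):
    """Return {task: accuracy} for records shaped like
    {"task": str, "gold": str, "pred": str} (an assumed layout)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        task = rec["task"]
        if task not in TASKS:
            raise ValueError(f"unknown task: {task!r}")
        total[task] += 1
        # Exact match after trimming whitespace and lowercasing.
        if rec["gold"].strip().lower() == rec["pred"].strip().lower():
            correct[task] += 1
    # Only report tasks that actually appear in the input.
    return {task: correct[task] / total[task] for task in TASKS if total[task]}


# Toy records invented for illustration only; not benchmark data.
toy = [
    {"task": "temporal_validity", "gold": "valid", "pred": "valid"},
    {"task": "temporal_validity", "gold": "repealed", "pred": "valid"},
    {"task": "information_sufficiency", "gold": "insufficient", "pred": "Insufficient "},
    {"task": "judgment_shift", "gold": "B", "pred": "A"},
]
scores = per_task_accuracy(toy)
print(scores)
```

Reporting the three scores separately, rather than as one average, keeps a weakness in a single ability (for example, temporal validity) from being hidden by stronger results elsewhere.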