Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents

arXiv cs.CL / 3/13/2026


Key Points

  • The paper introduces Legal-DC, a Chinese legal RAG benchmark with 480 legal documents and 2,475 refined QA pairs annotated with clause-level references to enable specialized evaluation for Chinese legal retrieval and generation.
  • It presents the LegRAG framework, combining clause-boundary segmentation with a dual-path self-reflection mechanism to preserve clause integrity while improving answer accuracy.
  • The work also proposes LLM-based automated evaluation methods tailored to the high-reliability demands of legal retrieval scenarios.
  • LegRAG achieves improvements over existing state-of-the-art methods by 1.3% to 5.6% across key metrics, and the authors release code and data on GitHub for community use.
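The clause-boundary segmentation mentioned above can be illustrated with a short sketch. The paper's actual indexing code is in its GitHub release; the regex and helper below are hypothetical, relying only on the convention that Chinese statutes open each clause with an article header such as "第一条" ("Article 1"), so splitting at those headers keeps every clause intact rather than cutting mid-clause at a fixed chunk size.

```python
import re

# Zero-width lookahead: split *before* each article header like 第一条 / 第2条,
# so the header stays attached to its own clause text.
CLAUSE_HEADER = re.compile(r"(?=第[零一二三四五六七八九十百千0-9]+条)")

def segment_clauses(document: str) -> list[str]:
    """Split a legal document into whole clauses at article headers."""
    parts = [p.strip() for p in CLAUSE_HEADER.split(document)]
    return [p for p in parts if p]  # drop the empty leading fragment

text = "第一条 本法为规范市场行为而制定。第二条 合同应当以书面形式订立。"
print(segment_clauses(text))
# → ['第一条 本法为规范市场行为而制定。', '第二条 合同应当以书面形式订立。']
```

A fixed-size chunker would have no guarantee of stopping at these boundaries, which is exactly the clause-integrity problem the benchmark's clause-level annotations are designed to measure.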

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a promising technology for legal document consultation, yet its application in Chinese legal scenarios faces two key limitations: existing benchmarks lack specialized support for joint retriever-generator evaluation, and mainstream RAG systems often fail to accommodate the structured nature of legal provisions. To address these gaps, this study advances three core contributions. First, we construct the Legal-DC benchmark dataset, comprising 480 legal documents (covering areas such as market regulation and contract management) and 2,475 refined question-answer pairs, each annotated with clause-level references, filling the gap in specialized evaluation resources for Chinese legal RAG. Second, we propose the LegRAG framework, which integrates legal adaptive indexing (clause-boundary segmentation) with a dual-path self-reflection mechanism to ensure clause integrity while enhancing answer accuracy. Third, we introduce automated evaluation methods based on large language models to meet the high-reliability demands of legal retrieval scenarios. LegRAG outperforms existing state-of-the-art methods by 1.3% to 5.6% across key evaluation metrics. This research provides a specialized benchmark, practical framework, and empirical insights to advance the development of Chinese legal RAG systems. Our code and data are available at https://github.com/legal-dc/Legal-DC.
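The abstract does not spell out how the dual-path self-reflection mechanism works; one plausible reading is a generate-then-verify loop in which a draft answer is accepted only when it is grounded in the retrieved clauses, and otherwise triggers a second retrieval pass. The sketch below is a toy illustration of that pattern, not the paper's implementation: `retrieve` is a keyword-overlap stand-in for a real dense retriever, and `generate` is a stub where a real system would call an LLM.

```python
def retrieve(query: str, clauses: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank clauses by character overlap with the query.
    scored = sorted(clauses, key=lambda c: sum(ch in c for ch in query), reverse=True)
    return scored[:k]

def generate(question: str, context: list[str]) -> str:
    # Stub generator: a real system would prompt an LLM with the context.
    return context[0] if context else "无法回答"

def answer_with_reflection(question: str, clauses: list[str]) -> str:
    # Path 1: produce a retrieval-grounded draft.
    ctx = retrieve(question, clauses)
    draft = generate(question, ctx)
    # Reflection: accept the draft only if it is grounded in a retrieved
    # clause; otherwise take path 2 with an expanded retrieval query.
    if any(draft in c or c in draft for c in ctx):
        return draft
    ctx2 = retrieve(question + draft, clauses)
    return generate(question, ctx2)

clauses = ["第一条 合同应当以书面形式订立。", "第二条 市场监管部门负责监督管理。"]
print(answer_with_reflection("合同如何订立？", clauses))
```

The design intuition is that the reflection step converts silent hallucination into an explicit retry, which matters in high-reliability legal settings where an ungrounded answer is worse than a slower one.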