RenoBench: A Citation Parsing Benchmark

arXiv cs.CL / March 27, 2026


Key Points

  • RenoBench is introduced as a public-domain benchmark for citation parsing, designed to address limitations of prior evaluations (lack of generalizability, reliance on synthetic data, or limited availability).
  • The dataset is built from 161,000 annotated citations extracted from PDFs across four publishing ecosystems (SciELO, Redalyc, Public Knowledge Project, and Open Research Europe), producing 10,000 citations with multilingual, multi–publication-type coverage.
  • The authors apply automated validation and feature-based sampling to improve dataset quality and representativeness across languages, platforms, and citation formats.
  • Experiments evaluate multiple citation parsing systems and report field-level precision/recall, finding that language models perform strongly, especially when fine-tuned.
  • RenoBench aims to enable reproducible and standardized evaluation for citation parsing and to support downstream automated citation infrastructure and metascientific research.
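To make the reported metric concrete, here is a minimal sketch of field-level precision and recall for a single parsed citation. The field names and the exact-match criterion are assumptions for illustration, not the paper's actual evaluation protocol.

```python
# Hypothetical field-level precision/recall for citation parsing.
# Exact string match per field is an assumed matching rule.

def field_level_prf(gold: dict, pred: dict) -> tuple:
    """gold/pred map field names (e.g. "author", "title", "year")
    to strings; a predicted field is correct only if it exactly
    matches the gold value for that field."""
    correct = sum(1 for field, value in pred.items()
                  if gold.get(field) == value)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {"author": "Smith, J.", "title": "Deep Parsing", "year": "2020"}
pred = {"author": "Smith, J.", "title": "Deep parsing",  # case mismatch
        "year": "2020", "venue": "ACL"}                  # spurious field
p, r = field_level_prf(gold, pred)
# 2 of 4 predicted fields correct, 2 of 3 gold fields recovered
```

In practice, such per-citation scores would be aggregated (e.g. micro-averaged) across the benchmark, and fuzzier matching might be used for fields like titles; those choices are left to the evaluation protocol.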

Abstract

Accurate parsing of citations is necessary for machine-readable scholarly infrastructure. Yet despite sustained interest in this problem, existing evaluation techniques are often not generalizable, rely on synthetic data, or are not publicly available. We introduce RenoBench, a public-domain benchmark for citation parsing, sourced from PDFs released on four publishing ecosystems: SciELO, Redalyc, the Public Knowledge Project, and Open Research Europe. Starting from 161,000 annotated citations, we apply automated validation and feature-based sampling to produce a dataset of 10,000 citations spanning multiple languages, publication types, and platforms. We then evaluate a variety of citation parsing systems and report field-level precision and recall. Our results show strong performance from language models, particularly when fine-tuned. RenoBench enables reproducible, standardized evaluation of citation parsing systems and provides a foundation for advancing automated citation parsing and metascientific research.