AI Navigate

An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

arXiv cs.CL / 3/12/2026

📰 News / Ideas & Deep Analysis / Tools & Practical Usage / Models & Research

Key Points

  • The authors release a large bilingual English/German corpus of catalog records annotated with the Integrated Authority File (GND) and a machine-actionable GND taxonomy to enable ontology-aware multi-label classification.
  • The dataset supports mapping text to authority terms and agent-assisted cataloging with reproducible, authority-grounded evaluation.
  • They provide a statistical profile of the dataset and qualitative error analyses of three systems. They also invite the community to evaluate not only accuracy but also usefulness and transparency, with the aim of building authority-anchored AI co-pilots that amplify catalogers' work.
  • The resource supports cross-language (English/German) discovery and could change digital-library workflows by bringing AI-assisted, authority-grounded curation into cataloging.

Abstract

Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
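The summary does not say which metrics the authors use, so the following is only a minimal sketch, under stated assumptions, of how multi-label subject predictions are commonly scored against gold authority terms. It computes set-based precision/recall/F1, precision@k over a ranked list, and one possible "ontology-aware" variant that expands both sets with their broader terms before comparing. The term IDs and the tiny `BROADER` taxonomy are hypothetical placeholders, not real GND records, and the hierarchy-expansion scheme is an illustrative choice rather than the paper's method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy: each term maps to its direct broader (parent) terms.
# These IDs are placeholders, NOT real GND identifiers.
BROADER: Dict[str, Set[str]] = {
    "term:machine_learning": {"term:computer_science"},
    "term:computer_science": set(),
    "term:libraries": {"term:information_science"},
    "term:information_science": set(),
}


def with_ancestors(terms: Set[str], broader: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms plus all of their transitive broader terms."""
    result: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in result:
            continue
        result.add(term)
        stack.extend(broader.get(term, set()))
    return result


def prf(gold: Set[str], pred: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one record."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def precision_at_k(gold: Set[str], ranked: List[str], k: int) -> float:
    """Fraction of the top-k ranked predictions that are gold terms."""
    return sum(1 for t in ranked[:k] if t in gold) / k


gold = {"term:machine_learning", "term:libraries"}
pred = {"term:computer_science", "term:libraries"}

flat = prf(gold, pred)  # the near-miss "computer_science" gets no credit
hier = prf(with_ancestors(gold, BROADER), with_ancestors(pred, BROADER))
ranked = ["term:libraries", "term:computer_science", "term:machine_learning"]
p_at_1 = precision_at_k(gold, ranked, 1)
p_at_2 = precision_at_k(gold, ranked, 2)

print(flat, hier, p_at_1, p_at_2)
```

The flat score treats predicting a broader term ("computer_science" instead of "machine_learning") as a plain miss, while the hierarchy-expanded score gives partial credit for it. That difference is one way the taxonomy could make an evaluation "ontology-aware", though other schemes (e.g. distance-weighted credit) are equally plausible.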