MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes

arXiv cs.CL / 3/30/2026


Key Points

  • The paper introduces MiNER, a two-stage pipeline to extract key metadata (meeting number, date, location, participants, and time ranges) from heterogeneous municipal meeting minutes where information is often unstandardized.
  • Stage one uses a transformer-based question answering model to locate the opening and closing text spans that contain metadata; stage two performs entity extraction on those spans using BERTimbau and XLM-RoBERTa variants, with and without a CRF layer.
  • Entity extraction is further enhanced through delexicalization to improve fine-grained recognition in the municipal-minutes domain.
  • The authors benchmark both open-weight (Phi) and closed-weight (Gemini) LLMs, comparing predictive performance alongside inference cost and carbon footprint.
  • Results show strong in-domain performance, exceeding that of larger general-purpose LLMs, but weaker cross-municipality generalization owing to linguistic complexity and document variability. The work also establishes the first benchmark for this metadata-extraction task.

Abstract

Municipal meeting minutes are official documents of local governance, exhibiting heterogeneous formats and writing styles. Effective information retrieval (IR) requires identifying metadata such as meeting number, date, location, participants, and start/end times, elements that are rarely standardized or easy to extract automatically. Existing named entity recognition (NER) models are ill-suited to this task, as they are not adapted to such domain-specific categories. In this paper, we propose a two-stage pipeline for metadata extraction from municipal minutes. First, a question answering (QA) model identifies the opening and closing text segments containing metadata. Transformer-based models (BERTimbau and XLM-RoBERTa with and without a CRF layer) are then applied for fine-grained entity extraction and enhanced through delexicalization. To evaluate our proposed pipeline, we benchmark both open-weight (Phi) and closed-weight (Gemini) LLMs, assessing predictive performance, inference cost, and carbon footprint. Our results demonstrate strong in-domain performance, better than larger general-purpose LLMs. However, cross-municipality evaluation reveals reduced generalization, reflecting the variability and linguistic complexity of municipal records. This work establishes the first benchmark for metadata extraction from municipal meeting minutes, providing a solid foundation for future research in this domain.
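
The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how delexicalization is performed, so the digit-masking scheme here is only one plausible reading, and `delexicalize`, `extract_metadata`, `toy_locator`, and `toy_tagger` are hypothetical names. The toy locator and tagger are simple stand-ins for the QA model and the BERTimbau / XLM-RoBERTa token classifiers.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) as character offsets

# Assumed masking scheme: the source does not specify the one MiNER uses.
_DIGIT_RUN = re.compile(r"\d+")


def delexicalize(text: str) -> str:
    """Replace each run of digits with a same-length run of '0'.

    Preserving length keeps character offsets in the masked text
    aligned with the original text.
    """
    return _DIGIT_RUN.sub(lambda m: "0" * len(m.group()), text)


def extract_metadata(
    document: str,
    locate_segments: Callable[[str], List[str]],
    tag_entities: Callable[[str], List[Span]],
) -> Dict[str, List[str]]:
    """Stage 1 narrows the document to segments; stage 2 tags entities."""
    results: Dict[str, List[str]] = {}
    for segment in locate_segments(document):
        for start, end, label in tag_entities(delexicalize(segment)):
            # Offsets come from the masked text but index the original.
            results.setdefault(label, []).append(segment[start:end])
    return results


def toy_locator(document: str) -> List[str]:
    """Stand-in for the QA model: take the first and last lines."""
    lines = document.strip().splitlines()
    return [lines[0], lines[-1]]


def toy_tagger(masked: str) -> List[Span]:
    """Stand-in for the token classifier: match masked date/time shapes."""
    spans = [(m.start(), m.end(), "DATE")
             for m in re.finditer(r"00/00/0000", masked)]
    spans += [(m.start(), m.end(), "TIME")
              for m in re.finditer(r"00:00", masked)]
    return spans


minutes = (
    "Ata n.º 12 da reunião de 05/03/2024\n"
    "Discussão do orçamento municipal.\n"
    "A reunião terminou às 18:30"
)
metadata = extract_metadata(minutes, toy_locator, toy_tagger)
```

Passing the locator and tagger as callables keeps the sketch independent of any particular model library; in a real system they would presumably wrap fine-tuned models rather than regular expressions.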
