HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

arXiv cs.CL / 3/20/2026

Key Points

  • Introduces the HiFi-KPI dataset, a large-scale resource for hierarchical KPI extraction from earnings filings, comprising 1.65M paragraphs and 198k hierarchical labels linked to iXBRL taxonomies.
  • Defines three evaluation tasks (KPI classification, KPI extraction, and structured KPI extraction) and releases HiFi-KPI-Lite, a manually curated 8K-paragraph subset.
  • Reports baselines on HiFi-KPI-Lite: encoder-based models reach over 0.906 macro-F1 on classification, while LLMs reach 0.440 F1 on structured extraction. A qualitative analysis finds that extraction errors primarily relate to dates.
  • Open-sources all code and data at https://github.com/aaunlp/HiFi-KPI, supporting reproducibility and further research.
  • Aims to improve the cross-company transferability of KPI tagging in financial filings, which the fine-grained iXBRL taxonomy currently limits, and to enable rapid evaluation of KPI extraction systems.
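
To illustrate why a label hierarchy can help transferability, the sketch below rolls fine-grained labels up to a shared ancestor so that two companies using different detailed tags end up with the same coarse label. This is only one plausible reading of "hierarchically organized labels": the `PARENT` map, the label names in it, and the `coarsen` helper are hypothetical and are not taken from the HiFi-KPI repository or from any real iXBRL taxonomy file.

```python
# Hypothetical child -> parent edges, assumed acyclic. A real taxonomy
# would define far more labels and relationships than this toy map.
PARENT = {
    "RevenueFromContracts": "Revenues",
    "InterestIncome": "Revenues",
    "Revenues": "IncomeStatement",
    "NetIncomeLoss": "IncomeStatement",
}


def path_from_root(label):
    """Return the chain of labels from the root down to `label`."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def coarsen(label, depth):
    """Map `label` to its ancestor at `depth` (0 = root).

    Labels that sit above the requested depth are returned unchanged.
    """
    path = path_from_root(label)
    return path[min(depth, len(path) - 1)]


if __name__ == "__main__":
    for tag in ("RevenueFromContracts", "InterestIncome", "NetIncomeLoss"):
        print(tag, "->", coarsen(tag, 1))
```

In this toy example, both revenue-related tags collapse to the same depth-1 label, which is the kind of coarsening that would let a classifier trained on one company's filings apply to another's.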

Abstract

Accurate tagging of earnings reports can yield significant short-term returns for stakeholders. The machine-readable inline eXtensible Business Reporting Language (iXBRL) is mandated for public financial filings. Yet, its complex, fine-grained taxonomy limits the cross-company transferability of tagged Key Performance Indicators (KPIs). To address this, we introduce the Hierarchical Financial Key Performance Indicator (HiFi-KPI) dataset, a large-scale corpus of 1.65M paragraphs and 198k unique, hierarchically organized labels linked to iXBRL taxonomies. HiFi-KPI supports multiple tasks and we evaluate three: KPI classification, KPI extraction, and structured KPI extraction. For rapid evaluation, we also release HiFi-KPI-Lite, a manually curated 8K paragraph subset. Baselines on HiFi-KPI-Lite show that encoder-based models achieve over 0.906 macro-F1 on classification, while Large Language Models (LLMs) reach 0.440 F1 on structured extraction. Finally, a qualitative analysis reveals that extraction errors primarily relate to dates. We open-source all code and data at https://github.com/aaunlp/HiFi-KPI.
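
For readers unfamiliar with the classification metric quoted above, the following is a minimal sketch of macro-F1 for single-label classification: F1 is computed per class and then averaged without weighting, so rare KPI labels count as much as frequent ones. The function name `macro_f1` and the example labels are illustrative assumptions; the paper's own evaluation code may differ, for example in how it handles classes absent from the gold labels.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative labels only; not drawn from the dataset.
    gold = ["Revenue", "Revenue", "NetIncome", "EPS"]
    pred = ["Revenue", "NetIncome", "NetIncome", "EPS"]
    print(round(macro_f1(gold, pred), 4))
```

With a very large label space, the unweighted average makes macro-F1 sensitive to performance on infrequent labels, which is one reason it is a common choice for long-tailed classification tasks.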