HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
arXiv cs.CL / 3/20/2026
Key Points
- Introduces the HiFi-KPI dataset, a large-scale resource for hierarchical KPI extraction from earnings filings, comprising 1.65M paragraphs and 198k hierarchical labels linked to iXBRL taxonomies.
- Defines three evaluation tasks (KPI classification, KPI extraction, and structured KPI extraction) and releases HiFi-KPI-Lite, a manually curated 8K-paragraph subset.
- Reports baseline results: encoder-based models reach over 0.906 macro-F1 on classification, while LLMs achieve only about 0.440 F1 on structured extraction, with most errors tied to date handling.
- Open-sources all code and data in a GitHub repository, facilitating reproducibility and further research.
- Aims to improve cross-company transferability of KPI tagging in financial filings and to enable rapid evaluation of KPI extraction systems.
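
The classification result above is reported as macro-F1, which averages per-label F1 scores so that rare KPI labels count as much as frequent ones. The following is a minimal sketch of that metric in plain Python, not the paper's evaluation code; the function name `macro_f1` and the label strings used with it are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred.

    Assumes single-label classification: one gold label and one predicted
    label per paragraph (an assumption made for this sketch).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label

    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed

    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores)


# Hypothetical KPI labels, for illustration only.
gold = ["Revenue", "Revenue", "NetIncome", "EPS"]
pred = ["Revenue", "NetIncome", "NetIncome", "EPS"]
score = macro_f1(gold, pred)
```

Because every label contributes equally to the average, a model that handles only the most frequent KPI labels well would score noticeably lower on macro-F1 than on plain accuracy.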