The GELATO Dataset for Legislative NER
arXiv cs.CL / 3/17/2026
Key Points
- GELATO is introduced as a dataset of U.S. House and Senate bills from the 118th Congress, annotated with a novel two-level NER ontology designed for legislative text.
- The paper fine-tunes transformer models (BERT, RoBERTa) for first-level entity prediction and uses LLMs with optimized prompts for second-level predictions.
- Results show RoBERTa outperforming BERT on first-level prediction and LLMs improving second-level extraction, suggesting that a fine-tuned RoBERTa model paired with an LLM is a strong combination for legislative NER.
- The dataset and approach are positioned to enable future research and downstream NLP tasks in government and policy domains.
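The two-stage design summarized above (a fine-tuned encoder for first-level entities, then an LLM for second-level labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: it assumes the first-level model emits per-token BIO tags, as token-classification heads on BERT or RoBERTa commonly do, and the labels `ORG`, `AGENCY`, `LEGISLATURE`, the `ONTOLOGY` mapping, the prompt template in `build_prompt`, and the `fake_llm` stub are all hypothetical stand-ins rather than GELATO's actual ontology or prompts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Span:
    """An entity span over a tokenized bill; token indices are [start, end)."""
    start: int
    end: int
    label: str                      # first-level label from the fine-tuned encoder
    text: str
    sublabel: Optional[str] = None  # second-level label, filled in by the LLM stage


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Stage 1 decoding: turn per-token BIO tags (an assumed tag scheme) into spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any span still open
        if tag.startswith("I-") and tag[2:] == label:
            continue  # token extends the currently open span
        if start is not None:  # any other tag ends the currently open span
            spans.append(Span(start, i, label, " ".join(tokens[start:i])))
            start, label = None, None
        if tag != "O":  # B-X, or a stray I-X with no matching open span, starts a new one
            start, label = i, tag[2:]
    return spans


# HYPOTHETICAL second-level ontology: first-level label -> allowed subtypes.
ONTOLOGY: Dict[str, List[str]] = {"ORG": ["AGENCY", "LEGISLATURE"]}


def build_prompt(span: Span, context: str, choices: List[str]) -> str:
    """HYPOTHETICAL prompt asking an LLM to pick one subtype for a span."""
    return (
        f"Context: {context}\n"
        f"Entity: {span.text} (type {span.label})\n"
        f"Choose the best subtype from: {', '.join(choices)}. Answer with the subtype only."
    )


def refine(spans: List[Span], context: str, ontology: Dict[str, List[str]],
           llm: Callable[[str], str]) -> List[Span]:
    """Stage 2: query the LLM for each span and keep only in-ontology subtypes."""
    for span in spans:
        choices = ontology.get(span.label, [])
        if not choices:
            continue  # no second level defined for this first-level label
        answer = llm(build_prompt(span, context, choices)).strip()
        span.sublabel = answer if answer in choices else None  # reject off-ontology output
    return spans


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; a real pipeline would query a model here."""
    return "AGENCY" if "Entity: Department" in prompt else "LEGISLATURE"


tokens = ["The", "Department", "of", "Energy", "shall", "report", "to", "Congress", "."]
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-ORG", "O"]
spans = refine(bio_to_spans(tokens, tags), " ".join(tokens), ONTOLOGY, fake_llm)
for s in spans:
    print(s.text, s.label, s.sublabel)
```

Running the sketch prints one line per span, for example `Department of Energy ORG AGENCY`. Constraining the LLM to the subtypes allowed under each first-level label, and discarding any answer outside that list, is one simple way to keep free-form LLM output consistent with a fixed hierarchy; the paper's own prompt optimization may work differently.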