The GELATO Dataset for Legislative NER
arXiv cs.CL / 3/17/2026
Key Points
- GELATO is a dataset of U.S. House and Senate bills from the 118th Congress, built around a novel two-level NER ontology designed for legislative text.
- The paper fine-tunes transformer models (BERT, RoBERTa) for first-level entity prediction and uses LLMs with optimized prompts for second-level predictions.
- Results show RoBERTa outperforming BERT on first-level predictions and LLMs improving second-level extraction, suggesting that the two work well in combination for legislative NER.
- The dataset and approach are positioned to enable future research and downstream NLP tasks in government and policy domains.
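
The two-stage setup described in the key points can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the first-level model emits BIO tags per token, and it replaces the LLM with a hypothetical rule-based stand-in (`toy_second_level`). The label names (`ORG`, `DATE`, `FEDERAL_AGENCY`, and so on) are invented for the example and are not taken from the GELATO ontology.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[str, str]           # (entity text, first-level label)
Refined = Tuple[str, str, str]   # (entity text, first-level label, second-level label)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIO-tagged tokens into (text, label) spans.

    An I- tag that does not continue the current span is treated as outside.
    """
    spans: List[Span] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


def refine(spans: List[Span], second_level: Callable[[str, str], str]) -> List[Refined]:
    """Attach a second-level label to each first-level span."""
    return [(text, first, second_level(text, first)) for text, first in spans]


def toy_second_level(text: str, first_label: str) -> str:
    """Hypothetical stand-in for an LLM prompt that picks a subtype."""
    if first_label == "ORG":
        return "FEDERAL_AGENCY" if "Department" in text else "OTHER_ORG"
    return "UNSPECIFIED"


# Hypothetical first-level output for one sentence of bill text.
tokens = ["The", "Department", "of", "Energy", "shall", "report", "by", "2025"]
tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "B-DATE"]

spans = bio_to_spans(tokens, tags)
refined = refine(spans, toy_second_level)
```

In a real pipeline, `tags` would come from a fine-tuned token classifier and `toy_second_level` would be replaced by a call to an LLM with a prompt constrained to the second-level labels valid for each first-level type.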