TailNLG: A Multilingual Benchmark Addressing Verbalization of Long-Tail Entities
arXiv cs.CL / 3/31/2026
Key Points
- The paper argues that multilingual data-to-text verbalization of knowledge graphs can be biased against rare (long-tail) entities, limiting usability for non-expert users and retrieval-augmented generation systems.
- It introduces TailNLG, a new multilingual benchmark (English, Italian, Spanish) built from Wikidata, in which entity popularity is varied systematically so that long-tail effects can be studied directly.
- The study evaluates three families of large language models in zero-shot settings and finds a consistent bias against long-tail entities, with lower embedding-based scores and higher model uncertainty for rare items.
- It shows that the magnitude of the long-tail bias differs by model and language, and that existing evaluation metrics may not reliably capture these differences, which motivates better evaluation approaches.
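The key points describe grouping entities by popularity and comparing embedding-based scores and model uncertainty across those groups. The sketch below shows one way such a comparison could be computed. It is not the paper's code: the sitelink-count thresholds, the use of cosine similarity between reference and output embeddings as the "embedding-based score", mean token entropy as the uncertainty proxy, and the function names are all hypothetical choices made for illustration. The toy vectors are invented and say nothing about TailNLG results.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def popularity_bucket(sitelinks, head_min=50, tail_max=5):
    """Hypothetical bucketing of an entity by its Wikidata sitelink count.

    The thresholds are assumptions, not values taken from the paper.
    """
    if sitelinks >= head_min:
        return "head"
    if sitelinks <= tail_max:
        return "tail"
    return "torso"


def mean_token_entropy(token_dists):
    """Average Shannon entropy (in nats) over per-token probability distributions.

    Used here as an assumed proxy for model uncertainty; higher means less certain.
    """
    entropies = [-sum(p * math.log(p) for p in dist if p > 0) for dist in token_dists]
    return mean(entropies)


def bucket_means(records):
    """Group (bucket, score) pairs and return the mean score for each bucket."""
    groups = {}
    for bucket, score in records:
        groups.setdefault(bucket, []).append(score)
    return {bucket: mean(scores) for bucket, scores in groups.items()}


# Toy, made-up inputs: one popular entity and one rare entity.
entities = [
    {"sitelinks": 120, "ref": [1.0, 0.0], "gen": [0.9, 0.1]},
    {"sitelinks": 2, "ref": [1.0, 0.0], "gen": [0.5, 0.5]},
]
records = [(popularity_bucket(e["sitelinks"]), cosine(e["ref"], e["gen"])) for e in entities]
means = bucket_means(records)

# A positive gap means the head bucket scored higher than the tail bucket on this toy input.
gap = means["head"] - means["tail"]
```

Reporting a per-bucket mean and the head-minus-tail gap, rather than a single pooled score, is what makes a popularity-dependent bias visible at all. A pooled average over all entities would hide it whenever popular entities dominate the sample.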



