Development of a European Union Time-Indexed Reference Dataset for Assessing the Performance of Signal Detection Methods in Pharmacovigilance using a Large Language Model
arXiv cs.CL / 3/30/2026
Key Points
- The study proposes a time-indexed reference dataset for the EU that records when adverse events (AEs) were officially recognized in Summaries of Product Characteristics (SmPCs), so that signal detection methods can be evaluated on how early they flag an AE relative to its regulatory confirmation.
- It covers 1,513 EU centrally authorized products from the EU Union Register (data locked on 15 Dec 2025), extracts Section 4.8 (undesirable effects) from each SmPC, and identifies drug-AE relations with DeepSeek V3.
- The resulting dataset contains 17,763 SmPC versions from 1995–2025 and 125,026 drug-AE associations. A restricted reference set limited to active products contains 1,479 medicinal products and 110,823 drug-AE associations.
- The analysis shows that most AE inclusions occurred pre-marketing (74.5%), that safety update activity peaked around 2012, and that the gastrointestinal, skin, and nervous system System Organ Classes are the most heavily represented.
- By attaching regulatory metadata and labeling-change timing, the dataset is positioned to improve and standardize benchmarking of pharmacovigilance signal detection methods and comparisons across approaches.
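
To make the idea of time indexing concrete, here is a minimal Python sketch of how such a dataset might be used. It is not the authors' pipeline: the `SmpcVersion` record, the field names, the product `drug_x`, and all dates are hypothetical and chosen only for illustration. It assumes each SmPC version already carries the set of AE terms extracted from Section 4.8, derives the first version date on which each drug-AE pair appears, and then measures how many days a detection method's signal preceded that label inclusion.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class SmpcVersion:
    """Hypothetical record: one SmPC version and the AE terms found in its Section 4.8."""
    product: str
    version_date: date
    adverse_events: frozenset


def first_inclusion_dates(
    versions: Iterable[SmpcVersion],
) -> Dict[Tuple[str, str], date]:
    """Map each (product, AE) pair to the date of the earliest version listing it."""
    first_seen: Dict[Tuple[str, str], date] = {}
    # Walk versions oldest first so setdefault keeps the earliest date per pair.
    for version in sorted(versions, key=lambda v: v.version_date):
        for ae in version.adverse_events:
            first_seen.setdefault((version.product, ae), version.version_date)
    return first_seen


def lead_time_days(signal_date: Optional[date], inclusion_date: date) -> Optional[int]:
    """Days by which a signal preceded label inclusion.

    Returns None when no signal was raised or when the signal did not
    precede the labeling change (i.e. it was not an early detection).
    """
    if signal_date is None or signal_date >= inclusion_date:
        return None
    return (inclusion_date - signal_date).days


# Illustrative data only: an invented product with two SmPC versions.
versions = [
    SmpcVersion("drug_x", date(2014, 6, 1), frozenset({"nausea", "pancreatitis"})),
    SmpcVersion("drug_x", date(2010, 1, 15), frozenset({"nausea"})),
]
inclusion = first_inclusion_dates(versions)

# A hypothetical detection method flagged pancreatitis on 2013-06-01.
lead = lead_time_days(date(2013, 6, 1), inclusion[("drug_x", "pancreatitis")])
```

In this toy case, nausea is present from the first version, while pancreatitis is added by a later labeling change, so only the latter can be used to score early detection. A benchmark built this way would restrict each method's input data to the period before the inclusion date of the pair being evaluated.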