OpenSanctions Pairs: Large-Scale Entity Matching with LLMs
arXiv cs.AI / 3/13/2026
Key Points
- The OpenSanctions Pairs dataset covers 755,540 labeled pairs from 293 sources across 31 countries, featuring multilingual and cross-script names, noisy attributes, and set-valued fields typical of compliance workflows.
- In benchmarking, LLMs in zero- and few-shot settings outperform a production rule-based matcher (Nomenklatura RegressionV1), with GPT-4o reaching up to 98.95% F1 and a locally deployable open model, DeepSeek-R1-Distill-Qwen-14B, reaching 98.23% F1.
- DSPy MIPROv2 prompt optimization yields consistent but modest gains; adding in-context examples provides little extra benefit and can degrade performance.
- Error analysis shows rule-based systems over-match (false positives) while LLMs struggle with cross-script transliteration and minor identifier/date inconsistencies, suggesting a shift toward blocking, clustering, and uncertainty-aware review.
- The work indicates pairwise matching performance is nearing a practical ceiling, and code for the project is available on GitHub.
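The pairwise setup the benchmarks evaluate can be sketched as a zero-shot "same entity?" prompt over two records. The sketch below is illustrative only: the field names, prompt wording, and verdict labels are assumptions, not the paper's actual protocol, and the LLM call itself is left out.

```python
# Hypothetical sketch of zero-shot pairwise entity matching with an LLM judge.
# Field names, prompt wording, and MATCH/NO_MATCH labels are assumptions for
# illustration; they are not taken from the OpenSanctions Pairs benchmark.

def render_record(entity: dict) -> str:
    """Flatten a record with possibly set-valued fields into one line."""
    parts = []
    for key, value in sorted(entity.items()):
        if isinstance(value, (list, set, tuple)):
            value = ", ".join(sorted(map(str, value)))
        parts.append(f"{key}: {value}")
    return "; ".join(parts)

def format_pair_prompt(left: dict, right: dict) -> str:
    """Render two entity records as a single match/no-match question."""
    return (
        "Do these two records refer to the same real-world entity? "
        "Answer MATCH or NO_MATCH.\n"
        f"Record A: {render_record(left)}\n"
        f"Record B: {render_record(right)}"
    )

def parse_verdict(reply: str) -> bool:
    """Map the model's free-text reply onto a boolean match decision."""
    text = reply.upper()
    return "NO_MATCH" not in text and "MATCH" in text

# Example pair with a cross-script alias and set-valued names:
a = {"name": "Ivan Petrov", "aliases": ["Иван Петров"], "birthDate": "1967"}
b = {"name": "I. Petrov", "birthDate": "1967"}
prompt = format_pair_prompt(a, b)
# The prompt would then be sent to the chosen model; parse_verdict() turns
# its reply into a label that can be scored against the gold annotation.
```

In a few-shot variant, labeled example pairs would simply be prepended to the same prompt; the paper's finding that such in-context examples add little (and can hurt) suggests the zero-shot form above is often the stronger baseline.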