InterPol: De-anonymizing LM Arena via Interpolated Preference Learning
arXiv cs.AI / 3/17/2026
Key Points
- INTERPOL is a model-driven identification framework designed to de-anonymize LM Arena responses by distinguishing target models using interpolated preference data.
- It synthesizes hard negative samples through model interpolation and employs an adaptive curriculum learning strategy to uncover deep stylistic patterns that simple statistical features miss.
- Experimental results show INTERPOL outperforms existing baselines in model identification accuracy, highlighting a vulnerability in anonymous leaderboards.
- The authors simulate ranking manipulation on Arena battle data to quantify the real-world threat and assess its implications for the fairness and reliability of LM evaluation platforms.
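
To make the ranking-manipulation threat concrete, here is a minimal sketch of how such a simulation could look. It is not the authors' setup: the summary does not describe their procedure, so the model names, true strengths, K-factor, attack rate, and identification accuracy below are all hypothetical, and a plain online Elo update is used as a simple stand-in for whatever rating scheme the platform applies. The idea is that an attacker who can recognize the target model in an anonymous battle always votes for it.

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def simulate_arena(n_battles, attack_rate, identify_acc, seed=0, k=4.0):
    """Simulate anonymous pairwise battles with an optional voting attack.

    attack_rate:  fraction of votes cast by the attacker (assumption).
    identify_acc: probability the attacker correctly spots the target
                  model in a battle that contains it (assumption).
    """
    rng = random.Random(seed)
    # Hypothetical "true" strengths; the target is the weakest model.
    strength = {"target": 1000.0, "rival_a": 1050.0, "rival_b": 1100.0}
    rating = {name: 1000.0 for name in strength}
    models = list(strength)

    for _ in range(n_battles):
        a, b = rng.sample(models, 2)
        # Draw all random numbers up front so runs with the same seed
        # share the same pairings and the same honest votes.
        u_vote, u_attack, u_identify = rng.random(), rng.random(), rng.random()

        # Honest vote: A wins with probability given by true strengths.
        a_wins = u_vote < expected_score(strength[a], strength[b])

        # Attacker vote: if the target is present and recognized,
        # the vote always goes to the target.
        target_present = "target" in (a, b)
        if target_present and u_attack < attack_rate and u_identify < identify_acc:
            a_wins = (a == "target")

        # Online Elo update (zero-sum between the two models).
        delta = k * ((1.0 if a_wins else 0.0) - expected_score(rating[a], rating[b]))
        rating[a] += delta
        rating[b] -= delta

    return rating


if __name__ == "__main__":
    baseline = simulate_arena(20000, attack_rate=0.0, identify_acc=0.9)
    attacked = simulate_arena(20000, attack_rate=0.3, identify_acc=0.9)
    for name in baseline:
        print(f"{name}: baseline={baseline[name]:.1f} attacked={attacked[name]:.1f}")
```

Comparing the two runs shows how far the target's rating moves for a given share of adversarial votes; lowering `identify_acc` illustrates why identification accuracy is the quantity that determines how severe the attack can be.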