MedArena: Comparing LLMs for Medicine-in-the-Wild Clinician Preferences
arXiv cs.CL / 3/18/2026
Key Points
- MedArena is an interactive platform that lets clinicians compare leading LLMs on their own real-world medical queries, addressing shortcomings of static benchmarks.
- Across 1,571 clinician preferences covering 12 LLMs, collected through November 1, 2025, Gemini 2.0 Flash Thinking, Gemini 2.5 Pro, and GPT-4o emerged as the top models by Bradley-Terry rating.
- Most clinician prompts involved treatment decisions, clinical documentation, or patient communication rather than factual recall, with ~20% involving multi-turn conversations.
- The study finds model rankings remain stable after adjusting for style factors like response length and formatting, supporting MedArena as a scalable, real-world evaluation approach for medical LLMs.
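
For readers unfamiliar with Bradley-Terry ratings, the sketch below shows one common way to turn pairwise preferences into a ranking. It is only an illustration: the summary does not describe MedArena's actual fitting procedure, the function name `fit_bradley_terry` is hypothetical, and the model names and votes are made-up toy data, not results from the study. The fit uses the standard minorization-maximization update for the Bradley-Terry model, in which each model's strength is its win count divided by a sum over its opponents.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each model to a strength; strengths sum to 1.
    Under the model, P(i beats j) = p_i / (p_i + p_j).
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    models = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))
    models = sorted(models)

    strengths = {m: 1.0 / len(models) for m in models}
    for _ in range(n_iter):
        updated = {}
        for i in models:
            # MM update: p_i = W_i / sum_j [ n_ij / (p_i + p_j) ]
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strengths[i] + strengths[j])
            updated[i] = wins[i] / denom if denom else strengths[i]
        total = sum(updated.values())
        updated = {m: s / total for m, s in updated.items()}
        delta = max(abs(updated[m] - strengths[m]) for m in models)
        strengths = updated
        if delta < tol:
            break
    return strengths


# Toy votes (hypothetical): each pair is compared four times.
toy_votes = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")] * 1
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")] * 1
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")] * 1
)
strengths = fit_bradley_terry(toy_votes)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Adjusting for style factors such as response length is typically done by adding covariates to a model of this kind; that extension is not shown here.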