MedArena: Comparing LLMs for Medicine-in-the-Wild Clinician Preferences
arXiv cs.CL / 3/18/2026
Key Points
- MedArena is an interactive platform that lets clinicians compare leading LLMs on their own real-world medical queries, addressing shortcomings of static benchmarks.
- Across 1,571 clinician preferences covering 12 LLMs, collected through November 1, 2025, Gemini 2.0 Flash Thinking, Gemini 2.5 Pro, and GPT-4o emerged as the top models by Bradley-Terry rating.
- Most clinician prompts involved treatment decisions, clinical documentation, or patient communication rather than factual recall, and roughly 20% involved multi-turn conversations.
- The study finds model rankings remain stable after adjusting for style factors like response length and formatting, supporting MedArena as a scalable, real-world evaluation approach for medical LLMs.
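
For readers unfamiliar with Bradley-Terry ratings, the following is a minimal sketch of how pairwise preferences can be turned into a ranking. It is not the paper's implementation: the model names `A`, `B`, `C`, the toy vote counts, and the helper names `fit_bradley_terry` and `win_probability` are all hypothetical, and the sketch uses the standard minorization-maximization update rather than whatever fitting procedure the authors used. It also omits ties and the style adjustment (response length, formatting) mentioned above.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each model to a positive strength; strengths
    are normalized to sum to 1. Assumes every model has at least one win
    and one loss, otherwise the maximum-likelihood estimate is not finite.
    """
    models = sorted({m for pair in comparisons for m in pair})
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    # Start from equal strengths and apply the MM update repeatedly.
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(n_iter):
        updated = {}
        for i in models:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {m: s / total for m, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Model-implied probability that a clinician prefers a over b."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical toy votes: each tuple is (preferred model, other model).
toy_votes = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("A", "C")] * 2 + [("C", "A")] * 1
    + [("B", "C")] * 2 + [("C", "B")] * 1
)
toy_strength = fit_bradley_terry(toy_votes)
toy_ranking = sorted(toy_strength, key=toy_strength.get, reverse=True)
```

Under the Bradley-Terry model, the probability that one model is preferred over another depends only on the ratio of their strengths, so the fitted values give a ranking even when not every pair of models was compared equally often.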