Synthetic Users, Real Differences: an Evaluation Framework for User Simulation in Multi-Turn Conversations
arXiv cs.CL / 5/5/2026
Key Points
- The paper argues that user simulation can be a practical alternative to collecting and scoring real chatbot interactions, but only if the simulated users faithfully reflect real user–bot dialogue patterns.
- It introduces realsim, a new evaluation framework that compares real versus simulated multi-turn dialogues across eight dimensions spanning communicative function, user state, and the surface form of user messages.
- The framework is instantiated using a curated dataset of 1,000 real, task-focused multi-turn user–chatbot dialogues across 16 application domains.
- The authors find that simulated users often fail to reproduce the communication “frictions” real users introduce (corrections, vague requests, frustration), potentially making simulation-based evaluations overly optimistic.
- The results also vary by domain, suggesting that domain-specific user simulators may be necessary rather than relying on a single general-purpose simulator.
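The comparison the framework performs can be illustrated with a toy sketch: measure some surface-form property of user turns (here, turn length in words) in a real corpus and a simulated one, and quantify the distributional gap with Jensen-Shannon divergence. The feature, bucketing, and metric below are illustrative assumptions, not the paper's actual eight dimensions or scoring method.

```python
import math
from collections import Counter

def js_divergence(p, q):
    # Jensen-Shannon divergence (base 2, so bounded in [0, 1])
    # between two discrete distributions given as dicts.
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def length_distribution(dialogues):
    # Bucket user-turn lengths (in words, 5-word bins, capped at 10)
    # into a normalized histogram.
    counts = Counter(min(len(turn.split()) // 5, 10)
                     for d in dialogues for turn in d)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Hypothetical data: terse, messy real users vs. verbose, polite simulated ones.
real = [["book a flight to Paris", "actually make that Friday"],
        ["cancel my order", "why is this taking so long??"]]
sim = [["I would like to book a flight to Paris departing on Friday"],
       ["Please cancel my most recent order at your earliest convenience"]]

gap = js_divergence(length_distribution(real), length_distribution(sim))
print(f"JSD on user-turn length: {gap:.3f}")  # larger = less realistic
```

In this contrived example the two length distributions have disjoint support, so the divergence hits its maximum of 1.0; a realistic simulator would drive such per-dimension gaps toward zero.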