Sell More, Play Less: Benchmarking LLM Realistic Selling Skill
arXiv cs.CL / 4/9/2026
Key Points
- The paper introduces SalesLLM, a bilingual (ZH/EN) benchmark for evaluating LLMs on realistic, multi-turn sales dialogues with measurable deal progression and end outcomes.
- SalesLLM is built from 30,074 scripted configurations and 1,805 curated scenarios with controllable difficulty and personas, covering the Financial Services and Consumer Goods domains.
- The evaluation pipeline is fully automatic. An LLM-based rater scores progress through the sales process, and fine-tuned BERT classifiers predict the customer's buying intent at the end of each dialogue.
- To improve simulation fidelity, the authors train a customer behavior model (CustomerLM) with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO), roughly halving the role-inversion rate from 17.44% (GPT-4o baseline) to 8.8%.
- Results show that benchmark scores correlate strongly with expert human ratings (Pearson r = 0.98) and that performance varies significantly across 15 mainstream LLMs, indicating the benchmark can help develop outcome-oriented sales agents.
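The headline agreement figure is a Pearson correlation between automatic benchmark scores and expert ratings. The sketch below shows how such a coefficient is computed, assuming (this is not stated in the summary) that the correlation is taken over one aggregate score per model. The function name `pearson_r` and all of the scores are hypothetical and illustrative only; they are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and per-variable sums of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model scores (illustrative only, NOT results from the paper):
# automatic benchmark score vs. mean expert rating for five imaginary models.
benchmark_scores = [0.42, 0.55, 0.61, 0.70, 0.83]
expert_ratings = [2.9, 3.4, 3.6, 4.1, 4.5]

r = pearson_r(benchmark_scores, expert_ratings)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means models that score higher on the automatic benchmark also tend to receive higher expert ratings; Pearson r measures linear agreement only, so a rank-based measure such as Spearman's rho would be the usual complement when only model ordering matters.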