MarketBench: Evaluating AI Agents as Market Participants
arXiv cs.AI / 4/28/2026
Key Points
- The paper proposes MarketBench, a benchmark that evaluates whether AI agents can produce accurate signals about their probability of success and the cost (e.g., token usage) of completing a task in market-like coordination settings.
- Using a 93-task subset of SWE-bench Lite and six recently released LLMs, the authors show that the models are miscalibrated on both success likelihood and token consumption (a scoring sketch follows this list).
- When agents report their own estimates to participate in auctions, the resulting allocations diverge from those expected under full information (see the toy auction sketch below).
- Injecting capability information from prior experiments into the agents' context improves calibration only modestly, indicating persistent limits on self-assessment.
- The study also reports how market-based scaffolding performs with these LLMs, and concludes that self-assessment is a key bottleneck for reliable market-style coordination.
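To make the calibration claim concrete, here is a minimal scoring sketch. It assumes per-task records pairing each agent's self-reported success probability and token estimate with the observed outcome; the `TaskRecord` schema and the sample numbers are illustrative assumptions, not the paper's data.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    # Illustrative fields; the paper's exact logging schema is not given here.
    predicted_success: float  # agent's self-reported P(success), in [0, 1]
    succeeded: bool           # observed outcome on the SWE-bench Lite task
    predicted_tokens: float   # agent's self-reported token budget
    actual_tokens: float      # tokens actually consumed

def brier_score(records):
    """Mean squared error between stated probabilities and binary outcomes.
    Lower is better; 0.25 matches always answering 0.5."""
    return sum((r.predicted_success - float(r.succeeded)) ** 2
               for r in records) / len(records)

def token_error(records):
    """Mean signed relative error of token-cost estimates
    (negative means the agent underestimates its own cost)."""
    return sum((r.predicted_tokens - r.actual_tokens) / r.actual_tokens
               for r in records) / len(records)

records = [
    TaskRecord(0.9, False, 2_000, 11_500),  # overconfident, underestimates cost
    TaskRecord(0.8, True,  4_000, 3_800),
    TaskRecord(0.7, False, 3_000, 9_200),
]
print(f"Brier score: {brier_score(records):.3f}")
print(f"Mean relative token error: {token_error(records):+.2f}")
```

A Brier score near 0.25 means the stated probabilities carry little more information than always answering 0.5, and a large signed token error indicates systematic over- or under-estimation of cost.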
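The allocation divergence in the third key point can be illustrated with a toy auction. The highest-surplus rule, the reward value, and both agents' numbers below are assumptions chosen for illustration; the summary does not specify the paper's actual mechanism.

```python
# Allocate one task to whichever agent claims the highest expected surplus
# (reward * P(success) minus token cost), then compare against the allocation
# a full-information coordinator would choose.

REWARD = 10_000  # value of solving the task, in token-equivalent units

# (name, reported P(success), reported cost, true P(success), true cost)
agents = [
    ("model_a", 0.90, 2_000, 0.35, 9_000),  # overconfident, underestimates cost
    ("model_b", 0.55, 4_000, 0.60, 4_500),  # roughly calibrated
]

def surplus(p, cost):
    # Expected value of assigning the task minus the cost of attempting it.
    return REWARD * p - cost

winner_reported = max(agents, key=lambda a: surplus(a[1], a[2]))
winner_true     = max(agents, key=lambda a: surplus(a[3], a[4]))

print("winner under self-reports:    ", winner_reported[0])
print("winner under full information:", winner_true[0])
```

Here the overconfident agent wins under self-reports while the calibrated agent would win under full information, which is exactly the kind of misallocation the benchmark is designed to surface.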