Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
arXiv cs.AI / 4/8/2026
Key Points
- The paper introduces Market-Bench, a configurable multi-agent benchmark designed to test large language models on economically and trade-relevant tasks such as procurement and retailing.
- In the procurement stage, LLMs participate in budget-constrained auctions to bid for limited inventory; in the retail stage, they set prices and generate marketing slogans to attract simulated buyers conditioned on assigned roles.
- Market-Bench records full interaction trajectories—including bids, prices, slogans, sales, and balance-sheet states—so evaluations can combine economic/operational outcomes with semantic scoring.
- Experiments across 20 open- and closed-source LLM agents show substantial performance gaps and a “winner-take-most” dynamic, where only a small fraction consistently achieve capital appreciation while many stay near break-even.
- The authors position Market-Bench as a reproducible testbed for studying how LLMs behave and compete in simulated markets under constrained resources and competition.
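The two-stage round described above can be sketched as a minimal simulation: a budget-constrained procurement auction followed by a retail stage, with the full trajectory (bids, prices, sales, balances) logged for evaluation. This is an illustrative assumption about the benchmark's structure, not the paper's actual API; the class names, fixed bidding/markup policies, and numbers below are all hypothetical stand-ins for LLM-driven decisions.

```python
# Hedged sketch of a Market-Bench-style round (hypothetical interface).

class Agent:
    def __init__(self, name, budget, bid_price, list_price):
        self.name = name
        self.budget = budget          # balance-sheet state
        self.inventory = 0
        self.bid_price = bid_price    # toy stand-in for an LLM bidding policy
        self.list_price = list_price  # toy stand-in for an LLM pricing policy

def procurement_auction(agents, units):
    """First-price auction: the highest affordable bid wins the whole lot."""
    winner = max(agents, key=lambda a: min(a.bid_price, a.budget / units))
    price = min(winner.bid_price, winner.budget / units)
    winner.budget -= price * units
    winner.inventory += units
    return {"winner": winner.name, "unit_price": price, "units": units}

def retail_stage(agents, demand):
    """Buyers take the cheapest offer first until demand or stock runs out."""
    trajectory = []
    for seller in sorted((a for a in agents if a.inventory > 0),
                         key=lambda a: a.list_price):
        sold = min(seller.inventory, demand)
        seller.inventory -= sold
        seller.budget += sold * seller.list_price
        demand -= sold
        trajectory.append({"seller": seller.name,
                           "price": seller.list_price, "sold": sold})
        if demand == 0:
            break
    return trajectory

agents = [Agent("agent-a", budget=100.0, bid_price=4.0, list_price=6.0),
          Agent("agent-b", budget=100.0, bid_price=3.0, list_price=5.5)]
log = {"auction": procurement_auction(agents, units=10),
       "retail": retail_stage(agents, demand=8)}
```

In this toy run, agent-a wins the auction (pays 4.0 x 10 = 40.0), then sells 8 units at 6.0 for a closing balance of 108.0, illustrating the "winner-take-most" pattern the paper reports: only agents that win procurement can appreciate capital, while the rest stay near break-even.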
Related Articles
- Black Hat Asia (AI Business)
- Meta's latest model is as open as Zuckerberg's private school (The Register)
- AI fuels global trade growth as China-US flows shift, McKinsey finds (SCMP Tech)
- Why multi-agent AI security is broken (and the identity patterns that actually work) (Dev.to)
- BANKING77-77: New best of 94.61% on the official test set (+0.13pp) over our previous 94.48% (Reddit r/artificial)