Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
arXiv cs.CL / 4/10/2026
Key Points
- OmniBehavior is introduced as a user-simulation benchmark built entirely from real-world data, designed to support long-horizon, cross-scenario, and heterogeneous human behavior traces in a unified framework.
- The authors argue and provide empirical evidence that prior benchmarks using isolated scenarios can create “tunnel vision,” while authentic decision-making depends on long-term, cross-scenario causal chains.
- Evaluations show that state-of-the-art LLMs struggle to simulate complex real-world behavior, and their performance plateaus even as context windows grow.
- A comparison between simulated and authentic behaviors identifies structural biases in LLM simulations, including convergence toward a “positive average person,” hyper-activity, persona homogenization, and a Utopian bias that erodes individual differences and long-tail behaviors.
- The paper highlights key research directions for improving high-fidelity human behavior simulation beyond current LLM capabilities and benchmark designs.
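
The comparison between simulated and authentic behavior described in the key points can be pictured with a small sketch. Everything here is an illustrative assumption, not the paper's method or data: the toy traces, the action labels, the `PASSIVE` set, and the helpers `activity_rate` and `summarize` are all hypothetical. The sketch only shows how three of the named biases (hyper-activity, persona homogenization, and loss of long-tail behaviors) might surface as simple summary statistics.

```python
from collections import Counter
from statistics import mean, pvariance

# Hypothetical toy traces (user id -> list of action labels); not from the paper.
real = {
    "u1": ["view", "skip", "skip", "like"],
    "u2": ["skip", "skip", "skip", "skip"],
    "u3": ["view", "purchase", "skip", "complain"],
}
simulated = {
    "u1": ["view", "like", "like", "view"],
    "u2": ["view", "like", "view", "like"],
    "u3": ["view", "like", "purchase", "like"],
}

# Assumption: these labels count as "no engagement".
PASSIVE = {"skip"}


def activity_rate(actions):
    """Fraction of steps on which the user takes a non-passive action."""
    return sum(a not in PASSIVE for a in actions) / len(actions)


def summarize(traces):
    """Summary statistics over a set of per-user traces."""
    rates = [activity_rate(actions) for actions in traces.values()]
    counts = Counter(a for actions in traces.values() for a in actions)
    return {
        # Higher in simulation than in reality would suggest hyper-activity.
        "mean_activity": mean(rates),
        # Lower between-user variance would suggest persona homogenization.
        "activity_variance": pvariance(rates),
        # Fewer distinct actions would suggest lost long-tail behaviors.
        "distinct_actions": len(counts),
    }


if __name__ == "__main__":
    print("real:     ", summarize(real))
    print("simulated:", summarize(simulated))
```

In this toy data the simulated users are always active, nearly identical to one another, and never produce the rare `complain` action, so all three statistics move in the direction the key points describe. A real analysis would need the benchmark's own behavior taxonomy and far larger samples.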



