
LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

arXiv cs.CL / 3/13/2026


Key Points

  • LifeSim introduces a user simulator that models user cognition through the Belief-Desire-Intention (BDI) framework within physical environments to generate coherent, long-horizon life trajectories and intention-driven interactions (a minimal sketch of such a BDI state follows this list).
  • It also presents LifeSim-Eval, a comprehensive benchmark spanning 8 life domains and 1,200 scenarios, employing multi-turn interactions to assess models' abilities to satisfy explicit and implicit intentions, recover user profiles, and deliver high-quality responses.
  • Experiments show current large language models struggle significantly with implicit intention understanding and long-term user preference modeling in both single-scenario and long-horizon settings.
  • The work aims to better align evaluation with real-world user–assistant interactions, potentially guiding future research and development of personalized AI assistants.

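The BDI framing in the first point can be pictured as a small state object that the simulator updates as the simulated user moves through daily scenarios: beliefs about the current environment, longer-term desires, and the concrete intention the user is pursuing right now, which in turn drives what the user says to the assistant. The Python sketch below is purely illustrative; the class, field, and function names are assumptions made for this article, not LifeSim's actual interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a BDI-style user state; names and fields are
# illustrative assumptions, not LifeSim's actual data model.
@dataclass
class BDIUserState:
    beliefs: dict = field(default_factory=dict)   # what the user currently holds true about the environment
    desires: list = field(default_factory=list)   # long-term goals and preferences (e.g. "eat healthier")
    intention: str | None = None                  # the concrete goal being pursued right now

def user_step(state: BDIUserState, observation: dict) -> str:
    """Advance the simulated user by one scenario step.

    Updates beliefs from the environment observation, picks (or keeps) an
    intention consistent with the user's desires, and returns the utterance
    the assistant under evaluation would receive next.
    """
    state.beliefs.update(observation)
    if state.intention is None and state.desires:
        state.intention = state.desires[0]        # naive selection, for illustration only
    # A real simulator would generate this with an LLM; here it is a template.
    return f"I'm trying to make progress on '{state.intention}'. Given {observation}, what should I do?"
```
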
Abstract

The rapid advancement of large language models (LLMs) has accelerated progress toward universal AI assistants. However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cognitive states. To bridge this gap, we propose LifeSim, a user simulator that models user cognition through the Belief-Desire-Intention (BDI) model within physical environments to generate coherent life trajectories, and simulates intention-driven interactive user behaviors. Based on LifeSim, we introduce LifeSim-Eval, a comprehensive benchmark for multi-scenario, long-horizon personalized assistance. LifeSim-Eval covers 8 life domains and 1,200 diverse scenarios, and adopts a multi-turn interactive method to assess models' abilities to complete explicit and implicit intentions, recover user profiles, and produce high-quality responses. Under both single-scenario and long-horizon settings, our experiments reveal that current LLMs face significant limitations in handling implicit intentions and modeling long-term user preferences.
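
As a rough mental model of the multi-turn evaluation the abstract describes, the loop below plays a candidate assistant against a simulated user for one scenario and then scores the transcript on explicit and implicit intention completion, profile recovery, and response quality. Everything here is a hedged sketch: `simulated_user`, `assistant`, and `judge` stand in for components the paper presumably realizes with LLMs, and none of the names reflect LifeSim-Eval's actual API.

```python
# Hypothetical multi-turn evaluation loop; the method names and scoring keys
# are illustrative assumptions, not LifeSim-Eval's real interface.
def run_scenario(simulated_user, assistant, judge, max_turns: int = 6) -> dict:
    transcript = []
    message = simulated_user.open_scenario()            # first user utterance for this scenario
    for _ in range(max_turns):
        reply = assistant.respond(transcript, message)  # model under evaluation answers
        transcript.append({"user": message, "assistant": reply})
        message, done = simulated_user.react(reply)     # user reacts according to its BDI state
        if done:                                        # user's intention satisfied (or abandoned)
            break
    return {
        "explicit_intention_met": judge.score_explicit(transcript),
        "implicit_intention_met": judge.score_implicit(transcript),
        "profile_recovery": judge.score_profile(transcript, simulated_user.profile),
        "response_quality": judge.score_quality(transcript),
    }
```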