
LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

arXiv cs.CL / 3/13/2026


Key Points

  • LifeSim introduces a user simulator that models user cognition through the Belief-Desire-Intention (BDI) framework within physical environments to generate coherent, long-horizon life trajectories and intention-driven interactions (a minimal sketch of such a BDI state follows this list).
  • It also presents LifeSim-Eval, a comprehensive benchmark spanning 8 life domains and 1,200 scenarios, employing multi-turn interactions to assess models' abilities to satisfy explicit and implicit intentions, recover user profiles, and deliver high-quality responses.
  • Experiments show current large language models struggle significantly with implicit intention understanding and long-term user preference modeling in both single-scenario and long-horizon settings.
  • The work aims to better align evaluation with real-world user–assistant interactions, potentially guiding future research and development of personalized AI assistants.

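The BDI framing in the first point can be pictured as a small state object that the simulator updates as the simulated user moves through daily scenarios: beliefs about the current environment, longer-term desires, and the concrete intention the user is pursuing right now, which in turn drives what the user says to the assistant. The Python sketch below is purely illustrative; the class, field, and function names are assumptions made for this article, not LifeSim's actual interfaces.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a BDI-style user state; names and fields are
# illustrative assumptions, not LifeSim's actual data model.
@dataclass
class BDIUserState:
    beliefs: dict = field(default_factory=dict)   # what the user currently holds true about the environment
    desires: list = field(default_factory=list)   # long-term goals and preferences (e.g. "eat healthier")
    intention: str | None = None                  # the concrete goal being pursued right now

def user_step(state: BDIUserState, observation: dict) -> str:
    """Advance the simulated user by one scenario step.

    Updates beliefs from the environment observation, picks (or keeps) an
    intention consistent with the user's desires, and returns the utterance
    the assistant under evaluation would receive next.
    """
    state.beliefs.update(observation)
    if state.intention is None and state.desires:
        state.intention = state.desires[0]        # naive selection, for illustration only
    # A real simulator would generate this with an LLM; here it is a template.
    return f"I'm trying to make progress on '{state.intention}'. Given {observation}, what should I do?"
```
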
Abstract

The rapid advancement of large language models (LLMs) has accelerated progress toward universal AI assistants. However, existing benchmarks for personalized assistants remain misaligned with real-world user-assistant interactions, failing to capture the complexity of external contexts and users' cognitive states. To bridge this gap, we propose LifeSim, a user simulator that models user cognition through the Belief-Desire-Intention (BDI) model within physical environments to generate coherent life trajectories, and simulates intention-driven interactive user behaviors. Based on LifeSim, we introduce LifeSim-Eval, a comprehensive benchmark for multi-scenario, long-horizon personalized assistance. LifeSim-Eval covers 8 life domains and 1,200 diverse scenarios, and adopts a multi-turn interactive method to assess models' abilities to complete explicit and implicit intentions, recover user profiles, and produce high-quality responses. Under both single-scenario and long-horizon settings, our experiments reveal that current LLMs face significant limitations in handling implicit intentions and modeling long-term user preferences.
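
As a rough mental model of the multi-turn evaluation the abstract describes, the loop below plays a candidate assistant against a simulated user for one scenario and then scores the transcript on explicit and implicit intention completion, profile recovery, and response quality. Everything here is a hedged sketch: `simulated_user`, `assistant`, and `judge` stand in for components the paper presumably realizes with LLMs, and none of the names reflect LifeSim-Eval's actual API.

```python
# Hypothetical multi-turn evaluation loop; the method names and scoring keys
# are illustrative assumptions, not LifeSim-Eval's real interface.
def run_scenario(simulated_user, assistant, judge, max_turns: int = 6) -> dict:
    transcript = []
    message = simulated_user.open_scenario()            # first user utterance for this scenario
    for _ in range(max_turns):
        reply = assistant.respond(transcript, message)  # model under evaluation answers
        transcript.append({"user": message, "assistant": reply})
        message, done = simulated_user.react(reply)     # user reacts according to its BDI state
        if done:                                        # user's intention satisfied (or abandoned)
            break
    return {
        "explicit_intention_met": judge.score_explicit(transcript),
        "implicit_intention_met": judge.score_implicit(transcript),
        "profile_recovery": judge.score_profile(transcript, simulated_user.profile),
        "response_quality": judge.score_quality(transcript),
    }
```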