User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

arXiv cs.CL / 3/24/2026


Key Points

  • The paper introduces VARS (Vector-Adapted Retrieval Scoring), a pipeline-agnostic framework that builds persistent user preference representations using long-term and short-term vectors to bias retrieval in conversational LLM agents.
  • VARS updates user preference vectors online using only weak scalar feedback signals, avoiding per-user fine-tuning while still enabling personalization across sessions.
  • Experiments on the MultiSessionCollab benchmark for math and code tasks show that user-aware retrieval primarily improves interaction efficiency—such as reduced timeouts and lower user effort—rather than delivering major raw accuracy gains under frozen LLM backbones.
  • The dual-vector design is shown to be interpretable: long-term vectors reflect cross-user preference overlap, while short-term vectors adapt to session-specific behavior.
  • The authors provide code, model, and data via the linked GitHub repository, supporting reproducibility and further development of user preference-aware retrieval methods.
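The core idea in the first key point, biasing retrieval scores with persistent user vectors, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual code: the function name, the additive cosine-similarity combination, and the `alpha`/`beta` weights are all assumptions.

```python
import numpy as np

def biased_retrieval_scores(query_vec, memory_vecs, long_term, short_term,
                            alpha=0.5, beta=0.3):
    """Score memory entries by query similarity plus a user-preference bias.

    All vectors are assumed to live in a shared preference/embedding space.
    alpha and beta (hypothetical parameters) weight the contribution of the
    long-term and short-term user vectors to the retrieval score.
    """
    def cos(vec, mat):
        # Cosine similarity between `vec` and each row of `mat`.
        return (mat @ vec) / (
            np.linalg.norm(mat, axis=1) * np.linalg.norm(vec) + 1e-8
        )

    base = cos(query_vec, memory_vecs)          # query-content relevance
    bias = (alpha * cos(long_term, memory_vecs) # stable preferences
            + beta * cos(short_term, memory_vecs))  # session behavior
    return base + bias
```

Because the bias is added on top of ordinary query relevance, the same retrieval index can serve all users; only the per-user vectors differ, which is consistent with the frozen-backbone, pipeline-agnostic framing.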

Abstract

Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar rewards derived from user feedback, enabling personalization without per-user fine-tuning. We evaluate on MultiSessionCollab, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.
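The online update from weak scalar rewards described in the abstract could look like the sketch below. The update rule, learning rates, and function name are assumptions for illustration, not the paper's exact method: a positive reward pulls each user vector toward the embedding of the item that produced the feedback, with the short-term vector adapting faster than the long-term one.

```python
import numpy as np

def update_preference_vectors(long_term, short_term, item_vec, reward,
                              eta_long=0.01, eta_short=0.2):
    """Online dual-vector update from one weak scalar reward.

    A plausible EMA-style rule (hypothetical): reward > 0 moves the vectors
    toward `item_vec`, reward < 0 moves them away. eta_short >> eta_long so
    the short-term vector tracks session-specific behavior while the
    long-term vector drifts slowly toward stable preferences.
    """
    short_term = short_term + eta_short * reward * (item_vec - short_term)
    long_term = long_term + eta_long * reward * (item_vec - long_term)
    return long_term, short_term
```

Since the update needs only a scalar reward and the item's embedding, no gradients flow through the LLM backbone, which is what allows per-user personalization without per-user fine-tuning.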