Response-Aware User Memory Selection for LLM Personalization

arXiv cs.AI / 4/17/2026


Key Points

  • The paper addresses LLM personalization by selecting which pieces of user memory to include in the prompt at inference time, improving over methods that rely mainly on semantic similarity to the query.
  • It proposes Response-Utility optimization for Memory Selection (RUMS), which selects memory subsets by maximizing mutual information between memory and the model’s outputs to reduce response uncertainty.
  • The authors argue that this information-theoretic objective yields memory selections that better match human preferences than state-of-the-art selection methods.
  • Experiments report that RUMS improves response quality over existing approaches while reducing computational cost by up to 95%, and that its selections align with human preferences better than those of models up to 400× larger.
  • Overall, the work provides a more principled framework for user-memory selection that directly targets how memory affects the model’s response distribution rather than only relevance.

Abstract

A common approach to personalization in large language models (LLMs) is to incorporate a subset of the user memory into the prompt at inference time to guide the model's generation. Existing methods select these subsets primarily using similarity between user memory items and input queries, ignoring how the selected items actually affect the model's response distribution. We propose Response-Utility optimization for Memory Selection (RUMS), a novel method that selects user memory items by measuring the mutual information between a subset of memory and the model's outputs, identifying items that reduce response uncertainty and sharpen predictions beyond semantic similarity. We demonstrate that this information-theoretic foundation enables more principled user memory selection that aligns more closely with human selection compared to state-of-the-art methods, and to models 400× larger. Additionally, we show that memory items selected using RUMS result in better response quality compared to existing approaches, while reducing computational cost by up to 95%.
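
The selection principle described above — preferring memory items that reduce the model's response uncertainty rather than items that are merely query-similar — can be sketched as a greedy entropy-reduction loop. The paper's exact objective and estimator are not reproduced here; `response_dist` below is a hypothetical toy stand-in for querying the LLM's response distribution under a candidate memory subset, and the item weights are illustrative only.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def response_dist(memory_subset):
    """Toy stand-in for the LLM's response distribution given a memory subset.
    More informative memory -> sharper (lower-entropy) distribution.
    A real system would query the model; this is purely illustrative."""
    sharpness = 1.0 + sum(memory_subset)
    raw = [math.exp(-sharpness * i) for i in range(4)]
    z = sum(raw)
    return [r / z for r in raw]

def greedy_select(items, k):
    """Greedily pick k items, each time choosing the one that most reduces
    response entropy -- a simple proxy for maximizing mutual information
    between the selected memory and the model's output."""
    chosen = []
    remaining = list(items)
    for _ in range(k):
        base = entropy(response_dist(chosen))
        best = max(remaining,
                   key=lambda it: base - entropy(response_dist(chosen + [it])))
        remaining.remove(best)
        chosen.append(best)
    return chosen

# Items are abstract "informativeness weights" here; the most
# uncertainty-reducing items are selected first.
print(greedy_select([0.1, 2.0, 0.5], k=2))
```

Under this toy model, the item that sharpens the response distribution most is picked first, independent of any query-similarity score, which is the distinction the paper draws against similarity-based selection.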