TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

arXiv cs.CL / 4/10/2026


Key Points

  • TSUBASA is proposed as a two-part method to improve personalized LLM performance on long-horizon tasks by evolving how user information is written to memory and how it is read back.
  • The approach addresses key weaknesses in prior memory mechanisms, including difficulty tracking evolving user behavior over long conversation/activity histories.
  • TSUBASA also targets the RAG quality–efficiency tradeoff and the train–inference gap in parametric adaptation by using a self-learning objective with context distillation to internalize user experiences.
  • Experiments on long-horizon benchmarks with the Qwen-3 model family (4B–32B) show TSUBASA outperforms memory-augmented competitors like Mem0 and Memory-R1, which rely more heavily on memory writing.
  • The authors report Pareto improvements that deliver robust, high-fidelity personalization while reducing token budget compared with prior approaches.

Abstract

Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individuals' needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, RAG paradigms are trapped in a quality-efficiency tradeoff, and parametric adaptation is bottlenecked by a train-inference gap caused by the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach that improves memory writing via dynamic memory evolution and memory reading via self-learning with a context-distillation objective that internalizes user experiences. Extensive evaluations on long-horizon benchmarks using the Qwen-3 model family (4B to 32B) validate the effectiveness of TSUBASA, which surpasses competitive memory-augmented systems that rely primarily on memory writing, such as Mem0 and Memory-R1. Our analyses further confirm that TSUBASA breaks the quality-efficiency barrier to achieve Pareto improvements, delivering robust, high-fidelity personalization with a reduced token budget.
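The paper's exact objective is not reproduced here, but the context-distillation idea it builds on can be illustrated in a few lines: a "teacher" pass conditions the model on retrieved user context, a "student" pass sees the same query without that context, and training minimizes the KL divergence between their next-token distributions so the context is internalized into the weights. The sketch below is a toy, dependency-free illustration of that loss; the function names and logit values are hypothetical, not from the paper.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """Forward KL(p || q), the standard distillation divergence."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def context_distillation_loss(teacher_logits, student_logits):
    """Push the context-free student toward the context-conditioned teacher."""
    p = softmax(teacher_logits)  # teacher: prompt includes user memory/context
    q = softmax(student_logits)  # student: same prompt without the context
    return kl_divergence(p, q)

# Hypothetical next-token logits over a 3-token vocabulary
teacher = [2.0, 0.5, -1.0]
student = [0.3, 0.2, 0.1]
loss = context_distillation_loss(teacher, student)
```

In a real training loop this scalar would be averaged over tokens and backpropagated through the student pass only; the loss is zero exactly when the student already reproduces the teacher's distribution without seeing the context.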
