PersonaVLM: Long-Term Personalized Multimodal LLMs

arXiv cs.CL / 4/16/2026

Key Points

PersonaVLM is introduced as a framework for turning general-purpose multimodal LLMs into long-term personalized assistants that adapt to a user’s evolving preferences over time.
The approach combines three capabilities: proactive extraction and summarization of multimodal memories (Remembering), multi-turn reasoning over memories retrieved from a personalized database (Reasoning), and ongoing inference of the user's personality to keep responses aligned (Response Alignment).
The paper reports gains over the baseline of 22.4% on Persona-MME and 9.8% on PERSONAMEM under a 128k context, and margins over GPT-4o of 5.2% and 2.0% on the same two benchmarks, respectively.
To measure long-horizon personalization, the authors also release Persona-MME, a benchmark with 2,000+ curated interaction cases covering seven aspects and 14 fine-grained tasks.
Overall, PersonaVLM targets a gap in prior personalization methods that largely support only static or single-turn user alignment.

Abstract

Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture users' evolving preferences and personality over time (see Fig.1). In this paper, we introduce PersonaVLM, an innovative personalized multimodal agent framework designed for long-term personalization. It transforms a general-purpose MLLM into a personalized assistant by integrating three key capabilities: (a) Remembering: It proactively extracts and summarizes chronological multimodal memories from interactions, consolidating them into a personalized database. (b) Reasoning: It conducts multi-turn reasoning by retrieving and integrating relevant memories from the database. (c) Response Alignment: It infers the user's evolving personality throughout long-term interactions to ensure outputs remain aligned with their unique characteristics. For evaluation, we establish Persona-MME, a comprehensive benchmark comprising over 2,000 curated interaction cases, designed to assess long-term MLLM personalization across seven key aspects and 14 fine-grained tasks. Extensive experiments validate our method's effectiveness, improving the baseline by 22.4% (Persona-MME) and 9.8% (PERSONAMEM) under a 128k context, while outperforming GPT-4o by 5.2% and 2.0%, respectively. Project page: https://PersonaVLM.github.io.
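
To make the three capabilities in the abstract concrete, here is a minimal illustrative sketch in Python. It is not the authors' implementation: the names `Memory`, `PersonaStore`, `remember`, `retrieve`, `update_trait`, and `build_prompt` are hypothetical, retrieval is reduced to simple word overlap, and personality inference is reduced to a running average over a hand-supplied score. A real system would presumably use a multimodal model for summarization, retrieval, and trait inference.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Memory:
    turn: int   # chronological index of the interaction
    text: str   # summarized content (an image would be captioned first)


@dataclass
class PersonaStore:
    """Toy stand-in for a personalized memory database (hypothetical)."""
    memories: list[Memory] = field(default_factory=list)
    traits: dict[str, float] = field(default_factory=dict)

    def remember(self, turn: int, summary: str) -> None:
        # Remembering: store a summary of the interaction in time order.
        self.memories.append(Memory(turn, summary))

    def retrieve(self, query: str, k: int = 2) -> list[Memory]:
        # Reasoning: rank memories by word overlap with the query,
        # breaking ties in favor of the most recent turn.
        q = set(query.lower().split())
        scored = [
            (len(q & set(m.text.lower().split())), m.turn, m)
            for m in self.memories
        ]
        hits = [s for s in scored if s[0] > 0]
        hits.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [m for _, _, m in hits[:k]]

    def update_trait(self, name: str, observation: float, rate: float = 0.5) -> None:
        # Response alignment: keep a running estimate of a trait so it
        # can drift as the user's behavior changes (assumed update rule).
        old = self.traits.get(name, observation)
        self.traits[name] = (1 - rate) * old + rate * observation


def build_prompt(store: PersonaStore, query: str) -> str:
    # Combine inferred traits and retrieved memories into one prompt string.
    traits = ", ".join(f"{k}={v:.2f}" for k, v in sorted(store.traits.items()))
    recalled = "; ".join(m.text for m in store.retrieve(query))
    return f"[traits: {traits}] [memories: {recalled}] User: {query}"
```

A usage example under the same assumptions: after `store.remember(1, "user likes spicy ramen")` and `store.remember(3, "user now prefers mild food")`, calling `store.retrieve("food", k=1)` returns the turn-3 memory, since it is the only one sharing a word with the query. Preferring recent turns on ties is one simple way to reflect preferences that change over time.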