Personalization Toolkit: Training-Free Personalization of Large Vision-Language Models

arXiv cs.CV / 4/29/2026


Key Points

  • The paper tackles personalization of Large Vision-Language Models (LVLMs) by replacing time-consuming per-item training with a training-free method.
  • It proposes a model-agnostic “Personalization Toolkit” that uses pre-trained vision foundation models to extract distinctive visual features.
  • The approach combines retrieval-augmented generation (RAG), which locates relevant instances in images and videos, with visual prompting, which steers the LVLM’s outputs (see the sketch after this list).
  • The authors introduce a more comprehensive real-world benchmark that evaluates personalization beyond object-centric, single-concept tests.
  • Experiments report state-of-the-art performance, outperforming existing training-based personalization methods.
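
A minimal sketch of how such a training-free pipeline can look, assuming DINOv2 as the vision foundation model and simple cosine-similarity retrieval over a few reference crops per concept. The backbone choice, gallery layout, file paths, and similarity threshold here are illustrative assumptions, not the paper's exact design:

```python
# Sketch of training-free concept retrieval: embed reference crops of each
# personal concept with a frozen vision foundation model (DINOv2 assumed here),
# then match query regions by cosine similarity -- no gradient updates anywhere.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
# Frozen feature extractor (ViT-S/14 variant of DINOv2 via torch.hub).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),  # 224 is divisible by the 14-px patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(image: Image.Image) -> torch.Tensor:
    """L2-normalized global feature for one image crop."""
    x = preprocess(image).unsqueeze(0).to(device)
    return F.normalize(backbone(x), dim=-1).squeeze(0)

# Gallery: a few reference crops per personalized concept (hypothetical paths).
gallery = {
    "my-dog-rex": [embed(Image.open(p)) for p in ["rex_1.jpg", "rex_2.jpg"]],
    "my-red-mug": [embed(Image.open(p)) for p in ["mug_1.jpg"]],
}

def retrieve(query_crop: Image.Image, threshold: float = 0.6) -> str | None:
    """Return the best-matching concept name, or None if nothing is similar enough."""
    q = embed(query_crop)
    best_name, best_sim = None, threshold
    for name, refs in gallery.items():
        sim = max(float(q @ r) for r in refs)  # cosine similarity of unit vectors
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name
```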

Abstract

Personalization of Large Vision-Language Models (LVLMs) involves customizing models to recognize specific users or object instances and to generate contextually tailored responses. Existing approaches rely on time-consuming training for each item, making them impractical for real-world deployment, as reflected in current personalization benchmarks limited to object-centric single-concept evaluations. In this paper, we present a novel training-free approach to LVLM personalization called the Personalization Toolkit. We introduce a comprehensive, real-world benchmark designed to rigorously evaluate various aspects of the personalization task. The toolkit leverages pre-trained vision foundation models to extract distinctive features, applies retrieval-augmented generation (RAG) techniques to identify instances within visual inputs, and employs visual prompting strategies to guide model outputs. Our model-agnostic vision toolkit enables efficient and flexible multi-concept personalization across both images and videos, without any additional training. We achieve state-of-the-art results, surpassing existing training-based methods.
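
To make the visual-prompting step concrete, here is an illustrative sketch of annotating a retrieved region and naming the concept in the text prompt before querying an off-the-shelf LVLM. The box style, prompt template, file name, and coordinates are assumptions for the example, not the paper's exact design:

```python
# Sketch of the visual-prompting step: once retrieval has localized a
# personalized concept, mark it directly on the image and name it in the
# text prompt so an off-the-shelf LVLM can ground its answer.
from PIL import Image, ImageDraw

def build_visual_prompt(image: Image.Image,
                        box: tuple[int, int, int, int],
                        concept: str) -> tuple[Image.Image, str]:
    """Overlay a colored box on the matched region and pair it with a prompt."""
    annotated = image.copy()
    ImageDraw.Draw(annotated).rectangle(box, outline="red", width=4)
    prompt = (f"The object inside the red box is '{concept}'. "
              f"Describe what '{concept}' is doing in this scene.")
    return annotated, prompt

# Usage with a hypothetical image and box from the retrieval step above.
annotated, prompt = build_visual_prompt(Image.open("kitchen.jpg"),
                                        box=(120, 80, 340, 300),
                                        concept="my-red-mug")
```

Because the annotated image and prompt are ordinary inputs, any instruction-tuned LVLM can consume them unchanged, which is what makes this style of personalization model-agnostic and training-free.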