Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

arXiv cs.AI / April 30, 2026

📰 News · Models & Research

Key Points

  • The paper addresses how to generate high-quality user personas from noisy, interleaved behavioral logs, building on prior work that uses LLMs but often lacks strong assurance of persona quality.
  • It introduces a hierarchical framework that aggregates user actions into intent “memories,” then induces multiple personas by clustering and labeling these memories.
  • Persona quality is optimized using an objective that balances cluster cohesion, alignment between personas and evidence, and “truthfulness” of the personas.
  • The authors train the persona model with a groupwise extension of Direct Preference Optimization (DPO), optimizing this quality objective directly.
  • Experiments on a large service-log dataset and two public datasets show the approach produces more coherent, evidence-grounded, and trustworthy personas and also improves future interaction prediction.
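The hierarchical pipeline in the first two points can be illustrated with a toy sketch: raw log actions are aggregated into per-intent "memories," and sufficiently supported memory clusters are labeled as evidence-grounded personas. All names here, including the keyword-matching heuristic for intents, are illustrative assumptions; the paper's actual implementation uses LLM-based aggregation and labeling.

```python
from collections import defaultdict

# Hypothetical intent lexicon standing in for learned intent detection
# (an assumption for illustration, not part of the paper).
INTENT_KEYWORDS = {
    "fitness": {"run", "workout", "steps"},
    "cooking": {"recipe", "grocery", "meal"},
}

def actions_to_memories(actions):
    """Aggregate raw, interleaved log actions into per-intent memories."""
    memories = defaultdict(list)
    for action in actions:
        for intent, kws in INTENT_KEYWORDS.items():
            if any(kw in action for kw in kws):
                memories[intent].append(action)
    return dict(memories)

def memories_to_personas(memories, min_evidence=2):
    """Label each sufficiently supported memory cluster as a persona,
    keeping the supporting actions attached as grounding evidence."""
    return [
        {"persona": f"user interested in {intent}", "evidence": acts}
        for intent, acts in memories.items()
        if len(acts) >= min_evidence  # require enough evidence to stay truthful
    ]

log = ["logged a run", "saved a recipe", "tracked steps", "made grocery list"]
personas = memories_to_personas(actions_to_memories(log))
```

Because every persona retains the actions that produced it, downstream checks of persona-evidence alignment and truthfulness have concrete evidence to verify against.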

Abstract

Behavioral logs provide rich signals for user modeling, but are noisy and interleaved across diverse intents. Recent work uses LLMs to generate interpretable natural-language personas from user logs, yet evaluation often emphasizes downstream utility, providing limited assurance of persona quality itself. We propose a hierarchical framework that aggregates user actions into intent memories and induces multiple evidence-grounded personas by clustering and labeling these memories. We formulate persona induction as an optimization problem over persona quality (captured by cluster cohesion, persona-evidence alignment, and persona truthfulness) and train the persona model using a groupwise extension of Direct Preference Optimization (DPO). Experiments on a large-scale service log and two public datasets show that our method induces more coherent, evidence-grounded, and trustworthy personas, while also improving future interaction prediction.
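One plausible reading of a "groupwise extension" of DPO, sketched below under stated assumptions: score each candidate persona in a group by its policy-versus-reference log-probability margin (the implicit DPO reward), then apply a softmax cross-entropy that pushes probability mass toward the preferred candidate. This is a listwise generalization of the pairwise DPO loss, not necessarily the paper's exact formulation.

```python
import math

def groupwise_dpo_loss(policy_logps, ref_logps, best_idx, beta=0.1):
    """Listwise DPO-style loss over a group of candidate personas.

    policy_logps / ref_logps: per-candidate sequence log-probabilities
    under the trained policy and the frozen reference model.
    best_idx: index of the highest-quality candidate in the group.
    """
    # Implicit rewards r_i = beta * (log pi(y_i|x) - log pi_ref(y_i|x)),
    # as in standard DPO.
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    # Numerically stable log-partition for the softmax over the group.
    m = max(rewards)
    log_z = m + math.log(sum(math.exp(r - m) for r in rewards))
    # Negative log-probability of the preferred candidate under softmax(r).
    return -(rewards[best_idx] - log_z)
```

With a group size of two, this reduces to the usual pairwise DPO objective; the loss shrinks as the preferred persona's margin over the rest of its group grows.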