
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents

arXiv cs.CL / 3/13/2026


Key Points

  • The paper presents PersonaTrace, a method that uses LLM agents to synthesize realistic digital footprints from structured user profiles, generating artifacts such as emails, messages, and calendar entries.
  • It addresses data scarcity by creating diverse and plausible synthetic datasets for training and evaluating models.
  • In intrinsic evaluation, the synthetic data is more diverse and realistic than existing baselines, and models fine-tuned on it outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
  • The approach is intended to support behavioral research, personalized applications, and model training in settings where real digital-footprint data is scarce or inaccessible.

Abstract

Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, ultimately producing corresponding digital artifacts such as emails, messages, calendar entries, and reminders. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.
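
The abstract describes a three-stage flow: structured profile, then event sequence, then digital artifacts. The sketch below illustrates that flow only in outline and is not the paper's implementation. All names here (`UserProfile`, `Event`, `Artifact`, `plan_events`, `render_artifacts`, `fake_llm`) and the prompt wording are hypothetical, and the LLM is modeled as a plain text-to-text function so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: an "LLM" is any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class UserProfile:
    """Hypothetical structured profile; the paper's actual schema is not given here."""
    name: str
    occupation: str
    interests: List[str]


@dataclass
class Event:
    day: int
    description: str


@dataclass
class Artifact:
    kind: str      # e.g. "email", "message", "calendar", "reminder"
    content: str


def plan_events(profile: UserProfile, llm: LLM, n_days: int) -> List[Event]:
    """Stage 1: ask the model for one plausible event per simulated day."""
    events = []
    for day in range(1, n_days + 1):
        prompt = (
            f"Profile: {profile.name}, {profile.occupation}, "
            f"interests: {', '.join(profile.interests)}. "
            f"Describe one plausible event for day {day}."
        )
        events.append(Event(day=day, description=llm(prompt).strip()))
    return events


def render_artifacts(
    profile: UserProfile,
    events: Sequence[Event],
    llm: LLM,
    kinds: Sequence[str] = ("email", "calendar"),
) -> List[Artifact]:
    """Stage 2: turn each event into one artifact per requested kind."""
    artifacts = []
    for event in events:
        for kind in kinds:
            prompt = (
                f"Write a {kind} that {profile.name} would produce "
                f"for this event: {event.description}"
            )
            artifacts.append(Artifact(kind=kind, content=llm(prompt).strip()))
    return artifacts


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call, used only for illustration."""
    return f"[generated for: {prompt[:40]}]"


profile = UserProfile(name="Alex", occupation="nurse", interests=["cycling", "baking"])
events = plan_events(profile, fake_llm, n_days=2)
artifacts = render_artifacts(profile, events, fake_llm)
print(len(events), len(artifacts))
```

Passing the model in as a callable keeps the stages testable in isolation; a real setup would replace `fake_llm` with a client for whichever model is being used.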