Your Reviews Replicate You: LLM-Based Agents as Customer Digital Twins for Conjoint Analysis

arXiv cs.AI / 4/28/2026


Key Points

  • The study tackles the time, cost, and respondent-fatigue limitations of traditional conjoint analysis by using LLM-based “customer digital twins” (CDTs) as virtual respondents.
  • It builds individualized agent profiles by identifying active Reddit users, aggregating their review histories into per-user vector databases, and combining retrieval-augmented generation (RAG) with prompt engineering so that each agent can dynamically retrieve and reason over its own past preferences and constraints.
  • The CDTs conduct pairwise comparisons on product profiles generated via fractional factorial design, and the resulting choices are analyzed with logistic regression to estimate part-worth utilities.
  • Experimental results show that CDTs predict real users’ preferences with 87.73% accuracy, and a case study on computer monitors quantifies attribute trade-offs (e.g., panel type vs. resolution) consistent with market realities.
  • Overall, the work proposes a scalable, more agile and cost-efficient alternative to conventional conjoint methods for marketing research.
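The per-user retrieval step described in the Key Points can be sketched as follows. The source does not name the embedding model, vector store, or prompt, so this is a minimal sketch under stated assumptions: a toy term-frequency vector stands in for a neural embedding model, and `UserVectorStore`, `build_prompt`, the user ID, the reviews, and the prompt wording are all hypothetical. The call that would send the prompt to an LLM is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term-frequency counts of lowercase alphanumeric tokens.

    A real system would call a neural embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class UserVectorStore:
    """Hypothetical per-user store: indexes only one user's own review history."""

    def __init__(self, user_id: str, reviews: list[str]):
        self.user_id = user_id
        self.entries = [(review, embed(review)) for review in reviews]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k reviews most similar to the query (ties keep insertion order)."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[1]), reverse=True)
        return [review for review, _ in ranked[:k]]


def build_prompt(store: UserVectorStore, option_a: str, option_b: str, k: int = 2) -> str:
    """Hypothetical prompt template: ground a pairwise choice in the user's retrieved reviews."""
    memory = "\n".join(f"- {r}" for r in store.retrieve(f"{option_a} {option_b}", k))
    return (
        f"You are Reddit user {store.user_id}. Relevant past reviews you wrote:\n{memory}\n"
        f"Which monitor would you buy?\nA: {option_a}\nB: {option_b}\n"
        "Answer with A or B."
    )


if __name__ == "__main__":
    store = UserVectorStore("u_example", [
        "The IPS panel colors are great for photo editing.",
        "Returned a 1080p monitor because the resolution was too low at 27 inches.",
        "My keyboard has loud switches.",
    ])
    # The resulting prompt would then be sent to an LLM (call omitted here).
    print(build_prompt(store, "IPS panel, 4K resolution", "VA panel, 1440p resolution"))
```

Keeping one store per user, rather than one shared index, is what limits each twin's retrieved context to that person's own history; the irrelevant keyboard review is never placed in the monitor prompt.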

Abstract

Conjoint analysis is a cornerstone of market research for estimating consumer preferences; however, traditional methods face persistent challenges regarding time, cost, and respondent fatigue. To address these limitations, this study proposes a framework that uses large language model (LLM)-based "customer digital twins" (CDTs) as virtual respondents. We identified active users within the Reddit community and aggregated their comprehensive review histories to construct individualized vector databases. By integrating retrieval-augmented generation (RAG) with prompt engineering, this study developed customer agents capable of dynamically retrieving and reasoning over their specific past preferences and constraints. These customer agents, called CDTs, performed pairwise comparison tasks on product profiles generated via fractional factorial design, and the resulting choice data were analyzed with logistic regression to estimate part-worth utilities. Empirical validation demonstrates that these CDTs predict the preferences of actual users with 87.73% accuracy. Furthermore, a case study on the computer monitor category successfully quantified trade-offs between attributes such as panel type and resolution, deriving preference structures consistent with market realities. Ultimately, this study contributes to marketing research by presenting a scalable alternative to traditional methods that significantly improves both agility and cost-efficiency.
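The estimation pipeline in the abstract (profiles from a fractional factorial design, pairwise choices, logistic regression for part-worths) can be illustrated end to end. This is a sketch, not the authors' implementation. The three two-level monitor attributes, the 2^(3-1) half fraction, the "true" utilities, and `simulated_twin_choice` (a random-utility simulator standing in for an LLM-based CDT) are all hypothetical. The choices of all twins are also pooled into a single aggregate logit fitted by plain gradient ascent, which is an assumption about how the data are combined.

```python
import math
import random
from itertools import combinations

# Hypothetical attributes: two levels each, effects-coded as -1 (first) / +1 (second).
ATTRIBUTES = {
    "panel": ("VA", "IPS"),
    "resolution": ("1440p", "4K"),
    "price": ("$300", "$450"),
}


def half_fraction() -> list[tuple[int, int, int]]:
    """2^(3-1) fractional factorial: 4 of the 8 profiles, with the third factor set to C = A * B."""
    return [(a, b, a * b) for a in (-1, 1) for b in (-1, 1)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulated_twin_choice(x_a, x_b, true_beta, rng) -> int:
    """Hypothetical stand-in for an LLM-based CDT: logit choice, returns 1 if A is picked."""
    diff = sum(w * (a - b) for w, a, b in zip(true_beta, x_a, x_b))
    return 1 if rng.random() < sigmoid(diff) else 0


def fit_pairwise_logit(records, n_params, lr=0.5, iters=500):
    """Logistic regression on attribute differences (no intercept), fitted by gradient ascent.

    records: list of (difference vector x_A - x_B, times A was chosen, times the pair was shown).
    """
    beta = [0.0] * n_params
    total = sum(n for _, _, n in records)
    for _ in range(iters):
        grad = [0.0] * n_params
        for d, k, n in records:
            p = sigmoid(sum(w * x for w, x in zip(beta, d)))
            # Gradient of the binomial log-likelihood for this pair: (k - n*p) * d
            for j in range(n_params):
                grad[j] += (k - n * p) * d[j]
        beta = [w + lr * g / total for w, g in zip(beta, grad)]
    return beta


def run_study(true_beta, n_twins=500, seed=0):
    """Show every pair of design profiles to n_twins simulated twins, then estimate part-worths."""
    rng = random.Random(seed)
    records = []
    for x_a, x_b in combinations(half_fraction(), 2):
        k = sum(simulated_twin_choice(x_a, x_b, true_beta, rng) for _ in range(n_twins))
        records.append((tuple(a - b for a, b in zip(x_a, x_b)), k, n_twins))
    return fit_pairwise_logit(records, len(true_beta))


if __name__ == "__main__":
    estimates = run_study(true_beta=[0.8, 0.5, -0.6])  # hypothetical "true" utilities
    for (name, (low, high)), w in zip(ATTRIBUTES.items(), estimates):
        # Effects coding: part-worth is +w for the +1 level and -w for the -1 level.
        print(f"{name}: {high} {w:+.2f}, {low} {-w:+.2f}")
```

Fitting on attribute differences reflects the fact that a pairwise choice reveals only relative utility, which is why no intercept is estimated. Because a half fraction aliases each main effect with a two-factor interaction (here C with A×B), the sketch assumes attribute effects are additive.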