Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport

arXiv cs.AI / 4/15/2026


Key Points

  • The paper argues that many topic modeling approaches optimize statistical coherence but can yield redundant or irrelevant topics that do not reflect user intent.
  • It introduces Human-centric Topic Modeling (Human-TM), a task formulation that injects a human-provided goal directly into the topic modeling process to produce interpretable, diverse, and goal-aligned topics.
  • The proposed method, GCTM-OT, uses LLM-based prompting to extract candidate goals from documents and then applies semantic-aware contrastive learning with optimal transport to discover topics.
  • Experiments on three public subreddit datasets show improved topic coherence and diversity versus state-of-the-art baselines, along with significantly better alignment to human goals.
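The key points above mention optimal transport as the mechanism for tying topic discovery to goal semantics. The paper's exact formulation isn't reproduced here, but methods in this family typically rely on entropy-regularized optimal transport (Sinkhorn iterations) to softly assign documents or goal embeddings to topics. The sketch below is a generic, illustrative implementation of that building block; all function names, parameters, and the toy cost matrix are assumptions, not the authors' code.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: n x m cost matrix (e.g. 1 - cosine similarity between
          document embeddings and topic embeddings -- illustrative).
    a, b: source/target marginal weights (each sums to 1).
    Returns the transport plan P, whose rows sum to a and columns to b.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: small cost -> large kernel entry.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    # Alternately rescale rows and columns to match the marginals.
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: 3 documents softly assigned to 2 topics.
cost = [[0.1, 0.9],   # doc 0 is close to topic 0
        [0.8, 0.2],   # doc 1 is close to topic 1
        [0.5, 0.5]]   # doc 2 is ambiguous
P = sinkhorn(cost, a=[1 / 3] * 3, b=[0.5, 0.5])
```

The transport plan `P` gives each document a soft topic assignment while the column constraint keeps topic usage balanced, which is one common way OT-based topic models discourage redundant topics.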

Abstract

Existing topic modeling methods, from LDA to recent neural and LLM-based approaches, focus mainly on statistical coherence and often produce redundant or off-target topics that miss the user's underlying intent. We introduce Human-centric Topic Modeling (Human-TM), a novel task formulation that integrates a human-provided goal directly into the topic modeling process to produce interpretable, diverse, and goal-oriented topics. To tackle this challenge, we propose the Goal-prompted Contrastive Topic Model with Optimal Transport (GCTM-OT), which first uses LLM-based prompting to extract goal candidates from documents, then incorporates these into semantic-aware contrastive learning via optimal transport for topic discovery. Experimental results on three public subreddit datasets show that GCTM-OT outperforms state-of-the-art baselines in topic coherence and diversity while significantly improving alignment with human-provided goals, paving the way for more human-centric topic discovery systems.
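The abstract's "semantic-aware contrastive learning" step is not spelled out in this summary, but contrastive objectives of this kind are usually variants of the InfoNCE loss: pull an anchor embedding toward a goal-aligned positive and push it away from off-goal negatives. The following is a minimal, generic InfoNCE sketch under that reading; the pairing scheme, temperature, and toy vectors are illustrative assumptions, not the paper's loss.

```python
import math

def info_nce(anchor, positive, negatives, tau=0.07):
    """Generic InfoNCE contrastive loss for a single anchor.

    anchor/positive/negatives are plain lists of floats. In a
    goal-aligned reading (an assumption, not the paper's design),
    the positive could be a topic embedding matching the user's
    goal and the negatives could be off-goal topics.
    """
    def cos(x, y):
        dot = sum(xi * yi for xi, yi in zip(x, y))
        nx = math.sqrt(sum(xi * xi for xi in x))
        ny = math.sqrt(sum(yi * yi for yi in y))
        return dot / (nx * ny)

    # Temperature-scaled similarities; index 0 is the positive pair.
    logits = [cos(anchor, positive) / tau]
    logits += [cos(anchor, neg) / tau for neg in negatives]
    # Numerically stable -log softmax of the positive logit.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# Lower loss when the positive really is similar to the anchor.
loss_aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1]])
```

Minimizing such a loss over document/topic/goal embeddings is one standard way to make the learned topic space reflect the user's goal rather than raw co-occurrence statistics alone.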