Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

arXiv cs.CL / 4/10/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a framework that refines clusters produced by any unsupervised clustering algorithm by using LLMs as semantic judges rather than as embedding generators.
  • It applies three LLM reasoning stages—coherence verification, redundancy adjudication (merge/reject overlapping clusters), and fully unsupervised label grounding—to improve cluster quality without labeled data.
  • Experiments on social media corpora from two different platforms show improved cluster coherence and more human-aligned labeling quality compared with classical topic models and newer representation-based baselines.
  • Human evaluations find strong agreement with the LLM-generated labels even though no gold-standard annotations are provided, and robustness tests suggest cross-platform stability under matched temporal/volume conditions.
  • The authors argue that LLM reasoning can act as a general validation/refinement mechanism to make unsupervised text analytics more reliable and interpretable.
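The three reasoning stages can be sketched as a small refinement loop. The sketch below is illustrative, not the paper's implementation: all function names and thresholds are invented, and the LLM judge is replaced by a keyword-overlap stub so the example runs standalone. In a real system, each stage would instead prompt an LLM with the cluster's member texts.

```python
from collections import Counter

# Toy stand-in for an LLM semantic judgment: Jaccard word overlap
# between two groups of texts. A real system would prompt an LLM here.
def overlap(texts_a, texts_b):
    wa = {w for t in texts_a for w in t.lower().split()}
    wb = {w for t in texts_b for w in t.lower().split()}
    return len(wa & wb) / max(1, len(wa | wb))

def coherence_verify(cluster, threshold=0.2):
    """Stage (i): keep a cluster only if each member is supported by the
    rest (proxy for an LLM checking summary-member consistency)."""
    if len(cluster) < 2:
        return True
    scores = [overlap([cluster[i]], cluster[:i] + cluster[i + 1:])
              for i in range(len(cluster))]
    return sum(scores) / len(scores) >= threshold

def redundancy_adjudicate(clusters, threshold=0.25):
    """Stage (ii): greedily merge clusters that overlap semantically
    (proxy for an LLM adjudicating merge-vs-reject decisions)."""
    merged = []
    for c in clusters:
        for m in merged:
            if overlap(c, m) >= threshold:
                m.extend(c)  # merge into an existing cluster
                break
        else:
            merged.append(list(c))  # copy so inputs stay untouched
    return merged

def label_ground(cluster):
    """Stage (iii): assign an interpretable label; here the most frequent
    word, whereas the paper has the LLM generate a grounded label."""
    words = Counter(w for t in cluster for w in t.lower().split())
    return words.most_common(1)[0][0]

def refine(clusters):
    """Full pipeline: verify coherence, merge redundancy, label."""
    kept = [c for c in clusters if coherence_verify(c)]
    merged = redundancy_adjudicate(kept)
    return [(label_ground(c), c) for c in merged]
```

With toy input, `refine` drops an incoherent cluster, merges two near-duplicate ones, and labels the result: `refine([["gpu prices rise", "gpu prices fall"], ["gpu prices soar", "gpu prices spike"], ["apples oranges", "quantum flux"]])` yields a single labeled cluster. The key design point the paper argues for survives even in this stub: validation and labeling are decoupled from whatever representation produced the initial clusters.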

Abstract

Unsupervised methods are widely used to induce latent semantic structure from large text collections, yet their outputs often contain incoherent, redundant, or poorly grounded clusters that are difficult to validate without labeled data. We propose a reasoning-based refinement framework that leverages large language models (LLMs) not as embedding generators, but as semantic judges that validate and restructure the outputs of arbitrary unsupervised clustering algorithms. Our framework introduces three reasoning stages: (i) coherence verification, where LLMs assess whether cluster summaries are supported by their member texts; (ii) redundancy adjudication, where candidate clusters are merged or rejected based on semantic overlap; and (iii) label grounding, where clusters are assigned interpretable labels in a fully unsupervised manner. This design decouples representation learning from structural validation and mitigates common failure modes of embedding-only approaches. We evaluate the framework on real-world social media corpora from two platforms with distinct interaction models, demonstrating consistent improvements in cluster coherence and human-aligned labeling quality over classical topic models and recent representation-based baselines. Human evaluation shows strong agreement with LLM-generated labels, despite the absence of gold-standard annotations. We further conduct robustness analyses under matched temporal and volume conditions to assess cross-platform stability. Beyond empirical gains, our results suggest that LLM-based reasoning can serve as a general mechanism for validating and refining unsupervised semantic structure, enabling more reliable and interpretable analyses of large text collections without supervision.