MOSAIC: Modular Opinion Summarization using Aspect Identification and Clustering

arXiv cs.LG / 3/23/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • MOSAIC is a scalable, modular framework for opinion summarization that decomposes the task into theme discovery, structured opinion extraction, and grounded summary generation to improve interpretability and industrial deployment.
  • The approach is validated with online A/B tests on live product pages, showing that surfacing intermediate outputs can improve customer experience and deliver measurable value before full deployment.
  • Offline experiments demonstrate that MOSAIC achieves superior aspect coverage and faithfulness compared with strong summarization baselines.
  • The work introduces opinion clustering as a system-level component and shows its significant impact on faithfulness under noisy and redundant user reviews.
  • The authors identify reliability limitations in the SPACE dataset and release a new open-source tour experience dataset (TRECS) to enable more robust evaluation.
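The modular decomposition above can be illustrated with a minimal sketch. Note that the paper itself does not provide code: the stage names mirror its decomposition (theme discovery, structured opinion extraction, opinion clustering, grounded summary generation), but every function body, data structure, and the keyword-lexicon stand-in below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical MOSAIC-style pipeline sketch. All logic here is a toy
# stand-in: real theme discovery and opinion extraction would use
# learned models rather than keyword matching.
from collections import defaultdict

REVIEWS = [
    "The guide was friendly and the tour started on time.",
    "Great guide, very knowledgeable. Meeting point was hard to find.",
    "Tour began late and the meeting point was confusing.",
]

# Stage 1: theme discovery -- a fixed keyword lexicon stands in for a
# discovered set of themes.
THEMES = {
    "guide": ["guide"],
    "punctuality": ["on time", "late", "started"],
    "logistics": ["meeting point"],
}

def extract_opinions(review):
    """Stage 2: structured opinion extraction (keyword-match stand-in)."""
    sentences = [s.strip() for s in review.split(".") if s.strip()]
    opinions = []
    for sent in sentences:
        for theme, keywords in THEMES.items():
            if any(k in sent.lower() for k in keywords):
                opinions.append((theme, sent))
    return opinions

def cluster_opinions(all_opinions):
    """Stage 3: opinion clustering -- group redundant opinions by theme
    so the generator sees deduplicated, aggregated evidence."""
    clusters = defaultdict(list)
    for theme, sent in all_opinions:
        clusters[theme].append(sent)
    return clusters

def summarize(clusters):
    """Stage 4: grounded summary generation -- each output line is tied
    to the count of source opinions that support it, keeping the summary
    traceable back to reviews."""
    return [f"{theme}: {len(sents)} supporting opinion(s)"
            for theme, sents in sorted(clusters.items())]

opinions = [op for r in REVIEWS for op in extract_opinions(r)]
clusters = cluster_opinions(opinions)
summary = summarize(clusters)
```

Because each stage exposes its intermediate output, the clusters can be surfaced on a product page on their own, which matches the paper's point that intermediate outputs deliver value before full summary deployment.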

Abstract

Reviews are central to how travelers evaluate products on online marketplaces, yet existing summarization research often emphasizes end-to-end quality while overlooking benchmark reliability and the practical utility of granular insights. To address this, we propose MOSAIC, a scalable, modular framework designed for industrial deployment that decomposes summarization into interpretable components, including theme discovery, structured opinion extraction, and grounded summary generation. We validate the practical impact of our approach through online A/B tests on live product pages, showing that surfacing intermediate outputs improves customer experience and delivers measurable value even prior to full summarization deployment. We further conduct extensive offline experiments to demonstrate that MOSAIC achieves superior aspect coverage and faithfulness compared to strong baselines for summarization. Crucially, we introduce opinion clustering as a system-level component and show that it significantly enhances faithfulness, particularly under the noisy and redundant conditions typical of user reviews. Finally, we identify reliability limitations in the standard SPACE dataset and release a new open-source tour experience dataset (TRECS) to enable more robust evaluation.