Enhancing Clustering: An Explainable Approach via Filtered Patterns

arXiv cs.AI / 4/15/2026


Key Points

  • The paper focuses on explainable (conceptual) clustering, where each cluster is described by a human-interpretable symbolic pattern such as a closed pattern or itemset.
  • It identifies a limitation in prior methods based on k-relaxed frequent patterns (k-RFPs): distinct k-RFPs can induce the same k-cover, which yields redundant cluster descriptions and a larger, harder-to-search pattern space.
  • The authors introduce a pattern reduction framework that formally characterizes when redundancy occurs, then removes redundant patterns by keeping one representative per distinct k-cover.
  • The work also examines the interpretability and representativeness of the patterns selected by the ILP-based cluster selection step, by analyzing how robust those patterns are with respect to the clusters they induce.
  • Experiments on multiple real-world datasets show reduced search space and improved computational efficiency, with preserved and sometimes better clustering quality.
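
The redundancy described in the key points can be illustrated with a minimal Python sketch. It rests on assumptions that this summary does not confirm: the k-cover of a pattern is taken to be the set of transactions that miss at most k of the pattern's items, and the representative kept for each distinct k-cover is simply the first pattern encountered. The names `k_cover` and `reduce_patterns` and the toy data are hypothetical, not the authors' implementation.

```python
def k_cover(pattern, transactions, k):
    """Indices of transactions missing at most k items of `pattern`.

    ASSUMPTION: this "at most k missing items" reading of the k-cover
    is illustrative; the paper's formal definition may differ.
    """
    return frozenset(
        i for i, t in enumerate(transactions) if len(pattern - t) <= k
    )


def reduce_patterns(patterns, transactions, k):
    """Keep one representative pattern per distinct k-cover.

    ASSUMPTION: the first pattern seen for a cover is kept; the paper
    may choose its representative differently.
    """
    representatives = {}
    for p in patterns:
        cover = k_cover(p, transactions, k)
        # setdefault stores p only if this cover has not been seen yet
        representatives.setdefault(cover, p)
    return representatives


# Toy transaction database (hypothetical data)
transactions = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("cd"),
    frozenset("a"),
]
patterns = [frozenset("ab"), frozenset("abc"), frozenset("abd"), frozenset("cd")]

reduced = reduce_patterns(patterns, transactions, k=1)
for cover, pattern in reduced.items():
    print(sorted(pattern), "covers transactions", sorted(cover))
```

In this toy database with k = 1, the patterns {a, b, c} and {a, b, d} induce the same k-cover (transactions 0 and 1), so only one of them needs to be passed on to cluster selection.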

Abstract

Explainable clustering, also known as conceptual clustering, is attracting increasing attention within machine learning. It is a knowledge-driven unsupervised learning paradigm that partitions data into θ disjoint clusters, where each cluster is described by an explicit symbolic representation, typically a closed pattern or itemset. By providing human-interpretable cluster descriptions, explainable clustering plays an important role in explainable artificial intelligence and knowledge discovery. Recent work improved clustering quality by introducing k-relaxed frequent patterns (k-RFPs), a pattern model that relaxes strict coverage constraints through a generalized k-cover definition. This framework integrates constraint-based reasoning, using SAT solvers for pattern generation, with combinatorial optimization, using Integer Linear Programming (ILP) for cluster selection. Despite its effectiveness, this approach suffers from a critical limitation: multiple distinct k-RFPs may induce identical k-covers, leading to redundant symbolic representations that unnecessarily enlarge the search space and increase computational complexity during cluster construction. In this paper, we address this redundancy through a pattern reduction framework. Our contributions are threefold. First, we formally characterize the conditions under which distinct k-RFPs induce identical k-covers, providing theoretical foundations for redundancy detection. Second, we propose an optimization strategy that removes redundant patterns by retaining a single representative pattern for each distinct k-cover. Third, we investigate the interpretability and representativeness of the patterns selected by the ILP model by analyzing their robustness with respect to their induced clusters. Extensive experiments conducted on several real-world datasets demonstrate that the proposed approach significantly reduces the pattern search space, improves computational efficiency, and preserves, and in some cases enhances, the quality of the resulting clusters.
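
To make the ILP cluster selection step more concrete, the following is a generic set-partitioning formulation, offered as a sketch rather than the authors' exact model. All symbols other than θ are assumptions introduced here: P is the reduced set of candidate patterns, T the set of transactions, a_{t,p} equals 1 when transaction t belongs to the k-cover of pattern p, x_p is a binary selection variable, and w_p is a hypothetical quality score for pattern p.

```latex
\begin{align*}
\max_{x} \quad & \sum_{p \in P} w_p \, x_p \\
\text{s.t.} \quad & \sum_{p \in P} a_{t,p} \, x_p = 1 && \forall t \in T
  && \text{(each transaction lies in exactly one cluster)} \\
& \sum_{p \in P} x_p = \theta
  && && \text{(exactly $\theta$ clusters are selected)} \\
& x_p \in \{0, 1\} && \forall p \in P
\end{align*}
```

Under this reading, two patterns with identical k-covers contribute identical columns a_{\cdot,p} to the constraints, so keeping one representative per distinct k-cover reduces the number of variables without changing which partitions are feasible.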