Square Superpixel Generation and Representation Learning via Granular Ball Computing

arXiv cs.CV / 4/1/2026

Key Points

  • The paper proposes a new superpixel generation method that approximates superpixels with multi-scale square blocks to avoid the irregular-region problem of existing approaches.
  • It introduces a purity-score selection strategy based on pixel-intensity similarity to retain high-quality square blocks and improve representation quality.
  • By using regular, square-shaped regions, the method supports efficient parallel processing and integrates directly into deep learning pipelines, rather than being relegated to an offline preprocessing step.
  • The resulting square superpixels can be used either as graph nodes for GNNs or as tokens for Vision Transformers to enable structured, multi-scale feature aggregation.
  • Experiments on downstream vision tasks report consistent performance improvements, suggesting the approach enhances end-to-end trainable visual representation learning.
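
To make the generation step above concrete, here is a minimal sketch: tile the image with squares, then quadtree-split any block whose purity falls below a threshold, in the coarse-to-fine spirit of granular-ball covering. The purity formula (fraction of pixels within `tau` of the block mean) and all parameter names (`base`, `min_size`, `thresh`) are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def purity(block, tau=10.0):
    # Assumed purity score: fraction of pixels whose intensity lies
    # within tau of the block mean (a stand-in for the paper's
    # pixel-intensity-similarity measure).
    return float(np.mean(np.abs(block - block.mean()) <= tau))

def split_block(img, y, x, s, min_size=4, thresh=0.9, tau=10.0):
    """Keep an s x s block at (y, x) if it is pure enough; otherwise
    split it into four half-size squares and recurse."""
    block = img[y:y + s, x:x + s]
    if s <= min_size or purity(block, tau) >= thresh:
        return [(y, x, s)]
    h = s // 2
    out = []
    for dy in (0, h):
        for dx in (0, h):
            out += split_block(img, y + dy, x + dx, h, min_size, thresh, tau)
    return out

def square_superpixels(img, base=32, **kw):
    """Tile the image with base x base squares, then refine each.
    Assumes image dimensions are divisible by base and base is a
    power-of-two multiple of min_size."""
    H, W = img.shape
    blocks = []
    for y in range(0, H, base):
        for x in range(0, W, base):
            blocks += split_block(img, y, x, base, **kw)
    return blocks
```

Because each split replaces a square with four half-size squares, the resulting blocks always tile the image exactly; homogeneous regions stay coarse while textured regions are refined toward `min_size`.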

Abstract

Superpixels provide a compact region-based representation that preserves object boundaries and local structures, and have therefore been widely used in a variety of vision tasks to reduce computational cost. However, most existing superpixel algorithms produce irregularly shaped regions, which are not well aligned with regular operators such as convolutions. Consequently, superpixels are often treated as an offline preprocessing step, limiting parallel implementation and hindering end-to-end optimization within deep learning pipelines. Motivated by the adaptive representation and coverage property of granular-ball computing, we develop a square superpixel generation approach. Specifically, we approximate superpixels using multi-scale square blocks to avoid the computational and implementation difficulties induced by irregular shapes, enabling efficient parallel processing and learnable feature extraction. For each block, a purity score is computed based on pixel-intensity similarity, and high-quality blocks are selected accordingly. The resulting square superpixels can be readily integrated as graph nodes in graph neural networks (GNNs) or as tokens in Vision Transformers (ViTs), facilitating multi-scale information aggregation and structured visual representation. Experimental results on downstream tasks demonstrate consistent performance improvements, validating the effectiveness of the proposed method.
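
To illustrate how such square superpixels could serve as tokens or graph nodes, the sketch below pools each block, given as a (row, column, size) triple, into a small feature vector carrying its mean intensity plus normalized position and scale. The feature layout is a hypothetical choice for illustration; the paper's actual embedding is not specified here, and a real model would use a learned projection.

```python
import numpy as np

def blocks_to_tokens(img, blocks):
    """Pool each square block (y, x, s) into a 4-d token:
    [mean intensity, y/H, x/W, s/max(H, W)].
    The last three entries act as a simple position-and-scale
    encoding for multi-scale blocks (illustrative assumption)."""
    H, W = img.shape
    feats = []
    for (y, x, s) in blocks:
        patch = img[y:y + s, x:x + s]
        feats.append([patch.mean(), y / H, x / W, s / max(H, W)])
    return np.asarray(feats)  # shape: (num_blocks, 4)
```

The resulting variable-length sequence can feed a ViT-style encoder directly, or serve as node features for a GNN once edges (e.g., between spatially adjacent blocks) are added.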