Deep sprite-based image models: An analysis

arXiv cs.CV / 4/22/2026


Key Points

  • The paper analyzes sprite-based image decomposition models as an interpretable approach to finding recurrent patterns in image collections, a problem that remains open despite the progress foundation models have driven in segmentation and diffusion models have made in image generation.
  • It explains that current sprite-based model variants require dataset-specific tailoring and struggle to scale to images containing many objects.
  • The authors perform an extensive study on clustering benchmarks to identify the models’ core components and design choices.
  • Based on this analysis, they propose a deep sprite-based image decomposition method that performs on par with state-of-the-art unsupervised class-aware image segmentation methods on the standard CLEVR benchmark.
  • The proposed method scales linearly with the number of objects and explicitly identifies object categories while modeling images in an interpretable way.

Abstract

While foundation models drive steady progress in image segmentation and diffusion algorithms compose ever more realistic images, the seemingly simple problem of identifying recurrent patterns in a collection of images remains very much open. In this paper, we focus on sprite-based image decomposition models, which have shown some promise for clustering and image decomposition and are appealing because of their high interpretability. These models come in different flavors, need to be tailored to specific datasets, and struggle to scale to images with many objects. We dive into the details of their design, identify their core components, and perform an extensive analysis on clustering benchmarks. We leverage this analysis to propose a deep sprite-based image decomposition method that performs on par with state-of-the-art unsupervised class-aware image segmentation methods on the standard CLEVR benchmark, scales linearly with the number of objects, explicitly identifies object categories, and fully models images in an easily interpretable way.
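
To make the idea of a sprite-based decomposition concrete, here is a minimal sketch of how an image can be modeled as a background plus a few instances of shared sprites. It is a generic illustration written for this summary, not the method proposed in the paper: there is no learning, no neural network, and no spatial transformation beyond translation. The names `Sprite`, `Placement`, and `compose` are hypothetical, and images are tiny grayscale grids stored as nested lists.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[float]]  # grayscale values in [0, 1], indexed [row][col]


@dataclass
class Sprite:
    """A reusable pattern: pixel intensities plus a per-pixel opacity mask."""
    color: Grid
    alpha: Grid


@dataclass
class Placement:
    """One object in an image: which sprite it is and where it sits."""
    sprite_id: int  # index into the sprite dictionary, i.e. the object category
    row: int
    col: int


def compose(background: Grid, sprites: List[Sprite],
            placements: List[Placement]) -> Grid:
    """Alpha-composite each placed sprite over the background, in order."""
    canvas = [row[:] for row in background]
    height, width = len(canvas), len(canvas[0])
    # One pass per object, so the cost grows linearly with len(placements).
    for p in placements:
        sprite = sprites[p.sprite_id]
        for i, (color_row, alpha_row) in enumerate(zip(sprite.color, sprite.alpha)):
            for j, (c, a) in enumerate(zip(color_row, alpha_row)):
                y, x = p.row + i, p.col + j
                if 0 <= y < height and 0 <= x < width:
                    canvas[y][x] = a * c + (1.0 - a) * canvas[y][x]
    return canvas


# Toy sprite dictionary: a bright 2x2 square and a dimmer single-pixel dot.
sprites = [
    Sprite(color=[[1.0, 1.0], [1.0, 1.0]], alpha=[[1.0, 1.0], [1.0, 1.0]]),
    Sprite(color=[[0.5]], alpha=[[1.0]]),
]
background = [[0.0] * 4 for _ in range(4)]
placements = [Placement(sprite_id=0, row=0, col=0),
              Placement(sprite_id=1, row=3, col=3)]

image = compose(background, sprites, placements)
categories = [p.sprite_id for p in placements]  # explicit per-object categories
```

In this toy setting, the interpretability the abstract refers to comes from the explanation being directly readable: every pixel is accounted for by the background or by a placed sprite, and each placement carries its category through `sprite_id`.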
