When Slots Compete: Slot Merging in Object-Centric Learning

arXiv cs.CV / 3/13/2026

💬 Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The authors introduce slot merging, a lightweight drop-in operation that merges overlapping slots during training to improve object factorization in slot-based object-centric learning.
  • Overlap is quantified with a Soft-IoU score between slot-attention maps, and selected pairs are merged via a barycentric update that preserves gradient flow and requires no extra learnable modules.
  • Merging follows a fixed policy with the decision threshold inferred from overlap statistics and is integrated into the DINOSAUR feature-reconstruction pipeline.
  • Empirically, the method improves object factorization and mask quality and outperforms other adaptive methods on object discovery and segmentation benchmarks.
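
To make the overlap measure in the points above concrete, here is a minimal sketch of a soft IoU between slot-attention maps. The summary does not give the paper's exact formula, so the min/max form used here is an assumption (one common soft-IoU definition), and the names `soft_iou` and `pairwise_soft_iou` are hypothetical. Attention maps are plain Python lists for readability; a real pipeline would use tensors.

```python
def soft_iou(a, b, eps=1e-8):
    """Soft IoU between two attention maps (flat lists of non-negative weights).

    Assumption: intersection = sum of element-wise minima,
    union = sum of element-wise maxima. The paper may define it differently.
    """
    inter = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return inter / (union + eps)


def pairwise_soft_iou(attn):
    """Return the K x K matrix of soft-IoU scores for K attention maps."""
    k = len(attn)
    return [[soft_iou(attn[i], attn[j]) for j in range(k)] for i in range(k)]


# Toy example: 3 slots over 4 spatial positions.
# Slots 0 and 1 attend to the same region; slot 2 attends elsewhere.
attn = [
    [0.5, 0.5, 0.0, 0.1],
    [0.5, 0.5, 0.0, 0.1],
    [0.0, 0.0, 1.0, 0.8],
]
iou = pairwise_soft_iou(attn)
```

In this toy case the score for slots 0 and 1 is close to 1, while the score for slots 0 and 2 is close to 0, so only the first pair would be a candidate for merging under any reasonable threshold.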

Abstract

Slot-based object-centric learning represents an image as a set of latent slots with a decoder that combines them into an image or features. The decoder specifies how slots are combined into an output, but the slot set is typically fixed: the number of slots is chosen upfront and slots are only refined. This can lead to multiple slots competing for overlapping regions of the same entity rather than focusing on distinct regions. We introduce slot merging: a drop-in, lightweight operation on the slot set that merges overlapping slots during training. We quantify overlap with a Soft-IoU score between slot-attention maps and combine selected pairs via a barycentric update that preserves gradient flow. Merging follows a fixed policy, with the decision threshold inferred from overlap statistics, requiring no additional learnable modules. Integrated into the established feature-reconstruction pipeline of DINOSAUR, the proposed method improves object factorization and mask quality, surpassing other adaptive methods in object discovery and segmentation benchmarks.
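
The barycentric update mentioned in the abstract can be illustrated with a short sketch. Several details are assumptions, since the abstract does not specify them: the weights are taken to be each slot's total attention mass, the merged slot replaces the first of the pair and the second is dropped, and the function names `barycentric_merge` and `merge_slots` are hypothetical.

```python
def barycentric_merge(slot_a, slot_b, w_a, w_b):
    """Convex combination of two slot vectors with weights w_a, w_b > 0."""
    alpha = w_a / (w_a + w_b)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(slot_a, slot_b)]


def merge_slots(slots, weights, i, j):
    """Merge slot j into slot i and drop slot j (assumed policy).

    `weights` holds one non-negative scalar per slot, assumed here to be
    the slot's total attention mass.
    """
    merged = barycentric_merge(slots[i], slots[j], weights[i], weights[j])
    new_slots = [merged if k == i else s
                 for k, s in enumerate(slots) if k != j]
    new_weights = [weights[i] + weights[j] if k == i else w
                   for k, w in enumerate(weights) if k != j]
    return new_slots, new_weights


# Toy example: three 2-D slots; slots 0 and 1 were selected for merging.
slots = [[1.0, 0.0], [3.0, 2.0], [-1.0, 5.0]]
weights = [3.0, 1.0, 2.0]
new_slots, new_weights = merge_slots(slots, weights, 0, 1)
```

Because the merged slot is a weighted average of its two inputs, it is differentiable with respect to both, which is consistent with the abstract's statement that the update preserves gradient flow. In an autodiff framework the same arithmetic would be applied to tensors.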