Reconstruction-Guided Slot Curriculum: Addressing Object Over-Fragmentation in Video Object-Centric Learning
arXiv cs.CV / 3/25/2026
Key Points
- Slot-attention methods for video object-centric learning often over-fragment objects, because the reconstruction objective implicitly encourages the model to use every available slot, even redundantly.
- The paper introduces a reconstruction-guided slot curriculum (SlotCurri) that starts with few coarse slots and progressively adds slots only where reconstruction error stays high, reducing fragmentation early in training.
- Since meaningful sub-parts emerge only when coarse semantics are well separated, SlotCurri adds a structure-aware loss (in addition to MSE) to preserve local contrast and edge information for sharper semantic boundaries.
- It further proposes cyclic inference that propagates slots forward and then backward through frames to improve temporal consistency even for early frames.
- Experiments report foreground ARI (FG-ARI) improvements of +6.8 on YouTube-VIS and +8.3 on MOVi-C, and the authors have released their code publicly.
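To make two of the ideas above concrete, here is a minimal toy sketch of (1) choosing which slots to split when their reconstruction error stays high and (2) a forward-then-backward sweep over frames. It is not the authors' implementation: the function names `slots_to_split` and `cyclic_inference`, the `threshold` and `max_slots` parameters, the scalar slot state, and the sweep order are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def slots_to_split(slot_errors: Sequence[float], threshold: float,
                   max_slots: int) -> List[int]:
    """Return indices of slots whose reconstruction error exceeds `threshold`.

    Hypothetical rule: highest-error slots are chosen first, and the slot
    count after splitting never exceeds `max_slots`.
    """
    budget = max_slots - len(slot_errors)
    if budget <= 0:
        return []
    candidates = [i for i, err in enumerate(slot_errors) if err > threshold]
    candidates.sort(key=lambda i: slot_errors[i], reverse=True)
    return sorted(candidates[:budget])


def cyclic_inference(frames: Sequence[float], init_slot: float,
                     update: Callable[[float, float], float]) -> List[float]:
    """Sweep forward over all frames, then backward from the last frame.

    The backward sweep starts from the state accumulated in the forward
    sweep, so early frames are decoded with information from the whole clip.
    A scalar stands in for the slot state purely for illustration.
    """
    state = init_slot
    for frame in frames:                        # forward: frame 0 -> T-1
        state = update(state, frame)
    per_frame = [0.0] * len(frames)
    for t in range(len(frames) - 1, -1, -1):    # backward: frame T-1 -> 0
        state = update(state, frames[t])
        per_frame[t] = state
    return per_frame
```

For example, `slots_to_split([0.9, 0.1, 0.5], threshold=0.3, max_slots=4)` selects only slot 0, since there is room for just one more slot and slot 0 has the largest error. In a real model, `update` would be a slot-attention step over frame features rather than a scalar function.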