Learning Discrete Abstractions for Visual Rearrangement Tasks Using Vision-Guided Graph Coloring

arXiv cs.RO / 3/23/2026


Key Points

  • The paper introduces a method to learn discrete, graph-structured abstractions from visual data to support high-level planning in rearrangement tasks.
  • It exploits the bipartite structure of rearrangement problems and fuses structural constraints with an attention-guided visual distance to induce abstractions.
  • The approach enables autonomous discovery of abstractions from vision alone and, in simulation on two rearrangement tasks, yields better planning performance than existing methods.
  • The work aims to automate abstraction discovery to improve the scalability of planning in robotics, reducing reliance on hand-engineered representations.
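The bipartite structure mentioned in the key points can be illustrated with a standard breadth-first 2-coloring. This is a minimal sketch, not the authors' code. It assumes, purely for illustration, that the relevant bipartite graph is a state-transition graph whose states alternate between two classes (for example, gripper empty vs. holding an object), given as an undirected adjacency dict in which every neighbor also appears as a key. The function name `two_color` and the toy graphs are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def two_color(adjacency: Dict[int, List[int]]) -> Optional[Dict[int, int]]:
    """Hypothetical helper: label each node 0 or 1 so that every edge joins
    different labels, or return None if the graph is not bipartite.

    Assumes an undirected graph where every neighbor is also a key.
    """
    side: Dict[int, int] = {}
    for start in adjacency:  # loop so disconnected components are covered
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in side:
                    # Neighbors go on the opposite side of the bipartition.
                    side[neighbor] = 1 - side[node]
                    queue.append(neighbor)
                elif side[neighbor] == side[node]:
                    return None  # odd cycle: no valid 2-coloring exists
    return side
```

For a toy four-state cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}` (pick, place, pick, place), `two_color` returns a labeling equal to `{0: 0, 1: 1, 2: 0, 3: 1}`, splitting the states into the two hypothesized classes. A triangle, which contains an odd cycle, returns `None`.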

Abstract

Learning abstractions directly from data is a core challenge in robotics. Humans naturally operate at an abstract level, reasoning over high-level subgoals while delegating execution to low-level motor skills, an ability that enables efficient problem solving in complex environments. In robotics, abstractions and hierarchical reasoning have long been central to planning, yet they are typically hand-engineered, demanding significant human effort and limiting scalability. Automating the discovery of useful abstractions directly from visual data would make planning frameworks more scalable and more applicable to real-world robotic domains. In this work, we focus on rearrangement tasks in which the state is represented by raw images, and propose a method to induce discrete, graph-structured abstractions by combining structural constraints with an attention-guided visual distance. Our approach leverages the inherent bipartite structure of rearrangement problems, integrating structural constraints and visual embeddings into a unified framework. This enables the autonomous discovery of abstractions from vision alone, which can subsequently support high-level planning. We evaluate our method on two rearrangement tasks in simulation and show that it consistently identifies meaningful abstractions that facilitate effective planning, and that it outperforms existing approaches.
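To make the idea of combining structural constraints with a visual distance more concrete, the sketch below treats abstraction as a constrained greedy graph coloring: each state receives a discrete label (a color), states linked in a conflict graph may never share a label, and otherwise a state joins the visually closest existing label if it lies within a distance threshold. This is an illustrative assumption, not the method described in the paper. Plain Euclidean distance between precomputed embedding vectors stands in for the attention-guided visual distance, and `vision_guided_coloring`, `visual_distance`, `in_conflict`, `conflicts`, and `threshold` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Set


def visual_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings.

    Assumption: a stand-in for the paper's attention-guided visual distance.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_conflict(s: int, t: int, conflicts: Dict[int, Set[int]]) -> bool:
    # Treat the conflict graph as undirected, whichever side lists the edge.
    return t in conflicts.get(s, set()) or s in conflicts.get(t, set())


def vision_guided_coloring(
    embeddings: List[Sequence[float]],
    conflicts: Dict[int, Set[int]],
    threshold: float,
) -> List[int]:
    """Hypothetical constrained greedy coloring of states into abstract labels."""
    labels: List[int] = []
    groups: List[List[int]] = []  # groups[c] holds the states labeled c
    for state, emb in enumerate(embeddings):
        best_label, best_dist = -1, threshold
        for label, group in enumerate(groups):
            # Structural constraint: conflicting states may never share a label.
            if any(in_conflict(state, other, conflicts) for other in group):
                continue
            # Visual guidance: distance to the closest member of this label.
            dist = min(visual_distance(emb, embeddings[o]) for o in group)
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label == -1:
            # No compatible label is close enough, so open a new one.
            best_label = len(groups)
            groups.append([])
        groups[best_label].append(state)
        labels.append(best_label)
    return labels
```

With embeddings `[(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]`, `threshold=1.0`, and no conflicts, the four states collapse into two labels, `[0, 0, 1, 1]`. Adding the conflict `{0: {1}, 1: {0}}` forces the visually similar states 0 and 1 apart, giving `[0, 1, 2, 2]`. Because the assignment is greedy, the result can depend on the order in which states are visited; a more global optimization would avoid that order dependence.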