SG-CoT: An Ambiguity-Aware Robotic Planning Framework using Scene Graph Representations

arXiv cs.RO / 3/23/2026


Key Points

SG-CoT presents a two-stage framework that grounds LLM-based robotic planning in a structured scene graph to better handle ambiguity.
The framework builds a scene graph from observations, encoding objects, attributes, and inter-object relationships for grounding the LLM's reasoning.
It equips the LLM with retrieval functions to query relevant portions of the scene graph and to identify the source of ambiguity, enabling targeted disambiguation questions to users or other robots.
Experimental results show at least a 10% improvement in question accuracy, along with success-rate gains of at least 4% in single-agent and at least 15% in multi-agent environments, which the authors present as evidence of improved reliability and generalizability.
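
To make the scene graph and retrieval functions concrete, here is a minimal sketch, assuming a toy graph in which objects carry attribute dictionaries and relationships are stored as (subject, relation, object) triples. The names SceneGraph, find_objects, relations_of, and build_example_scene, as well as the example scene itself, are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): attribute nodes plus relation triples."""

    objects: dict[str, dict[str, str]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj_id: str, **attributes: str) -> None:
        """Register an object node together with its attributes."""
        self.objects[obj_id] = dict(attributes)

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        """Register a directed relation such as ("cup_1", "on", "table_1")."""
        self.relations.append((subject, relation, obj))

    def find_objects(self, **attributes: str) -> list[str]:
        """Retrieval function: ids of objects matching every given attribute."""
        return [
            obj_id
            for obj_id, attrs in self.objects.items()
            if all(attrs.get(key) == value for key, value in attributes.items())
        ]

    def relations_of(self, obj_id: str) -> list[tuple[str, str, str]]:
        """Retrieval function: every relation triple that mentions obj_id."""
        return [t for t in self.relations if obj_id in (t[0], t[2])]


def build_example_scene() -> SceneGraph:
    """Build an assumed tabletop scene with two cups on one table."""
    graph = SceneGraph()
    graph.add_object("cup_1", category="cup", color="red")
    graph.add_object("cup_2", category="cup", color="blue")
    graph.add_object("table_1", category="table", color="brown")
    graph.add_relation("cup_1", "on", "table_1")
    graph.add_relation("cup_2", "on", "table_1")
    return graph
```

In this toy scene, a query for category "cup" matches two objects. An instruction such as "pick up the cup" would therefore be underspecified, which is the kind of situation a planner could answer with a clarifying question.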

Abstract

Ambiguity poses a major challenge to large language models (LLMs) used as robotic planners. In this letter, we present Scene Graph-Chain-of-Thought (SG-CoT), a two-stage framework where LLMs iteratively query a scene graph representation of the environment to detect and clarify ambiguities. First, a structured scene graph representation of the environment is constructed from input observations, capturing objects, their attributes, and relationships with other objects. Second, the LLM is equipped with retrieval functions to query portions of the scene graph that are relevant to the provided instruction. This grounds the reasoning process of the LLM in the observation, increasing the reliability of robotic planners under ambiguous situations. SG-CoT also allows the LLM to identify the source of ambiguity and pose a relevant disambiguation question to the user or another robot. Extensive experimentation demonstrates that SG-CoT consistently outperforms prior methods, with a minimum of 10% improvement in question accuracy and a minimum success rate increase of 4% in single-agent and 15% in multi-agent environments, validating its effectiveness for more generalizable robot planning.
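
The step of identifying the source of ambiguity and posing a disambiguation question can be illustrated with a small rule-based stand-in. In SG-CoT the reasoning is carried out by an LLM querying the scene graph; the sketch below only mimics the outcome under simplifying assumptions: the instruction is reduced to an object category, the scene is a flat list of attribute dictionaries, and the helper names differing_attributes and plan_or_ask are hypothetical.

```python
def differing_attributes(candidates: list[dict[str, str]]) -> list[str]:
    """Return attribute names (sorted) whose values differ across candidates."""
    if len(candidates) < 2:
        return []
    keys = sorted({key for c in candidates for key in c if key != "id"})
    return [key for key in keys if len({c.get(key) for c in candidates}) > 1]


def plan_or_ask(category: str, scene: list[dict[str, str]]) -> str:
    """Return a plan when the target is unique, otherwise a clarifying question.

    Assumption: the instruction has already been reduced to a target category.
    """
    candidates = [obj for obj in scene if obj.get("category") == category]
    if not candidates:
        return f"I cannot find any {category}. Where should I look?"
    if len(candidates) == 1:
        return f"PLAN: pick up {candidates[0]['id']}"
    differing = differing_attributes(candidates)
    if not differing:
        return f"I see {len(candidates)} matching objects. Which {category} do you mean?"
    # Use the first distinguishing attribute as the source of ambiguity.
    attr = differing[0]
    options = " or ".join(sorted({c[attr] for c in candidates if attr in c}))
    return f"Which {category} do you mean, by {attr}: {options}?"


example_scene = [
    {"id": "cup_1", "category": "cup", "color": "red", "location": "table"},
    {"id": "cup_2", "category": "cup", "color": "blue", "location": "table"},
    {"id": "bowl_1", "category": "bowl", "color": "white", "location": "table"},
]
```

With this scene, asking for a "cup" yields a question about color, the only attribute that separates the two cups, while asking for a "bowl" yields a plan directly.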

