SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

arXiv cs.AI / 5/1/2026


Key Points

  • The paper introduces SpatialGrammar, a domain-specific language designed to let LLM-based systems generate interactive 3D indoor scenes from natural language while reducing spatial errors and object collisions.
  • SpatialGrammar uses gravity-aligned layouts encoded as BEV grid placements with deterministic compilation into valid 3D geometry, enabling verifiable constraint checking.
  • The authors propose SG-Agent, a closed-loop framework that iteratively refines generated scenes using compiler feedback to enforce collision constraints and improve physical plausibility.
  • They also present SG-Mini, a 104M-parameter model trained solely on compiler-validated synthetic data, which performs competitively on single-shot indoor scene generation.
  • Experiments on 159 test scenes across five complexity scenarios show that SG-Agent improves spatial fidelity and physical plausibility over prior approaches, while SG-Mini matches larger LLM baselines in relevant settings.
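The core idea of the representation can be sketched in a few lines: objects live on a bird's-eye-view (BEV) grid and are deterministically compiled into gravity-aligned 3D boxes, which makes collision constraints mechanically checkable. The DSL syntax, cell size, and class names below are illustrative assumptions, not the paper's actual grammar.

```python
from dataclasses import dataclass

CELL = 0.5  # metres per BEV grid cell (assumed, not from the paper)

@dataclass
class Placement:
    name: str
    col: int       # BEV grid column
    row: int       # BEV grid row
    w: int         # footprint width in cells
    d: int         # footprint depth in cells
    height: float  # object height in metres

def compile_to_boxes(placements):
    """Deterministically map BEV placements to axis-aligned 3D boxes
    (xmin, ymin, zmin, xmax, ymax, zmax); gravity pins every box to z=0."""
    boxes = {}
    for p in placements:
        x0, y0 = p.col * CELL, p.row * CELL
        boxes[p.name] = (x0, y0, 0.0, x0 + p.w * CELL, y0 + p.d * CELL, p.height)
    return boxes

def collisions(boxes):
    """Return pairs of objects whose 3D boxes overlap: the verifiable
    constraint check that the compiled geometry makes possible."""
    names = list(boxes)
    bad = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            A, B = boxes[a], boxes[b]
            if all(A[k] < B[k + 3] and B[k] < A[k + 3] for k in range(3)):
                bad.append((a, b))
    return bad

scene = [Placement("bed", 0, 0, 4, 6, 0.6),
         Placement("nightstand", 4, 0, 1, 1, 0.5),
         Placement("wardrobe", 4, 0, 2, 2, 2.0)]  # overlaps the nightstand
print(collisions(compile_to_boxes(scene)))  # → [('nightstand', 'wardrobe')]
```

Because compilation is deterministic, the same grid program always yields the same geometry, so a failed check points unambiguously at the offending placements.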

Abstract

Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial errors and collisions, in part because common scene representations, such as raw coordinates or verbose code, make it difficult for models to reason about 3D spatial relationships and physical constraints. We propose SpatialGrammar, a domain-specific language that represents gravity-aligned indoor layouts as BEV grid placements with deterministic compilation to valid 3D geometry, enabling verifiable constraint checking. Building on this representation, we develop (1) SG-Agent, a closed-loop system that uses compiler feedback to iteratively refine scenes and enforce collision constraints, and (2) SG-Mini, a 104M-parameter model trained entirely on compiler-validated synthetic data. Across 159 test scenes spanning five scenarios of different complexity, SG-Agent improves spatial fidelity and physical plausibility over prior methods, while SG-Mini performs competitively against larger LLM-based baselines on single-shot generation scenarios.
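The closed-loop refinement described for SG-Agent can be illustrated with a toy control loop: a generator proposes a layout, a compiler reports constraint violations, and the feedback drives a repair step until the scene is collision-free or a retry budget runs out. Everything here, including the stand-in compiler and the repair heuristic, is an assumed sketch of the control flow, not the authors' implementation.

```python
def compile_and_check(layout):
    """Stand-in compiler: map object -> grid cell, report overlap violations."""
    seen, errors = {}, []
    for name, cell in layout.items():
        if cell in seen:
            errors.append(f"collision: {name} overlaps {seen[cell]} at {cell}")
        else:
            seen[cell] = name
    return errors

def repair(layout, errors, free_cells):
    """Stand-in for the LLM repair step: relocate one offending object,
    guided by the compiler's error message."""
    offender = errors[0].split()[1]   # parse object name from the message
    layout = dict(layout)
    layout[offender] = free_cells.pop()
    return layout

def closed_loop(layout, free_cells, max_iters=5):
    """Iterate propose -> compile -> repair until no violations remain."""
    for _ in range(max_iters):
        errors = compile_and_check(layout)
        if not errors:
            break  # physically plausible scene
        layout = repair(layout, errors, free_cells)
    return layout

scene = {"bed": (0, 0), "desk": (0, 0), "chair": (1, 1)}  # desk collides
fixed = closed_loop(scene, free_cells=[(2, 2), (3, 3)])
print(compile_and_check(fixed))  # → [] once the desk is relocated
```

The design point the paper emphasizes is that the feedback signal is exact rather than learned: because the compiler's checks are deterministic, every iteration either certifies the scene or names a concrete violation to fix.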