FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow
arXiv cs.CV / 3/23/2026
📰 News · Models & Research
Key Points
- FlowScene proposes a tri-branch generative model conditioned on multimodal graphs that jointly generates scene layouts, object shapes, and textures.
- It introduces a multimodal graph rectified flow in which objects exchange information during generation, enabling collaborative reasoning across the object graph.
- The approach enforces scene-level style coherence across structure and appearance, enabling fine-grained control over objects' geometry, textures, and relations.
- Experimental results show FlowScene outperforms language-conditioned and graph-conditioned baselines in realism, style consistency, and alignment with human preferences.
- By addressing limitations of prior methods, FlowScene aims to deliver high-fidelity, texture-rich indoor scenes suitable for industrial applications.
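Rectified flow, named in the second point, generally works like this. A network is trained to predict the velocity x1 - x0 along straight lines between noise x0 and data x1. New samples are then generated by integrating that velocity field from t = 0 to t = 1. The Python sketch shows only this generic mechanism on plain lists. It is not FlowScene's implementation, and it does not model the graph-based exchange of object information. The helper names are hypothetical, and so is the "oracle" velocity. That oracle stands in for a trained network under the toy assumption that every data sample equals one fixed point.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Training target for the velocity network: constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical toy setup: if every data sample equals `target`, the ideal
# velocity at (x, t) points straight at it, v = (target - x) / (1 - t).
# This oracle replaces a learned network purely for illustration.
target = [2.0, -1.0, 0.5]


def oracle_velocity(x, t):
    return [(c - xi) / (1.0 - t) for c, xi in zip(target, x)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]
sample = euler_sample(oracle_velocity, noise, steps=10)
```

With the exact velocity, the last Euler step lands on `target` up to floating-point rounding. A learned velocity field would only approximate this. The straight interpolation paths are what let rectified-flow models use relatively few integration steps.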