Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans
arXiv cs.CV / 3/13/2026
Key Points
- HouseMind is a multimodal large language model that unifies floor plan understanding, generation, and editing in one framework, addressing joint reasoning over geometry, semantics, and spatial hierarchy.
- It introduces discrete room-instance tokens to construct a unified vocabulary bridging layouts and symbolic reasoning.
- With multimodal alignment and instruction tuning, the model can synthesize coherent, controllable layouts from text instructions.
- In experiments, the model improves geometric validity and controllability while remaining efficient and locally deployable.
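
To make the idea of discrete room-instance tokens concrete, here is a minimal sketch of one way a floor plan could be turned into a token sequence and back. It is not HouseMind's tokenizer, which this summary does not describe. The `Room` type, the `GRID` resolution, the token spelling such as `<room:bedroom>`, and the use of simple bounding-box quantization are all assumptions made for illustration.

```python
from dataclasses import dataclass

GRID = 32  # hypothetical number of quantization bins per axis
AXES = ("x0", "y0", "x1", "y1")


@dataclass(frozen=True)
class Room:
    """Axis-aligned room box with coordinates normalized to [0, 1] (assumed layout format)."""
    kind: str
    x0: float
    y0: float
    x1: float
    y1: float


def quantize(value: float) -> int:
    """Map a normalized coordinate to a bin index in [0, GRID - 1]."""
    return min(GRID - 1, max(0, int(value * GRID)))


def encode(rooms: list[Room]) -> list[str]:
    """Emit five tokens per room: one type token followed by four coordinate tokens."""
    tokens: list[str] = []
    for room in rooms:
        tokens.append(f"<room:{room.kind}>")
        for axis in AXES:
            tokens.append(f"<{axis}:{quantize(getattr(room, axis))}>")
    return tokens


def decode(tokens: list[str]) -> list[Room]:
    """Invert encode(), placing each coordinate at the center of its bin."""
    rooms: list[Room] = []
    for i in range(0, len(tokens), 5):
        kind = tokens[i][1:-1].split(":", 1)[1]
        coords = [
            (int(tok[1:-1].split(":", 1)[1]) + 0.5) / GRID
            for tok in tokens[i + 1 : i + 5]
        ]
        rooms.append(Room(kind, *coords))
    return rooms


plan = [Room("bedroom", 0.0, 0.0, 0.5, 0.5), Room("kitchen", 0.5, 0.0, 1.0, 0.4)]
tokens = encode(plan)
restored = decode(tokens)
```

Because every room becomes a short, fixed-length run of vocabulary items, a language model can read, emit, or rewrite a layout the same way it handles text; the cost of this particular scheme is a reconstruction error of at most half a bin per coordinate.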