BIM Information Extraction Through LLM-based Adaptive Exploration

arXiv cs.CL / 5/5/2026

Key Points

  • The study argues that extracting specific information from BIM models is difficult because existing natural-language-to-structured-query methods rely on a static, assumed data organization that breaks under BIM heterogeneity.
  • It proposes “adaptive exploration,” an LLM-based agent that iteratively runs code to discover the BIM model’s structure at runtime rather than assuming a fixed schema.
  • The approach is evaluated on ifc-bench v2, a newly introduced open-source BIM question-answering benchmark with 1,027 tasks spanning 37 IFC models from 21 projects.
  • A factorial ablation across two LLM capability levels and four augmentation strategies shows that adaptive exploration significantly outperforms static query generation in every tested configuration, whichever augmentation strategy is used.
  • The results suggest that the core challenge of BIM heterogeneity is best addressed at the paradigm level (interactive exploration) instead of further optimizing static methods.
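
The contrast between the two paradigms can be illustrated with a toy sketch. This is not the paper's agent: the dictionaries stand in for IFC models, the names MODEL_A, MODEL_B, static_query, and adaptive_query are hypothetical, and a simple string-matching heuristic takes the place of the LLM that would decide what to inspect next.

```python
from typing import Optional

# Two toy "models" holding the same fact under different layouts.
# Hypothetical data; real IFC files are far richer than nested dicts.
MODEL_A = {"wall-1": {"Pset_WallCommon": {"FireRating": "REI60"}}}
MODEL_B = {"wall-1": {"Custom_Props": {"fire rating (EU)": "REI60"}}}


def static_query(model: dict, element: str) -> Optional[str]:
    """Static approach: assume one fixed property-set layout."""
    return model.get(element, {}).get("Pset_WallCommon", {}).get("FireRating")


def _norm(text: str) -> str:
    """Lowercase and keep only letters and digits, for loose name matching."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def adaptive_query(model: dict, element: str, concept: str):
    """Exploration: inspect what the model contains, then read the value.

    Returns (value, observations). The observations list records what was
    discovered at each step; an LLM agent would use such feedback to choose
    its next piece of code. Here a fixed heuristic stands in for the LLM.
    """
    observations = []
    psets = model.get(element, {})
    # Step 1: discover which property sets this element actually has.
    observations.append(("property_sets", list(psets)))
    # Step 2: look inside each set for a key that matches the concept.
    for pset_name, props in psets.items():
        observations.append((pset_name, list(props)))
        for key, value in props.items():
            if _norm(concept) in _norm(key):
                return value, observations
    return None, observations
```

In this sketch, static_query finds the value in MODEL_A but returns None for MODEL_B, because its assumed path does not exist there. adaptive_query finds the value in both, since it looks at the structure before reading from it.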

Abstract

BIM models provide structured representations of building geometry, semantics, and topology, yet extracting specific information from them remains remarkably difficult. Current approaches translate natural language into structured queries by assuming a fixed data organization (static approach), which BIM heterogeneity eventually invalidates. We address this with a new paradigm, adaptive exploration, where an LLM-based agent iteratively executes code to extract information from a BIM model, discovering its structure at runtime instead of assuming it. We evaluate this approach on ifc-bench v2, an open-source BIM question-answering benchmark introduced alongside this work, comprising 1,027 tasks across 37 IFC models from 21 projects. A factorial ablation across two LLM capability levels and four augmentation strategies shows that adaptive exploration significantly outperforms static query generation across all configurations, regardless of the augmentation strategy. These results indicate that BIM heterogeneity is best addressed at the paradigm level, not by further optimizing static approaches.
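
The shape of the factorial ablation can be sketched as a grid. This reflects one reading of the abstract, namely that each of the 2 x 4 configurations is run under both paradigms. The abstract does not name the LLM levels or the augmentation strategies, so the labels below are placeholders, and build_grid and paired_comparisons are hypothetical helpers.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts, not the names.
LLM_LEVELS = ["llm-level-1", "llm-level-2"]
AUGMENTATIONS = [f"augmentation-{i}" for i in range(1, 5)]
PARADIGMS = ["static", "adaptive"]


def build_grid() -> list:
    """Enumerate every (LLM level, augmentation, paradigm) run."""
    return [
        {"llm": llm, "augmentation": aug, "paradigm": paradigm}
        for llm, aug, paradigm in product(LLM_LEVELS, AUGMENTATIONS, PARADIGMS)
    ]


def paired_comparisons(grid: list) -> dict:
    """Group runs so each configuration pairs static with adaptive."""
    pairs = {}
    for run in grid:
        key = (run["llm"], run["augmentation"])
        pairs.setdefault(key, []).append(run["paradigm"])
    return pairs
```

Under these assumptions the grid has 16 runs, which group into 8 configurations, each comparing the static and adaptive paradigms on the same LLM level and augmentation strategy.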