GSI Agent: Domain Knowledge Enhancement for Large Language Models in Green Stormwater Infrastructure

arXiv cs.AI / 3/18/2026

Key Points

The paper introduces GSI Agent, a domain-enhanced LLM framework for Green Stormwater Infrastructure (GSI) tasks. It combines supervised fine-tuning (SFT) on a curated GSI instruction dataset, retrieval-augmented generation (RAG) over an internal GSI knowledge base built from municipal documents, and an agent-based reasoning pipeline that coordinates retrieval, context integration, and structured response generation.
The authors also construct a new GSI dataset aligned with real-world inspection and maintenance scenarios. They report that BLEU-4 on this dataset improves from 0.090 to 0.307, while performance on a general knowledge dataset remains stable (0.304 vs. 0.305).
The approach integrates three strategies (SFT, RAG over municipal documents, and an agent-based reasoning workflow) to adapt general-purpose LLMs for professional infrastructure tasks and reduce domain-specific hallucinations.
The work demonstrates how systematic domain knowledge enhancement can enable LLMs to perform more reliably in engineering contexts, suggesting broader applicability to similar domain-specific infrastructure applications.
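
For readers unfamiliar with the BLEU-4 scores quoted above, the following is a minimal sketch of how a sentence-level BLEU-4 score can be computed against a single reference. It is an illustration only: the summary does not say which BLEU implementation, tokenizer, or smoothing the authors used, so the whitespace tokenization, the absence of smoothing, and the function names `ngrams` and `bleu4` are all assumptions made here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing.

    Assumption: whitespace tokenization; the paper's setup may differ.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            # A zero precision (or a candidate shorter than n tokens)
            # drives the geometric mean to zero.
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the 1- to 4-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)
```

Under this definition an answer that copies only the first part of a reference still gets perfect n-gram precision but is scaled down by the brevity penalty, which is one reason BLEU scores on free-form answers tend to sit well below 1.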

Abstract

Green Stormwater Infrastructure (GSI) systems, such as permeable pavement, rain gardens, and bioretention facilities, require continuous inspection and maintenance to ensure long-term performance. However, domain knowledge about GSI is often scattered across municipal manuals, regulatory documents, and inspection forms. As a result, non-expert users and maintenance staff may struggle to obtain reliable and actionable guidance from field observations. Although Large Language Models (LLMs) have demonstrated strong general reasoning and language generation capabilities, they often lack domain-specific knowledge and may produce inaccurate or hallucinated answers in engineering scenarios. This limitation restricts their direct application to professional infrastructure tasks. In this paper, we propose GSI Agent, a domain-enhanced LLM framework designed to improve performance in GSI-related tasks. Our approach integrates three complementary strategies: (1) supervised fine-tuning (SFT) on a curated GSI instruction dataset, (2) retrieval-augmented generation (RAG) over an internal GSI knowledge base constructed from municipal documents, and (3) an agent-based reasoning pipeline that coordinates retrieval, context integration, and structured response generation. We also construct a new GSI Dataset aligned with real-world GSI inspection and maintenance scenarios. Experimental results show that our framework significantly improves domain-specific performance while maintaining general knowledge capability. On the GSI dataset, BLEU-4 improves from 0.090 to 0.307, while performance on the common knowledge dataset remains stable (0.304 vs. 0.305). These results demonstrate that systematic domain knowledge enhancement can effectively adapt general-purpose LLMs to professional infrastructure applications.
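
The three pipeline stages named in the abstract (retrieval, context integration, structured response generation) can be sketched roughly as follows. This is not the authors' implementation: the bag-of-words cosine retriever stands in for whatever retriever the paper uses, the knowledge-base passages are invented placeholder text rather than excerpts from real municipal documents, and every name (`Passage`, `retrieve`, `build_prompt`, `answer`, `stub_model`) is hypothetical. The fine-tuned model is represented by a callable passed in by the caller.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical document label, not a real municipal manual
    text: str


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, knowledge_base, k=2):
    """Stage 1: rank passages by similarity to the query, keep the top k."""
    q = Counter(tokenize(query))
    ranked = sorted(
        knowledge_base,
        key=lambda p: cosine(q, Counter(tokenize(p.text))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Stage 2: integrate the retrieved context into one prompt."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return f"Context:\n{context}\n\nField observation: {query}\nAnswer:"


def answer(query, knowledge_base, generate):
    """Stage 3: call the model and wrap its output in a structured response."""
    passages = retrieve(query, knowledge_base)
    return {
        "observation": query,
        "sources": [p.source for p in passages],
        "guidance": generate(build_prompt(query, passages)),
    }


# Placeholder knowledge base; the passage text is illustrative only.
KB = [
    Passage("manual-A", "Standing water on permeable pavement suggests clogged pores; vacuum sweep the surface."),
    Passage("manual-B", "Rain garden plants should be inspected for die-off each spring."),
    Passage("form-C", "Record sediment depth at bioretention inlets during each inspection."),
]


def stub_model(prompt):
    """Stand-in for the fine-tuned LLM; returns a fixed-format string."""
    return f"(stub output for a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    print(answer("standing water on permeable pavement", KB, stub_model))
```

Passing the model in as a callable keeps the retrieval and prompt-building steps testable on their own, and returning the source labels alongside the guidance lets a reader trace an answer back to the documents it drew on.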