DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation

arXiv cs.AI / 2026/3/24


Key Points

  • The paper introduces DomAgent, an autonomous coding agent designed to improve domain-specific code generation by addressing gaps in generic LLM training data for specialized real-world tasks.
  • Its key module, DomRetriever, combines top-down knowledge-graph reasoning with bottom-up case-based reasoning to iteratively retrieve structured domain knowledge and relevant examples.
  • DomRetriever supports flexible deployment by functioning either as part of DomAgent or independently with other LLMs for domain adaptation.
  • Experiments on the DS-1000 data-science benchmark and on real-world truck software development tasks show DomAgent significantly boosts code-generation success for domain-specific requirements.
  • The results suggest that small open-source models using this approach can close much of the performance gap with large proprietary LLMs in complex, real-world applications.

Abstract

Large language models (LLMs) have shown impressive capabilities in code generation. However, because most LLMs are trained on public domain corpora, directly applying them to real-world software development often yields low success rates, as these scenarios frequently require domain-specific knowledge. In particular, domain-specific tasks usually demand highly specialized solutions, which are often underrepresented or entirely absent in the training data of generic LLMs. To address this challenge, we propose DomAgent, an autonomous coding agent that bridges this gap by enabling LLMs to generate domain-adapted code through structured reasoning and targeted retrieval. A core component of DomAgent is DomRetriever, a novel retrieval module that emulates how humans learn domain-specific knowledge, by combining conceptual understanding with experiential examples. It dynamically integrates top-down knowledge-graph reasoning with bottom-up case-based reasoning, enabling iterative retrieval and synthesis of structured knowledge and representative cases to ensure contextual relevance and broad task coverage. DomRetriever can operate as part of DomAgent or independently with any LLM for flexible domain adaptation. We evaluate DomAgent on an open benchmark dataset in the data science domain (DS-1000) and further apply it to real-world truck software development tasks. Experimental results show that DomAgent significantly enhances domain-specific code generation, enabling small open-source models to close much of the performance gap with large proprietary LLMs in complex, real-world applications. The code is available at: https://github.com/Wangshuaiia/DomAgent.
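The abstract describes DomRetriever only at a high level, so the following is a minimal illustrative sketch of the general idea of alternating top-down knowledge-graph expansion with bottom-up case retrieval. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy graph, the toy case base, the names `Case`, `expand_top_down`, `retrieve_cases`, and `retrieve`, and the use of Jaccard token overlap as the similarity measure.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """A solved example: task description, solution code, and the concepts it uses."""
    description: str
    code: str
    concepts: set[str] = field(default_factory=set)


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization (a stand-in for a real embedding model)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def expand_top_down(graph: dict[str, list[str]], seeds: set[str]) -> set[str]:
    """Top-down step: add the direct children of every known concept."""
    expanded = set(seeds)
    for concept in seeds:
        expanded.update(graph.get(concept, []))
    return expanded


def retrieve_cases(cases: list[Case], terms: set[str], k: int) -> list[Case]:
    """Bottom-up step: return the k cases whose descriptions best overlap the terms."""
    scored = [(jaccard(tokens(c.description), terms), i) for i, c in enumerate(cases)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [cases[i] for _, i in scored[:k]]


def retrieve(query, graph, cases, rounds=3, k=2):
    """Alternate both steps until the retrieved cases add no new concepts."""
    query_terms = tokens(query)
    concepts = {c for c in graph if c in query_terms}
    selected: list[Case] = []
    for _ in range(rounds):
        concepts = expand_top_down(graph, concepts)
        selected = retrieve_cases(cases, query_terms | concepts, k)
        # Concepts used by the retrieved cases feed the next graph expansion.
        new_concepts = set().union(*(c.concepts for c in selected)) - concepts
        if not new_concepts:
            break
        concepts |= new_concepts
    return sorted(concepts), selected


# Hypothetical data-science domain knowledge, invented for this example.
EXAMPLE_GRAPH = {"dataframe": ["groupby", "merge"], "groupby": ["aggregate"], "plot": ["axis"]}
EXAMPLE_CASES = [
    Case("groupby a dataframe column and aggregate the mean",
         "df.groupby('k')['v'].mean()", {"groupby", "aggregate"}),
    Case("merge two tables on a key", "left.merge(right, on='k')", {"merge"}),
    Case("rotate axis labels on a plot", "plt.xticks(rotation=45)", {"plot", "axis"}),
]

if __name__ == "__main__":
    found_concepts, found_cases = retrieve(
        "average per group in a dataframe", EXAMPLE_GRAPH, EXAMPLE_CASES
    )
    print(found_concepts)
    for case in found_cases:
        print(case.code)
```

In this sketch the retrieved concepts and case code would be placed in the LLM prompt as context. A real system would likely replace the token overlap with learned embeddings and the dictionary with a proper graph store; those choices are outside what the abstract specifies.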