CODESTRUCT: Code Agents over Structured Action Spaces

arXiv cs.AI / 4/8/2026


Key Points

  • The paper argues that LLM-based code agents often fail because they treat code repositories as unstructured text and rely on brittle string matching for edits.
  • It proposes CODESTRUCT, which reframes a repository as a structured action space by operating on named AST entities and using syntax-validated operations via readCode and editCode.
  • On SWE-Bench Verified, evaluated across six LLMs, CODESTRUCT increases Pass@1 accuracy by 1.2–5.0% while cutting token usage by 12–38% for most models.
  • The biggest gains occur for models prone to invalid or empty patches in text-based interfaces; for example, GPT-5-nano improves by 20.8% as empty-patch failures drop from 46.6% to 7.2%.
  • Results on CodeAssistBench also show consistent accuracy improvements (+0.8–4.4%) with potential cost reductions up to 33%, supporting the idea that structure-aware interfaces improve reliability and efficiency.

Abstract

LLM-based code agents treat repositories as unstructured text, applying edits through brittle string matching that frequently fails due to formatting drift or ambiguous patterns. We propose reframing the codebase as a structured action space where agents operate on named AST entities rather than text spans. Our framework, CODESTRUCT, provides readCode for retrieving complete syntactic units and editCode for applying syntax-validated transformations to semantic program elements. Evaluated on SWE-Bench Verified across six LLMs, CODESTRUCT improves Pass@1 accuracy by 1.2-5.0% while reducing token consumption by 12-38% for most models. Models that frequently fail to produce valid patches under text-based interfaces benefit most: GPT-5-nano improves by 20.8% as empty-patch failures drop from 46.6% to 7.2%. On CodeAssistBench, we observe consistent accuracy gains (+0.8-4.4%) with cost reductions up to 33%. Our results show that structure-aware interfaces offer a more reliable foundation for code agents.
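
To make the idea of a structured action space concrete, here is a minimal sketch of what operating on named AST entities could look like. It is not the paper's implementation: the summary above does not specify the readCode or editCode signatures, so the functions read_code, edit_code, and find_entity, the dotted-name addressing scheme, and the choice of Python's standard ast module are all assumptions made for illustration.

```python
import ast

# Hypothetical sketch, not the CODESTRUCT API: entities are addressed by a
# dotted name such as "Greeter.greet" and resolved through Python's ast module.

ENTITY_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def find_entity(tree, qualified_name):
    """Resolve a dotted name to a function or class node, or raise KeyError."""
    node = tree
    for part in qualified_name.split("."):
        for child in getattr(node, "body", []):
            if isinstance(child, ENTITY_TYPES) and child.name == part:
                node = child
                break
        else:
            raise KeyError(qualified_name)
    return node


def _line_span(node):
    """Return the (start, end) slice indices of a node, decorators included."""
    first = min([node.lineno] + [d.lineno for d in node.decorator_list])
    return first - 1, node.end_lineno


def read_code(source, qualified_name):
    """Return the complete source text of one named entity."""
    start, end = _line_span(find_entity(ast.parse(source), qualified_name))
    return "\n".join(source.splitlines()[start:end])


def edit_code(source, qualified_name, new_text):
    """Replace one named entity and reject the edit if the file stops parsing."""
    start, end = _line_span(find_entity(ast.parse(source), qualified_name))
    lines = source.splitlines()
    candidate = "\n".join(lines[:start] + new_text.splitlines() + lines[end:]) + "\n"
    ast.parse(candidate)  # raises SyntaxError, so an invalid edit is never returned
    return candidate


SOURCE = '''class Greeter:
    def greet(self, name):
        return "Hello " + name

def main():
    print(Greeter().greet("world"))
'''

if __name__ == "__main__":
    print(read_code(SOURCE, "Greeter.greet"))
    replacement = '    def greet(self, name):\n        return "Hi " + name'
    print(edit_code(SOURCE, "Greeter.greet", replacement))
```

In this sketch the agent names the entity to change instead of supplying a string to match, so formatting drift elsewhere in the file cannot cause a failed match, and the parse check means a syntactically invalid replacement is rejected rather than written out. A real system would also need to handle indentation normalization, multiple languages, and entities other than functions and classes.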