CODESTRUCT: Code Agents over Structured Action Spaces

arXiv cs.AI / 4/8/2026

Key Points

The paper argues that LLM-based code agents often fail because they treat code repositories as unstructured text and rely on brittle string matching for edits.
It proposes CODESTRUCT, which reframes a repository as a structured action space by operating on named AST entities and using syntax-validated operations via readCode and editCode.
On SWE-Bench Verified, evaluated across six LLMs, CODESTRUCT improves Pass@1 accuracy by 1.2–5.0% while cutting token usage by 12–38% for most models.
The biggest gains occur for models prone to invalid or empty patches in text-based interfaces; for example, GPT-5-nano improves by 20.8% as empty-patch failures drop from 46.6% to 7.2%.
Results on CodeAssistBench also show consistent accuracy improvements (+0.8–4.4%) with potential cost reductions up to 33%, supporting the idea that structure-aware interfaces improve reliability and efficiency.
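The failure mode described in the first key point can be illustrated with a small sketch. This is not code from the paper: `apply_text_edit` and the sample file are hypothetical, and the helper only mimics the general exact-match search-and-replace style of text-based edit tools. It shows how a single whitespace difference ("formatting drift") turns an intended fix into an empty patch.

```python
from typing import Optional

# Hypothetical file content as it exists on disk (note: no spaces around `*`).
file_on_disk = (
    "def area(w, h):\n"
    "    return w*h\n"
)

# The agent reproduces the line from memory with slightly different spacing.
search = "    return w * h\n"
replace = "    return abs(w * h)\n"


def apply_text_edit(source: str, search: str, replace: str) -> Optional[str]:
    """Replace `search` only if it occurs exactly once.

    Returns None when the pattern is missing or ambiguous, which is the
    situation that leaves an agent with no patch at all.
    """
    if source.count(search) != 1:
        return None
    return source.replace(search, replace)


patched = apply_text_edit(file_on_disk, search, replace)
print("edit applied" if patched is not None else "edit failed: no exact match")
```

Addressing the function by name rather than by its exact text avoids this class of failure, which is the motivation the summary attributes to CODESTRUCT.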

Abstract

LLM-based code agents treat repositories as unstructured text, applying edits through brittle string matching that frequently fails due to formatting drift or ambiguous patterns. We propose reframing the codebase as a structured action space where agents operate on named AST entities rather than text spans. Our framework, CODESTRUCT, provides readCode for retrieving complete syntactic units and editCode for applying syntax-validated transformations to semantic program elements. Evaluated on SWE-Bench Verified across six LLMs, CODESTRUCT improves Pass@1 accuracy by 1.2-5.0% while reducing token consumption by 12-38% for most models. Models that frequently fail to produce valid patches under text-based interfaces benefit most: GPT-5-nano improves by 20.8% as empty-patch failures drop from 46.6% to 7.2%. On CodeAssistBench, we observe consistent accuracy gains (+0.8-4.4%) with cost reductions up to 33%. Our results show that structure-aware interfaces offer a more reliable foundation for code agents.
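To make the idea of a structured action space concrete, here is a minimal sketch built on Python's standard `ast` module. It is an assumption-laden illustration, not the CODESTRUCT implementation: the abstract names the operations readCode and editCode but does not give their signatures, so `read_code`, `edit_code`, and `_find_entity` below are hypothetical, and the sketch handles only top-level Python functions and classes.

```python
import ast
from typing import Optional, Union

Entity = Union[ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef]


def _find_entity(tree: ast.Module, name: str) -> Optional[Entity]:
    """Locate a top-level function or class by name (hypothetical helper)."""
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == name:
                return node
    return None


def read_code(source: str, name: str) -> Optional[str]:
    """Return the complete source text of the named entity, or None if absent."""
    node = _find_entity(ast.parse(source), name)
    if node is None:
        return None
    return ast.get_source_segment(source, node)


def edit_code(source: str, name: str, new_definition: str) -> str:
    """Replace the named entity with `new_definition`, rejecting invalid syntax.

    Assumption: decorators are left in place, because `lineno` points at the
    `def`/`class` line rather than at the first decorator.
    """
    ast.parse(new_definition)  # raises SyntaxError before anything is changed
    node = _find_entity(ast.parse(source), name)
    if node is None:
        raise KeyError(f"no top-level entity named {name!r}")
    if not new_definition.endswith("\n"):
        new_definition += "\n"
    lines = source.splitlines(keepends=True)
    updated = "".join(lines[: node.lineno - 1]) + new_definition + "".join(lines[node.end_lineno:])
    ast.parse(updated)  # validate the whole file after the edit
    return updated


SOURCE = (
    "def area(w, h):\n"
    "    return w*h\n"
    "\n"
    "def perimeter(w, h):\n"
    "    return 2 * (w + h)\n"
)

updated = edit_code(SOURCE, "area", "def area(w, h):\n    return abs(w * h)\n")
print(read_code(updated, "area"))
```

Because the edit targets the entity named `area` instead of a text span, the spacing of the original body is irrelevant, and a syntactically invalid replacement is rejected before the file is modified. A real system would presumably need a multi-language parser and support for nested entities, which this sketch leaves out.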

